
Using Amazon EFS to Persist and Share Data Between Containers in Kubernetes

Persistent and shared data need not be an issue when migrating to Docker containers in AWS

Amazon recently announced that EFS (Elastic File System) had gone GA in three regions, including eu-west-1, which was great news for our customers. Like most companies, we have our fair share of legacy applications to support that either require stateful data or need a shared file system in order to scale a front end with shared assets.

EFS is effectively a managed NFS (Network File System) service; NFS is widely used and has been tested over many years. EFS provides NFSv4.1, and for those wanting more details you can read all about it here.

One such application that works better with a shared file system is WordPress, which several of our customers use extensively. Whilst it's fairly trivial to offload assets to S3, the WordPress interface allows admins to install or upgrade themes and plugins and even update the core. The issue is that if you are running multiple containers for scale, these features still work but only upgrade the code base in the container that handled the request, meaning the containers no longer all run the same code and you get inconsistent behaviour across the site. This can lead to a very bad experience for your users if you miss an update in a container. The solution is to use a shared file system for the code: update one container and all the containers instantly get the new code.

EFS is going to provide us with a fantastically simple way to set up and manage an NFS service, and Kubernetes allows us to take advantage of this by using Persistent Volumes and Persistent Volume Claims; the latter allow containers to connect through Kubernetes to the Persistent Volume.

Let's Get Started

Setting up EFS is a simple click-through wizard in the AWS console. Exactly how you set this up is going to depend on how you've deployed the systems that run Kubernetes. I'm going to assume you have at least three nodes/minions spread out across eu-west-1a, 1b and 1c.

You'll need to provide EFS with a name and tell it which subnets in your VPC to attach to; in my case this is the three private subnets in my VPC. The other thing to note is that EFS needs a security group. For simplicity I added the EFS service to the security group for my Kubernetes servers, which allows anything in that group to talk to any other server in the group. These are all easily configured in the wizard, but can be changed pretty easily after creation too.
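
If you prefer the command line, the AWS CLI can do the same job. Below is a rough sketch of the equivalent calls; the file system ID, subnet IDs and security group ID are placeholders that you'd swap for your own values:

aws efs create-file-system --creation-token my-efs --region eu-west-1
aws efs create-tags --file-system-id fs-12345678 --tags Key=Name,Value=my-efs
# repeat create-mount-target for each of your three private subnets
aws efs create-mount-target --file-system-id fs-12345678 --subnet-id subnet-aaaa1111 --security-groups sg-bbbb2222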

Once it's set up you'll get the option to see the DNS names for the mount target endpoints. If you've set this up in three subnets like me, you'll get three DNS entries, one for each AZ. Make a note of these; you'll only need one, but keep them all in case you want to get extra fancy.
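
As an optional sanity check you can mount one of those endpoints by hand from a node before involving Kubernetes at all. This is just a quick test, assuming your node sits in the security group you attached to EFS; /mnt/efs is an arbitrary directory:

sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 eu-west-1b.fs-75HJfj6j.efs.eu-west-1.amazonaws.com:/ /mnt/efs
df -h /mnt/efs
sudo umount /mnt/efs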

It's very easy to set this up, which means the tricky bit must be in Kubernetes, right?

Configuring Kubernetes

This isn't actually that hard once you understand a couple of concepts. Firstly, you're going to need a Persistent Volume, or PV. This is basically Kubernetes' interface to a disk, in this case an NFS mount. I've included a config below which sets this up; I've named the file persistentvol-1b.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-persistent-efs-vol
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: eu-west-1b.fs-75HJfj6j.efs.eu-west-1.amazonaws.com
    path: "/"

Let's have a look at this file. The two key things are capacity:storage and nfs:server; these allow us to set a size limit and connect to the correct mount. Note that EFS scales on demand and you pay for the space you actually use, so the storage directive lets you declare a limit (although I've not tested whether this is enforced as a hard limit yet). The server directive is simply one of the DNS entries you saved earlier whilst setting up EFS. You can see I'm connecting to the endpoint in eu-west-1b, hence the file name.

In order to connect this up, it's simply a case of running the following command:

kubectl create -f persistentvol-1b.yaml
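
You can then confirm that Kubernetes has registered the volume and that it shows as Available (the name matches the metadata name from the file above):

kubectl get pv
kubectl describe pv my-persistent-efs-vol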

Claim Your Volume

Once Kubernetes knows about this Persistent Volume, we need to allow the containers/pods to use this space. This is why we create a Claim, or PVC. The Claim is what we'll reference later on when we create a pod. Thanks to @imac for this tip. The file below, persistentvol-claim.yaml, shows how to set this up:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-persistent-efs-vol
  namespace: mynamespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

One thing to note in this file is that I've used a namespace, as my container is going to live in that namespace too. PVs aren't namespaced, but PVCs are, so the claim needs to live in the same namespace as your containers if you are starting them somewhere other than default. Starting this is, yet again, a simple command:

kubectl create -f persistentvol-claim.yaml
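
One assumption here is that the mynamespace namespace already exists; if it doesn't, kubectl create namespace mynamespace will sort that out. Once the claim has been created you can check that it has bound to the Persistent Volume; the STATUS column should read Bound:

kubectl get pvc --namespace mynamespace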

Using the Volume

For this demo I'm going to set up a Replication Controller, initially with just one pod/container. The reason for this is that I'll show you how to scale later and test that your data is shared. The YAML below will set up a container for you with no initial code. I've called the file www-app.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  namespace: mynamespace
  name: www-app
  labels:
    www-component: app
spec:
  replicas: 1
  selector:
    www-component: app
  template:
    metadata:
      labels:
        www-component: app
    spec:
      containers:
      - name: www-app
        image: richarvey/nginx-php-fpm:latest 
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/var/www/html"
          name: webroot
      volumes:
      - name: webroot
        persistentVolumeClaim:
          claimName: my-persistent-efs-vol

To start the Replication Controller and check it's running, these commands will help you:

kubectl create -f www-app.yaml
kubectl get pods,rc --namespace mynamespace
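
If the pod sits in ContainerCreating for a while, it's usually the NFS mount that is failing (security groups being the prime suspect). Describing the pod will show any mount errors in the events at the bottom of the output:

kubectl describe pod NAME_OF_YOUR_POD --namespace mynamespace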

Add Some Data

Now let's connect to your pod and add some data to /var/www/html (which is the default webroot for this container):

kubectl exec -it NAME_OF_YOUR_POD --namespace mynamespace -- bash

Once you've connected to your container, create some files in /var/www/html using your favorite editor (vi). I recommend creating an index.php file. A simple test would be:

<?php
phpinfo();
?>
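
If you'd rather not use an interactive shell, you could also drop the same test file in place with a one-liner; this is just a sketch, so adjust the pod name and namespace to match your own:

kubectl exec NAME_OF_YOUR_POD --namespace mynamespace -- sh -c 'echo "<?php phpinfo(); ?>" > /var/www/html/index.php'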

Let's Scale Things Up

Now that you have this file in place, you should be able to see it when you browse to the container in your web browser. You can use kubectl describe pod (or kubectl get pods -o wide) to find your pod's IP.
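
For example, from a machine that can reach the pod network (such as one of your nodes), something along these lines will do:

kubectl get pods -o wide --namespace mynamespace
curl http://POD_IP_FROM_ABOVE/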

The following commands will add more replicas (containers) to your replication controller; these will all share the same Persistent Volume Claim and therefore the same data.

kubectl scale rc/www-app --replicas=3 --namespace mynamespace
kubectl get pods --namespace mynamespace

You'll now see you have three containers. Use kubectl describe (or kubectl get pods -o wide) to get the other IP addresses and use your browser to visit them. You'll see that the file you created on the first container is displayed. You'll also notice that editing this file in any of the containers works seamlessly and the changes are reflected in all your browser windows.
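
A quick way to convince yourself that every replica really is serving the same files is to cat the test file from each pod in turn; this sketch assumes the label and namespace used in the examples above:

for pod in $(kubectl get pods -l www-component=app --namespace mynamespace -o jsonpath='{.items[*].metadata.name}'); do
  kubectl exec "$pod" --namespace mynamespace -- cat /var/www/html/index.php
done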

Further Thoughts

NFS mounts in most Linux distros allow you to specify a failover mount endpoint. I'm not sure if this is supported in Kubernetes yet, but in theory, if and when it is, we could use one of the other EFS endpoints to make this highly available.

If you would like any help with the topics covered in this article, please don't hesitate to contact us at info@ngineered.co.uk.

Ric Harvey

Ric leads engineering and technical architecture for Ngineered. He has a vast amount of experience in cloud computing, having been responsible for the delivery of large-scale cloud migration projects at companies like Ticketmaster and Channel 4.
