Using Amazon EFS to Persist and Share Data Between Containers in Kubernetes
Persistent and shared data need not be an issue when migrating to Docker containers in AWS
Amazon recently announced that EFS (Elastic File System) has gone GA in three regions, including eu-west-1, which was great news for our customers. Like most companies, we have our fair share of legacy applications to support that either require stateful data or need a shared file system in order to scale a front end with shared assets.
EFS is effectively a managed NFS (Network File System) service. NFS is widely used and has been tested over many, many years; EFS provides NFS 4.1. For those wanting more details, you can read all about it here.
One such application that works better with a shared file system is WordPress, which several of our customers use extensively. Whilst it's fairly trivial to offload assets to S3, the WordPress interface allows admins to install or upgrade themes and plugins, and even update the core. The issue is that if you are running multiple containers for scale, these features still work but only upgrade the code base in one container, meaning the containers no longer all run the correct, or even the same, code. Missing an update in a single container can lead to a very bad experience for your users. The solution is to use a shared file system for the code: update one container and all the containers instantly get the new code.
EFS provides a fantastically simple way to set up and manage an NFS service, and Kubernetes lets us take advantage of this through Persistent Volumes and Persistent Volume Claims; the latter allow containers to connect through Kubernetes to the Persistent Volume.
Let's Get Started
Setting up EFS is a simple click-through wizard in the AWS console. Exactly how you set it up will depend on how you have deployed the systems that run Kubernetes. I'm going to assume you have at least three nodes/minions spread out across eu-west-1a, eu-west-1b and eu-west-1c.
You'll need to give EFS a name and tell it which subnets in your VPC to attach to. In my case this is the three private subnets in my VPC. The other thing to note is that EFS needs a security group. For simplicity I added the EFS service to the security group for my Kubernetes servers, which allows anything in that group to talk to any other server in the group. These are all easily configured in the wizard, and can be changed after creation pretty easily too.
Once set up, you'll get the option to see the DNS names for the endpoints. If you've set this up in three subnets like me, you'll get one DNS entry per AZ. Make a note of these; you'll only need one, but keep the others in case you want to get extra fancy.
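If you prefer the command line to the console wizard, the same setup can be sketched with the AWS CLI. This is only a rough sketch: the file system ID, subnet ID and security group ID below are placeholders you'd replace with your own values.

```shell
# Create the file system (the response includes a FileSystemId, e.g. fs-75HJfj6j)
aws efs create-file-system --creation-token my-k8s-efs --region eu-west-1

# Create one mount target per subnet, reusing the Kubernetes security group.
# subnet-aaaa1111 and sg-bbbb2222 are placeholder IDs.
aws efs create-mount-target \
  --file-system-id fs-75HJfj6j \
  --subnet-id subnet-aaaa1111 \
  --security-groups sg-bbbb2222

# Confirm a mount target exists in each AZ
aws efs describe-mount-targets --file-system-id fs-75HJfj6j
```

You'd repeat the create-mount-target call once per subnet, then build the per-AZ DNS names as shown in the console.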
It's very easy to set this up, which means the tricky bit must be in Kubernetes, right?
This isn't actually that hard once you understand a couple of concepts. Firstly, you're going to need a Persistent Volume, a PV. This is basically Kubernetes' interface to a disk, in this case an NFS mount. I've included a config below which sets this up; I named the file persistentvol-1b.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: MyPersistantEFSVol
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: eu-west-1b.fs-75HJfj6j.efs.eu-west-1.amazonaws.com
    path: "/"
Let's have a look at this file. The two key things are capacity:storage and nfs:server; these allow us to set size limits and connect to the correct mount. Note that EFS scales on demand and you pay for the space you use, so the storage directive allows you to control the limits on this (although I've not tested whether this is a hard limit yet). The server directive is simply one of the DNS entries you saved earlier whilst setting up EFS. You can see I'm connecting to an endpoint in eu-west-1b, hence the file name.
In order to connect this up, it's simply a case of running the following command:
kubectl create -f persistentvol-1b.yaml
Claim Your Volume
Once Kubernetes knows about this Persistent Volume, we need to allow the containers/pods to use this space. This is why we create a Claim, a PVC. The Claim is what we'll reference later on when we create a pod. Thanks to @imac for this tip. The file below, persistentvol-claim.yaml, shows how to set this up:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: MyPersistantEFSVol
  namespace: mynamespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
One thing to note in this file is that I've used a namespace, as my container is going to live in that namespace too. PVs don't need a namespace, but PVCs do if you are starting your container from somewhere other than default. Starting this is, yet again, a simple command:
kubectl create -f persistentvol-claim.yaml
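Before moving on, it's worth checking that the claim has actually bound to the volume, since a bound PVC is what lets the pod mount the share. A quick sanity check, using the names from this article:

```shell
# The PV is cluster-wide; the PVC lives in our namespace
kubectl get pv
kubectl get pvc --namespace mynamespace

# Both should show a STATUS of "Bound"; describe gives more detail if not
kubectl describe pvc MyPersistantEFSVol --namespace mynamespace
```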
Using the Volume
For this demo I'm going to set up a Replication Controller, initially with just one pod/container. The reason for this is that I'll show you how to scale later and test that your data is shared. The YAML below will set up a container for you with no initial code. I've called the file www-app.yaml:
apiVersion: v1
kind: ReplicationController
metadata:
  namespace: mynamespace
  name: www-app
  labels:
    www-component: app
spec:
  replicas: 1
  selector:
    www-component: app
  template:
    metadata:
      labels:
        www-component: app
    spec:
      containers:
      - name: www-app
        image: richarvey/nginx-php-fpm:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/var/www/html"
          name: webroot
      volumes:
      - name: webroot
        persistentVolumeClaim:
          claimName: MyPersistantEFSVol
To start the Replication Controller and check it's running, these commands will help you:
kubectl create -f www-app.yaml
kubectl get pods,rc --namespace mynamespace
Add Some Data
Now let's connect into your pod and add some data to /var/www/html (which is the default webroot for this container):
kubectl exec -it NAME_OF_YOUR_POD --namespace mynamespace bash
Once you've connected into your container, create some files in /var/www/html using your favourite editor (vi). I recommend creating an index.php file. A simple test would be:
<?php phpinfo(); ?>
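You can sanity-check the page without leaving the pod, since nginx is serving on port 80 inside the container. Assuming curl is present in the image (an assumption; install it or use wget if not):

```shell
# From inside the container: fetch the new page from the local nginx
curl -s http://localhost/index.php | head -n 5
```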
Let's Scale Things Up
Now we have this file, you should be able to see it once you browse to the container in your web browser. You can use kubectl describe pod to find your container's IP.
The following commands will add more replicas (containers) to your Replication Controller, and these will all share the same Persistent Volume Claim and therefore the same data.
kubectl scale rc/www-app --replicas=3 --namespace mynamespace
kubectl get pods --namespace mynamespace
You'll now see you have three containers. Use kubectl describe to get the other IP addresses and use your browser to visit them. You'll see that the file you created on the first container is displayed. You'll also notice that editing this file in any of the containers works seamlessly, and changes are reflected in all your browser windows.
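If you'd rather verify the sharing from the command line than a browser, one rough approach is to write a file through one pod and read it back through another. The pod names below are placeholders for the names kubectl get pods gives you:

```shell
# Write a marker file via the first pod
kubectl exec POD_ONE --namespace mynamespace -- \
  sh -c 'echo "hello from EFS" > /var/www/html/shared-test.txt'

# Read it back via a different pod: same EFS-backed webroot
kubectl exec POD_TWO --namespace mynamespace -- \
  cat /var/www/html/shared-test.txt
```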
NFS mounts in most Linux distros allow you to specify a failover mount endpoint. I'm not sure if this is supported in Kubernetes yet, but in theory, if/when it is, we could use one of the other EFS endpoints to make this highly available.
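In the meantime, nothing stops you defining a second PV against one of the other endpoints you noted earlier, for example the eu-west-1a one. To be clear, this is only a sketch: Kubernetes won't automatically fail a claim over between PVs, so this just shows how another endpoint for the same file system would be addressed.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: MyPersistantEFSVol1a
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # The eu-west-1a endpoint for the same file system
    server: eu-west-1a.fs-75HJfj6j.efs.eu-west-1.amazonaws.com
    path: "/"
```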
If you would like any help with the topics covered in this article, please don't hesitate to contact us at firstname.lastname@example.org