Docker data only containers for vendor code

Docker has this idea of “data only containers”. The concept is pretty simple. You have a container which holds the data and exposes it via the --volume command line switch or the VOLUME directive in the Dockerfile. Another container can then access that volume via the --volumes-from command line switch.

One of the interesting things is that the data only container does not need to be running. It can start and exit right away. As long as the container has been created, the volume will be available. Be careful, though: if you clean up your “exited” containers, you will likely also delete the data only container, since it has exited too.
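To make the cleanup caveat concrete, here is a minimal sketch of a cleanup helper that removes exited containers but skips one container by name. The function name remove_exited_except is made up for this post, and the sketch assumes the data only container is called vendor_name, as in the example further down.

```shell
# Hypothetical helper: remove every exited container except the one named in $1.
remove_exited_except() {
    keep="$1"
    # List the names of all exited containers, one per line.
    docker ps -a --filter status=exited --format '{{.Names}}' |
    while read -r name; do
        # Skip the data only container so its volume stays reachable.
        if [ "$name" != "$keep" ]; then
            docker rm "$name"
        fi
    done
}

# Usage (requires a Docker daemon):
#   remove_exited_except vendor_name
```

A blanket removal of all exited containers would take the data only container with it, which is why the helper filters by name first.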

Most of the descriptions online show examples of exposing data sets this way, for example a /var/lib/mysql directory. That can be used to get a consistent data set to run tests against or to make setup easier. There are other cases where this makes life easier as well.

As we started playing with Docker, one thing was noticeable: images can get big, so it can take a while to ship them around. With some care in how the build process is structured, this can be mitigated thanks to Docker’s use of caching during the build.

Nonetheless, this brought us to another potential use of the data only container: vendor code.

For example, we have a tool we use which is about 130MB in size. So far we’ve been baking it into the application container. However, since this code rarely changes, it’s a great candidate to be split into its own container.

We’ve been experimenting with the idea: we created an image just for that code and then brought its volume into the application container.

So far it’s working rather well. Here is an example.

Say we have our code in a vendor/ directory and need to access it from the application under /opt/vendor.

The docker build starts with the busybox base image and simply copies the vendor code to the image under /opt/vendor.
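As a rough sketch of what such a build could look like, the snippet below writes a small Dockerfile for the vendor image. The file name Dockerfile.vendor is an assumption made for this example, and the exact instructions are a guess at a reasonable layout, not a copy of our real file.

```shell
# Write a hypothetical Dockerfile for the vendor image into the current directory.
cat > Dockerfile.vendor <<'EOF'
FROM busybox
# Copy the vendor code into the image before declaring the volume.
COPY vendor/ /opt/vendor
# Expose the directory so other containers can pull it in with --volumes-from.
VOLUME /opt/vendor
# The container only needs to print a line and exit.
CMD ["echo", "Data only container for access to vendor code"]
EOF
```

The image can then be built from the directory that contains vendor/ with something like: docker build -f Dockerfile.vendor -t vendor-image .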

Starting the container is easy:

# you don't need the --volume if your Dockerfile exposes the volume
docker run -d --volume /opt/vendor --name vendor_name vendor-image

In our case the CMD or run command is simple:

echo "Data only container for access to vendor code"

That means that the container starts, prints that line and exits. That’s good enough to access the data.

Now the application container is started with:

docker run -d --volumes-from vendor_name application-image

The --volumes-from will grab the volumes from the “vendor_name” container and bring them into the application container at the same mount point.

We now have the entire vendor code base available in the application container without having to bake it into that image.
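One quick way to check that the volume really shows up is to list it from a throwaway container. This is only a sketch: the function name check_vendor_volume is made up here, and it assumes the busybox image is available and the data only container is named vendor_name.

```shell
# Hypothetical helper: list the vendor code through a temporary container
# that borrows the volumes of the data only container named in $1.
check_vendor_volume() {
    data_container="$1"
    # --rm removes the temporary container again once ls has finished.
    docker run --rm --volumes-from "$data_container" busybox ls /opt/vendor
}

# Usage (requires a Docker daemon):
#   check_vendor_volume vendor_name
```

If the listing shows the vendor files, the application container started with the same --volumes-from switch will see them under /opt/vendor too.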

One thing of note is that there are two choices for starting the data only container. If you start with an empty container based on the scratch image, the run command will likely fail unless your vendor code lets you run something easily. In our case the code base requires a lot of supporting stuff and the container can’t even do an echo successfully. That’s the first choice: put a CMD in your Dockerfile, know that it will always fail on start, but start the container nonetheless. (Hint: if you just leave the CMD off, the container will not start.)

The other choice is to start with a slightly bigger base image such as busybox. That’s what we have done and the 2.5MB extra seems worth it to avoid the failure.
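For comparison, the scratch based variant could be sketched like this. The file name Dockerfile.vendor-scratch and the /bin/true command are placeholders chosen for this example; the command points at a binary that does not exist in the image, so the start is expected to fail as described above.

```shell
# Write a hypothetical scratch based Dockerfile for the vendor image.
cat > Dockerfile.vendor-scratch <<'EOF'
FROM scratch
# Copy the vendor code into the otherwise empty image.
COPY vendor/ /opt/vendor
VOLUME /opt/vendor
# Placeholder command: /bin/true is not present in a scratch image,
# so starting the container fails, but a CMD still has to be set.
CMD ["/bin/true"]
EOF
```

Swapping the first line for FROM busybox and using a real echo as the CMD gives the variant we settled on.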