Docker and Data

One of the frustrations I have with trying new technologies is when the documentation covers a specific operating system, especially when it combines the product with another technology that requires a certain level of understanding. I had that recently with a new product where the Docker commands for creating a container were given with no explanation. It took me a while to work out that the key part was linking the data in the container to a local folder. But I had battled with trying to do that with Docker for Windows some months ago, unsuccessfully.

First off, let's give a bit of background on Docker and data. A Docker image is a specific version of the software: for Domino that might be Domino V10.0.1, for Node-RED it might be 19.6. If you install that software directly on your PC, upgrading means installing the new release on the same PC, or maybe uninstalling and installing the new version. That's not the case with Docker. Technically you might be able to go into your container and manually install the new version of the software, but typically you'll delete the container and create a new one from the image for the updated version.
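For example, upgrading Node-RED in Docker is typically just a matter of pulling the newer image and recreating the container. A minimal sketch, assuming the container was named mynodered; the image and container names are placeholders:

```
# Pull the image for the new version
docker pull nodered/node-red-docker

# Remove the old container and create a fresh one from the updated image
docker stop mynodered
docker rm mynodered
docker run -d -p 1880:1880 --name mynodered nodered/node-red-docker
```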

That raises some questions about the data. If the container is specifically for demos or tests, you don't care about the data; having it baked into the container is an advantage, because you get the same data every time you start up. For some products, like Swagger Editor, the Docker container is just a web IDE and the menu options let you load a file to edit: the file you're editing never goes into the container. For others that are just a web IDE, the persisted data will be settings and extensions. You could get away with backing those up centrally and re-loading them into the new version's container, and that could be preferable in case you accidentally break something. But for others, like Node-RED and particularly Domino, the persisted data is very important.

There are two options for persisting data outside the container: a volume or a bind mount.

A volume is a Docker-managed data repository. You create it with docker volume create, and there are various other commands available. It's not something you can access via the host operating system (i.e. Windows Explorer, if you're running Windows). Instead, you access it via a container that uses it. Multiple containers can use the same volume, so you can create a container specifically to back up your Docker volume, as outlined on StackOverflow. I used a volume in the Domino on Docker video. Guido Diepen has set up an extremely useful GitHub repo with scripts related to Docker volumes, one for getting detailed information from volumes and one for cloning a volume. Cloning a volume is extremely useful if you want to test a new version of the software: instead of sharing your existing volume with the new version's container and potentially impacting use from the old version's container, you can clone the volume and map the clone to the new version's container, as in the sketch below. This has the significant advantage of enabling regression testing: you know everything is the same except the software version, so if something unexpected occurs, it's definitely the software, not the data.
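As a rough sketch of the cloning idea (the volume names here are made up, and Guido's script wraps something similar), you can copy one volume into another via a throwaway container:

```
# Create the volume that will hold the clone
docker volume create domino_data_v11

# Mount source and destination volumes into a temporary container
# and copy everything across; the container is removed afterwards
docker run --rm \
  -v domino_data:/from \
  -v domino_data_v11:/to \
  alpine sh -c "cd /from && cp -a . /to"
```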

A bind mount, on the other hand, is a mapping to a folder on the host operating system: a physical folder you can manage from outside Docker, including when Docker isn't running. That's an advantage for personal use of a container. But of course, if you're creating the container via a script, the script would assume everyone using it has the same folder structure, which they might not. In that case, a volume is probably better. Using a volume also keeps things self-contained, restricting access to the host operating system.
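A bind mount is declared the same way when creating the container, just with a host path in place of a volume name. A minimal sketch, where the host path and image name are illustrative:

```
# Map a folder on the host to the container's /data directory
docker run -d -p 1880:1880 -v /home/paul/node-red-data:/data nodered/node-red-docker
```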

You can find out more in the Docker documentation. Although it seems volumes and bind mounts used to be created differently, they can now be managed in the same way: both can be set up with --mount or -v (--volume). --mount is the verbose method, taking explicit key=value options; the typical choice is the short -v, which combines the options into a single value. -v typically takes two fields, separated by a colon. The first is the volume name or a reference to the folder on the host operating system; the second is the path to the directory within the container that you want to map. So, whenever you navigate to that directory from within the container, you're actually jumping across to the volume or the folder on the host operating system. For data that's in a volume, if the container is running, you can also use the docker cp command to copy files in or out, by referencing that directory within the container.
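To show the two syntaxes side by side (the volume name, container path, image and container name here are all illustrative):

```
# Short syntax: -v <volume name>:<path inside the container>
docker run -d --name domino -v domino_data:/local/notesdata domino:10.0.1

# The same mapping with the verbose --mount syntax
docker run -d --name domino --mount source=domino_data,target=/local/notesdata domino:10.0.1

# Copy a file out of the mapped directory of the running container
docker cp domino:/local/notesdata/names.nsf .
```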

Of course, with a bind mount it's much easier to manage the data, because you can just use the host file manager, e.g. Windows Explorer. The problem is that for months I couldn't get a bind mount to work in Docker for Windows. The documentation works fine for Unix-like operating systems, i.e. Linux or macOS. As documented, and as typically referenced in Docker image instructions, $(pwd) maps to the current working directory. Typically you're creating the Docker container from the command line, so that's the directory your command prompt is currently in.

But that doesn't work in Windows. I finally came across the right answer on StackOverflow. From a standard command prompt in Windows, you use %cd% instead to map to the current directory. I use Cmder or Visual Studio Code to run command prompts, and from those I also have the option to start PowerShell instead. In PowerShell you use ${PWD}; the upper-case format is critical.
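So the same bind mount of the current directory looks like this in each shell (the image name is a placeholder):

```
# Windows command prompt (cmd.exe)
docker run -d -v %cd%:/data my-image

# PowerShell
docker run -d -v ${PWD}:/data my-image

# Linux or macOS
docker run -d -v $(pwd):/data my-image
```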

This now allows you to map a local directory as your container's data.
