

More Docker Recommendations


There are many intricacies of Docker that take time to discover and benefit from. Here are a few of the more valuable takeaways from my journey toward learning and using Docker effectively. These aren't obvious from looking at the source code of existing Docker setups published on Docker Hub or GitHub, but they are useful to know when maintaining your own images.

Attempt One Process per Docker Container

The official Docker documentation now merely suggests, rather than requires, running a single process per Docker container:

Each container should have only one concern. Decoupling applications into multiple containers makes it easier to scale horizontally and reuse containers.

-- Source

And no wonder: Docker is so widespread that developers use it for extremely varied purposes. Putting "large" processes in separate Docker images was and remains a good idea. A textbook example is running an HTTP server and a database as separate Docker containers. These two perform very different tasks and communicate through a thin, well-defined layer.

However, running multiple containers comes at a cost in performance and maintenance. That's why some Docker images spawn sub-processes inside containers to execute small tasks.

In other cases, running multiple processes in a single Docker container can streamline the setup of a project's development environment. One of the simplest ways to achieve this in Node.js is with concurrently. Say we need to continuously rebuild a website from sources and test it in a web browser. concurrently "npm run build:watch" "npm run server" could rebuild the project on every change while simultaneously running an HTTP server to serve the results of those changes. There's no need for these two processes to run in separate containers, as they're never executed in production.
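As a rough sketch, the project's package.json could wire this up as follows; the build and server commands are hypothetical placeholders:

{
  "scripts": {
    "build:watch": "webpack --watch",
    "server": "http-server ./build -p 8080",
    "dev": "concurrently \"npm run build:watch\" \"npm run server\""
  },
  "devDependencies": {
    "concurrently": "6.4.0",
    "http-server": "14.0.0",
    "webpack": "5.65.0",
    "webpack-cli": "4.9.1"
  }
}

A single npm run dev then starts both processes in one terminal, with concurrently prefixing each line of output so the two are easy to tell apart.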

Skip Marking Ports with EXPOSE

The EXPOSE statement in a Dockerfile documents the ports that containers based on the image are expected to listen on. By itself it doesn't affect how Docker runs those containers; no ports are actually published. EXPOSE'd ports are printed by docker inspect and other commands, which can make it easier for developers to use a Docker image and debug issues. That being said, my approach is to skip EXPOSE inside the Dockerfile and list all used ports in the documentation instead (often a README.md file). Doing so maximizes the reach of this information, making it easier for those unfamiliar with Docker, who don't know how a Dockerfile works, to still use the image.
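For example, a README.md could document ports with a short section like this; the port number and image name are hypothetical:

## Ports

The app inside the container listens on port 8080 (HTTP). Publish it
when starting a container:

docker run -p 8080:8080 somecompany/someimage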

Practice Version Pinning

Modern software projects often rely on tens or even hundreds of third-party libraries. Those libraries are constantly updated, oftentimes breaking the software that relies on them. Using the most up-to-date libraries is a good idea; it has security, performance, stability and many other upsides. However, what you never want is to wake up one day to a slew of errors due to mismatched library versions and have to fix them urgently just so your Dockerfile can keep building as it did previously. You want to plan library updates and do them one at a time. To achieve that, ensure Docker always pulls the same versions of libraries, i.e. practice version pinning.

How to achieve this varies from tool to tool. Here are some examples.

# ./Dockerfile

# Pinned base Docker image
FROM ubuntu:18.04

# Pinned APT package; the pinned version must exist in the
# repositories available to the base image
RUN apt-get update && apt-get install -y \
  nano=5.4-2

# Pinned NPM package
RUN npm install axios@0.24.0
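For Node.js projects, a lock file extends pinning to the entire dependency tree. A minimal sketch, assuming the project commits a package-lock.json:

# ./Dockerfile

FROM node:16.13.1

WORKDIR /app

# Copy the manifest and lock file first so this layer stays cached
# until the dependencies actually change
COPY package.json package-lock.json ./

# npm ci installs the exact versions recorded in package-lock.json
RUN npm ci

COPY . .

Unlike npm install, npm ci fails when package.json and package-lock.json disagree, so builds can't silently pull in newer versions.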

Avoid Configuration Drift

A related topic to version pinning is configuration drift. It's explained in detail in Effective DevOps: Building a Culture of Collaboration, Affinity, and Tooling at Scale by Jennifer Davis et al. The book describes a variety of approaches to properly maintaining web services over time. Handling configuration drift is one of those strategies.

Configuration drift is the phenomenon where servers will change or drift away from their desired configuration over time.

-- Source

Many automation tools are susceptible to configuration drift, Puppet and Jenkins among them. In Docker, drift is caused by changing the contents of a running container instead of the image it's based on, which makes the container difficult or impossible to reproduce. Docker images should contain everything needed to run a service. The benefit of such images is that they always provide a unified experience no matter how they're run: locally, in a cluster (e.g. AWS ECS) or during continuous integration (CI).

While the risk of drift can never be completely removed, there are ways to mitigate it:

  • Version pinning (described earlier).
  • Switching to a non-root and less-privileged user when running a container by adding a USER statement in the Dockerfile.
  • Using the same image for development, staging, production and CI or at least sharing the critical parts by means of multi-stage builds.
  • Automatically and constantly deploying changes to Docker images in small increments so there's no room for fine-grained and ephemeral changes inside containers by developers and system admins.
  • Starting containers with the --read-only flag to disallow most filesystem changes in the container (this flag and the USER statement are combined in the sketch below).
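A minimal sketch combining the USER statement and the --read-only flag; the user name and image tag are illustrative:

# ./Dockerfile

FROM ubuntu:18.04

# Create a less-privileged user and switch to it; the container's
# main process will run as this user
RUN useradd --create-home appuser
USER appuser

A container based on this image can then be started with docker run --read-only --tmpfs /tmp somecompany/someimage. The --tmpfs mount leaves a small writable scratch area while the rest of the filesystem stays immutable, so any state worth keeping has to go through an explicitly mounted volume.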

Exclude Sensitive Information

During docker build . the entire source directory is sent to the Docker daemon as the build context for the Dockerfile to use. Running COPY and ADD takes files from this context and puts them inside the image being built. For larger projects, this has a significant impact on build times, especially when the context has to travel over the network to a remote Docker daemon. An even worse outcome of a large context is Docker copying all the secrets from dotfiles. For instance, Next.js uses files starting with .env for this purpose. To prevent this information from getting exposed, add a .dockerignore file to the project listing rules that keep secrets out of the context. The syntax for this file is similar to .eslintignore and .gitignore:

.env*
logs
build
.cache

Automate Builds

Even if Docker Compose is not required in a project, it may be a convenient way of running and building images. The following docker-compose.yml file replaces all the parameters used in docker build -f Dockerfile -t somecompany/someimage .:

version: '3.4'

services:
  dev:
    image: somecompany/someimage
    build:
      dockerfile: './Dockerfile'
      context: .

Now it's enough to run docker-compose build to achieve the same result. This approach is sometimes called configuration-as-code. On the whole, Docker Compose is as capable as plain Docker at building and running images. If you decide not to go this route, make sure to educate developers on how to use Docker in the project with explicit instructions in its documentation.

Prevent Data Corruption

When multiple Docker containers write to the same volume, there's a chance they'll step on each other's toes and write to the same target file, leading to data corruption. Make sure that's impossible by giving only one of those containers write access to the data or by "locking" the data for the duration of a write. Docker and Docker Compose allow fine-grained control over access to volumes when mounting them into containers.
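For example, a Docker Compose sketch with hypothetical writer and reader services sharing a named volume; the ro flag makes the reader's mount read-only:

version: '3.4'

services:
  writer:
    image: somecompany/writer
    volumes:
      - shared-data:/data
  reader:
    image: somecompany/reader
    volumes:
      # ro rejects all writes to the volume from this container
      - shared-data:/data:ro

volumes:
  shared-data: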


If you're looking for more tips, start with the official Docker documentation.