@toolness
Last active March 26, 2019 15:21
Reflections on Docker-based development

Note that these reflections are specifically tailored to a conversation about Docker we're having at 18F, and as such they have a few assumptions:

Advantages

Dev/prod parity ... sort of

Unlike Vagrant, a big promise of Docker is that it's not just intended for development purposes--it's also intended for deployment, because containers are so good for process isolation and resource management. In theory, this gives us great dev/prod parity. In practice, things are a bit more complicated, especially since cloud.gov currently calls its Docker support an experimental feature.

Dev/CI parity

While we don't do it in CALC, using Docker containers on Travis CI (not sure about CircleCI) is easy and I've done it before. This makes it particularly easy to ensure dev/CI parity, so you don't have tests that work on your local machines but mysteriously fail in CI.
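For reference, the Travis side of this is usually just a few lines. This is a hedged sketch, not CALC's actual config--the service name and test command are placeholders, and docker-compose is assumed to be available on the build image:

```yaml
# .travis.yml -- illustrative only
sudo: required
services:
  - docker
script:
  - docker-compose build
  - docker-compose run app py.test   # "app" and the test runner are placeholders
```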

Obviates an entire class of environment tooling

Most development tools are built to assume that they are a "singleton" in the context of the system they're installed on.

But as soon as you have two projects that require different versions (or configurations) of that tool, you start needing another tool that manages the version or configuration of that tool for you, so that each project can use the version or configuration it needs. This is how tools like nvm (for Node), rvm (for Ruby), virtualenv/pyenv (for Python) and such come into existence, and they add a lot of cognitive overhead to the development process.

Containers get rid of this problem entirely--but not without introducing new cognitive overhead that developers need to understand. At least one benefit of the new overhead, though, is that it's generic enough to apply to all kinds of problems, rather than being specialized to a particular type of development tool.
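To make that concrete, here's a hedged sketch of two hypothetical projects pinning different runtime versions purely through their image tags, with no nvm anywhere:

```yaml
# project-a/docker-compose.yml -- pinned to Node 4
services:
  app:
    image: node:4
---
# project-b/docker-compose.yml -- pinned to Node 7
services:
  app:
    image: node:7
```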

Reduces setup time

Installing Docker on OS X is easy, and as the CALC docker instructions attest, setup largely boils down to git clone followed by docker-compose up, peppered with a few manual tasks.

Without Docker, the more dependent services your project has, the harder it's generally going to be for someone to configure and start up a development environment. With Docker, this often isn't the case: as we've added redis, worker and scheduler processes to CALC, developers haven't had to change their environment, because docker-compose up does everything for them.
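As a rough sketch (service names and commands here are illustrative, not CALC's exact docker-compose.yml), adding those services looks something like this, and docker-compose up brings the whole set up together:

```yaml
version: '2'
services:
  app:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    depends_on:
      - redis
  worker:
    build: .
    command: python worker.py        # placeholder worker entry point
    depends_on:
      - redis
  redis:
    image: redis:3
```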

Another nice thing about docker-compose up is that it starts all services in a single terminal window and prefixes their output with their container name. This is already a lot more convenient than manually opening a separate terminal window for every dependent service, which is what non-Docker setups often make developers do.

Ease of one-off deployment to the cloud

Because we're not allowed to use tools like ngrok to expose our development instances to coworkers at 18F, being able to conveniently deploy our work to a temporary Amazon EC2 instance becomes important. Fortunately, thanks to docker-machine, this isn't hard; see CALC's guide to deploying to cloud environments for more details.
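The gist of the docker-machine workflow is just a few commands. This is a sketch--the machine name is made up, and AWS credentials are assumed to be available via the usual AWS_* environment variables:

```sh
docker-machine create --driver amazonec2 my-sandbox
eval "$(docker-machine env my-sandbox)"   # point the docker CLI at the new EC2 host
docker-compose up -d                      # build and run the project there
```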

It's self-documenting

Once one learns the fairly straightforward syntax of Dockerfiles and Docker Compose files, Dockerfile and docker-compose.yml become handy "recipes" for how to reliably set up a development (or even production) environment from scratch. So even if one decides not to use Docker, they can still consult those files to figure out how everything is configured and connected.
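For instance, even a minimal, hypothetical Dockerfile (this one isn't CALC's) reads like a step-by-step recipe--base image, dependencies, code, run command:

```dockerfile
FROM python:3.6
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```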

This transparency also means that it wouldn't be too hard for us to migrate from Docker to a different containerization technology, if that ever becomes a need. It's the opposite of vendor lock-in.

Adding new developer tooling is easy

The incredibly low cost of adding new containers to a Docker-based project means that it becomes very easy to add new developer tooling. For example, in CALC, we were able to trivially add mailcatcher support during development, despite the fact that it's Ruby-based (and CALC is a Python project).
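Adding it amounted to roughly a handful of lines in docker-compose.yml. This is a hedged sketch--the image name here is a commonly used community one, not necessarily the exact one CALC uses:

```yaml
services:
  mailcatcher:
    image: schickling/mailcatcher
    ports:
      - "1080:1080"   # web UI for viewing caught mail
      - "1025:1025"   # SMTP port the app sends mail to
```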

Challenges

Exposing ports is complicated and confusing

Argh, I have had so many problems with this and I still barely understand it well enough to explain it to others. There's the EXPOSE directive in Dockerfiles, the ports directive in Docker Compose files, and then the fact that those ports aren't even exposed when you run docker-compose run instead of docker-compose up, unless you pass the --service-ports option to it... It's not hard in theory, but it's an annoying chunk of cognitive overhead that one simply doesn't need to deal with when not using Docker/docker-compose.
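For what it's worth, here's a sketch of how the pieces relate (the service name and port are placeholders):

```yaml
# docker-compose.yml excerpt
services:
  app:
    build: .            # the Dockerfile may also say `EXPOSE 8000`, which is mostly
                        # documentation/inter-container metadata and publishes nothing
    ports:
      - "8000:8000"     # host:container mapping, honored by `docker-compose up`...
# ...but NOT by `docker-compose run app`, unless you add --service-ports:
#   docker-compose run --service-ports app
```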

Changing dependencies can be cumbersome

Because of the way Docker's layer caching works, and the fact that containers are so ephemeral, it can be quite annoying to change a project's dependencies. Often just editing a requirements.txt or package.json file, as one would do in a non-Docker environment, invalidates the cached layer that installs packages, causing everything listed in that file to be re-retrieved from the internet and installed--which starts taking a very long time once you've got lots of dependencies.

There are various ways to work around this; for instance, if I'm just tinkering with a dependency, I'll temporarily add a RUN npm install foo to the end of my Dockerfile. We're discussing this in more depth in CALC#1230, but the point is that it's something that can become non-trivial once you move to Docker, and that can be annoying.
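To illustrate both the caching behavior and the temporary workaround (this is a hypothetical Node-flavored Dockerfile, not CALC's):

```dockerfile
FROM node:7
WORKDIR /app

COPY package.json .
RUN npm install       # cached until package.json changes; editing it re-installs everything

COPY . .

# Temporary hack while tinkering with a new dependency: appending a RUN line
# at the end reuses all the cached layers above, so only "foo" gets fetched.
RUN npm install foo
```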

Running Selenium/WebDriver tests can be really complicated

See CALC's selenium.md for more details.

Knowing when/whether to re-run "docker-compose build" can be confusing

This is something that our non-technical folks in particular frequently get tripped up on, but more experienced devs can too. The safest approach is simply to re-run docker-compose build every time you git pull, but this can be hard to remember.

Then again, though, non-Dockerized environments have parallel problems: CALC devs who aren't using Docker actually have to run multiple commands like pip install -r requirements.txt and npm install to make sure their environment stays up-to-date, so perhaps Dockerization is still a net win in this regard.
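Side by side, the two "stay up to date after a pull" workflows look roughly like this (generic commands, not an official CALC script):

```sh
# Dockerized
git pull
docker-compose build

# Non-Dockerized
git pull
pip install -r requirements.txt
npm install
```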

References

  • Atul's January 2017 Docker presentation
@msecret

msecret commented Jan 31, 2017

Regarding "Changing python/node/ruby dependencies can be cumbersome": have you ever tried to use npm link with Docker? For instance, the cloudgov dashboard is running its front-end build watch in Docker, but for its cloudgov-style dependency in package.json, we need to npm link to the local cg-style on our dev computers. Is this possible/hard?

@toolness
Author

@msecret So that's possible, I think, but one thing to be very careful about is to not build any binary dependencies outside of a container, i.e. on your host system--because those dependencies would then be built for OS X/Darwin, not Linux. This actually happened to some of us on CALC for a bit and it was suuuuper confusing.

But anyhow, TL;DR is that there are definitely ways to work around it... it's just annoying that it's a headache that isn't really present in non-Docker development.
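(One hypothetical shape for such a workaround--paths and service names here are made up: mount the local package checkout into the container and do the npm link/install from inside it, so any binary dependencies get compiled for Linux.)

```yaml
# docker-compose.yml excerpt -- entirely illustrative
services:
  app:
    build: .
    volumes:
      - .:/app
      - ../cg-style:/cg-style   # local checkout of the dependency
    # then, inside the container:  npm install /cg-style
```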
