Confident deployments in Docker Swarm
How we filled holes in the Docker ecosystem
Like many companies, we at Issuu have embraced containerization in our development and deployment process. This area is dominated by Docker, so it is no surprise that we use Docker as well. There is enough discussion of the pros and cons of Docker elsewhere; today I want to discuss a different aspect of using Docker containers: Docker Swarm.
Docker Swarm can be thought of as the Docker Engine you're using on your local machine, but scaled to a cluster of machines. This makes Swarm very convenient: the containers you already know how to build need little to no changes to run in the cloud.
The terminology is a bit confusing at first, so here's a short rundown of the terms and what they mean:
Swarm is the entirety of your cloud deployment: the workers, the management nodes, etc. You deploy to a swarm, and the swarm takes care of deciding which machines your containers end up running on.
Service is the Swarm equivalent of a container. It represents one container image, run as one or more replicas on one or more machines.
Stack is the Swarm equivalent of Docker Compose. It represents a number of services, linked up to each other and deployed using a single configuration.
Where we started
When we started using Docker Swarm we didn't yet have experience with how best to use the system, so we started with the simplest solution. Since a service is like a container running on multiple hosts, let's just start with that.
docker -H swarmhost service create \
  --name my-service \
  --detach=false \
  image-name:build_number
This was quite easy, right? The --detach=false option is very useful, as it makes your calling process wait until Docker considers the service to be done deploying. After it has finished, you can run a command to make sure your deployment worked and your service is running.
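One way to do that check is to compare running and desired replicas. The sketch below assumes you read the REPLICAS column of `docker service ls` (formatted like `8/8`); the function name `replicas_converged` is hypothetical, and the usage line assumes the name filter matches only `my-service`.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a "running/desired" replica string, as shown
# in the REPLICAS column of `docker service ls`, has every replica running.
replicas_converged() {
  counts=${1%% *}          # drop any trailing annotation after a space
  running=${counts%%/*}    # number before the slash
  desired=${counts#*/}     # number after the slash
  [ -n "$running" ] && [ "$running" = "$desired" ]
}

# Example:
#   replicas_converged "$(docker -H swarmhost service ls \
#     --filter name=my-service --format '{{.Replicas}}')" || echo "not running"
```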
Unfortunately, reality is a bit messier than that, so you usually end up having to configure a few more things.
docker -H swarmhost service create \
  --name my-service \
  --mount type=bind,source=/dev/log,target=/dev/log \
  --detach=false \
  --replicas 8 \
  --update-delay 10s \
  --update-parallelism 2 \
  --update-failure-action rollback \
  --restart-max-attempts 1000 \
  --restart-window 1h \
  --limit-cpu 1 \
  --reserve-memory 256m \
  --limit-memory 1g \
  --log-driver syslog \
  --log-opt syslog-facility=local5 \
  --log-opt tag=my-service \
  image-name:build_number \
  my_command
Okay, that required a few more options, but these are set once, right? Right?
Unfortunately not: this is only the create command, which only works if the service you're attempting to create does not already exist. A simple workaround would be to delete the service and recreate it, but then you throw away zero-downtime deployments.
Okay, so let's also write the update statement:
docker -H swarmhost service update \
  --detach=false \
  --replicas 8 \
  --update-delay 10s \
  --update-parallelism 2 \
  --update-failure-action rollback \
  --restart-max-attempts 1000 \
  --restart-window 1h \
  --limit-cpu 1 \
  --reserve-memory 256m \
  --limit-memory 1g \
  --log-driver syslog \
  --log-opt syslog-facility=local5 \
  --log-opt tag=my-service \
  --image image-name:build_number \
  --args my_command \
  my-service
What you'll notice here is that the options are mostly the same, but not quite: you need to specify --image now, and you don't use --name anymore. Another annoying issue is that --mount cannot be used with update; it has to be --mount-add or --mount-rm instead, but you'd need to know the current state of the mounts to know whether to add one or remove one. Therefore your deployment step would need to be quite smart about what is currently deployed and what isn't.
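A minimal sketch of the first piece of that smartness: a hypothetical `deploy_service` wrapper that picks create or update depending on whether the service exists. It relies only on `docker service inspect` failing for unknown services, handles nothing but the image, and deliberately ignores mounts, which is exactly where this approach starts to hurt.

```shell
#!/bin/sh
# Hypothetical create-or-update wrapper; only the image is handled here.
# Every other option would have to be spelled twice: once in `service create`
# syntax and once in `service update` syntax.
SWARM_HOST=${SWARM_HOST:-swarmhost}

deploy_service() {
  name=$1
  image=$2
  # `docker service inspect` exits non-zero when the service does not exist.
  if docker -H "$SWARM_HOST" service inspect "$name" >/dev/null 2>&1; then
    # Existing service: a rolling update keeps it available.
    docker -H "$SWARM_HOST" service update --detach=false --image "$image" "$name"
  else
    # First deployment: there is nothing to update yet.
    docker -H "$SWARM_HOST" service create --detach=false --name "$name" "$image"
  fi
}

# Example: deploy_service my-service image-name:build_number
```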
Also, what happens if you want to deploy multiple services? Adding them is
reasonably easy, but for removing services that you don't use anymore you can't
just remove the
docker service incantations from your deployment script and
expect them to be gone. You have to either go in manually and remove services
or make your deployment process even smarter to track the set of services that
should be deployed versus the set of services that are deployed.
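Tracking "should be deployed" versus "is deployed" boils down to a set difference. The sketch below is a hypothetical helper, `stale_services`, that prints names present in the deployed list but absent from the desired one; the desired list is assumed to come from your deployment script.

```shell
#!/bin/sh
# Hypothetical bookkeeping helper: print every service that is deployed but no
# longer desired. Both arguments are newline-separated lists of service names.
stale_services() {
  desired_list=$1
  deployed_list=$2
  printf '%s\n' "$deployed_list" | while IFS= read -r svc; do
    [ -n "$svc" ] || continue
    # -F: fixed string, -x: whole-line match, -q: report via exit status only
    printf '%s\n' "$desired_list" | grep -Fqx -- "$svc" || printf '%s\n' "$svc"
  done
}

# Example: remove everything the deployment script no longer mentions.
#   desired=$(printf '%s\n' my-service my-other-service)
#   deployed=$(docker -H swarmhost service ls --format '{{.Name}}')
#   stale_services "$desired" "$deployed" | while IFS= read -r svc; do
#     docker -H swarmhost service rm "$svc"
#   done
```

Note that the example compares against every service on the swarm, so on a shared cluster it would also flag services that other teams deploy; scoping that list correctly is yet another thing the deployment process has to get right.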
If only there was a way to tell Docker exactly what you want your services to look like. As it turns out, there is!
A potential cure
Imagine our excitement when we saw that all this complexity could just go away thanks to Docker Stack! With Docker Stack, services can be declared in a docker-compose.yml file: which containers to deploy, which images, which mount points, how many instances, and how much memory and how many CPUs to assign. With --prune, it can also track which services you've removed from your docker-compose.yml and automatically remove the services you no longer use.
So our example config looks roughly like this:
version: '3.3'
services:
  service:
    image: 'image-name:build-number'
    command: ['my_command']
    logging:
      driver: syslog
      options:
        syslog-facility: local5
        tag: my_service
    volumes:
      - type: bind
        source: /dev/log
        target: /dev/log
    deploy:
      replicas: 8
      resources:
        limits:
          cpus: "1"
          memory: 1gb
        reservations:
          memory: 256mb
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        max_attempts: 1000
        window: 1h
From here the deployment is pretty simple:
docker -H swarmhost \
  stack deploy \
  --prune \
  --compose-file docker-compose.yml \
  my-stack
This defines a stack named my-stack which contains one service. Stacks aren't magical in Docker: this stack creates a service named my-stack_service (the format is <stack-name>_<service-name>), which can then be inspected with the regular tools that are available for services.
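Because a stack's services are ordinary services, the usual tooling works on them. A small sketch, with hypothetical function names: the first builds the `<stack-name>_<service-name>` name, the second lists a stack's services through the `com.docker.stack.namespace` label that `docker stack deploy` attaches to them (`docker stack services my-stack` gives a similar listing).

```shell
#!/bin/sh
# Build the name Docker gives a stack's service: <stack-name>_<service-name>.
stack_service_name() {
  printf '%s_%s\n' "$1" "$2"
}

# List the services belonging to a stack via the namespace label that
# `docker stack deploy` sets on every service it creates.
stack_services() {
  docker -H swarmhost service ls \
    --filter "label=com.docker.stack.namespace=$1" \
    --format '{{.Name}}'
}

# Example:
#   docker -H swarmhost service inspect "$(stack_service_name my-stack service)"
#   stack_services my-stack
```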
The biggest advantage of the stack is that stack deploy is declarative: docker-compose.yml describes how the stack is supposed to look, and stack deploy creates or modifies it accordingly. There is no need to tell Docker what the changes to the services should be, only how they should be configured.
A nice side-effect of declaring it this way is that this format is commonly known among Docker users. New hires will be able to understand and modify docker-compose.yml files far more easily than a complicated in-house deployment system.
You might have noticed that there is no --detach=false option anymore, so the docker command exits immediately (there is a bug report asking for it to be implemented). This poses a problem: if the command exits immediately, you cannot simply check whether the deployment was successful right afterwards, because the deployment may not have finished yet. Checking whether your deployment finished successfully suddenly becomes quite a problem. You don't want your stack deploy to be rolled back without being notified about it. In the worst case, your code might not have been deployed in ages because every deploy was rolled back without you noticing.
We were obviously not happy with this state of affairs, so we decided to bite the bullet and write some tooling to help us.
Designing a solution
One of the lessons we have learned at Issuu is: if you don't specifically need to write a custom wrapper, don't write one. This is what made us decide against writing a very smart deployment system (since docker stack can do that for us instead), and it is why we don't want to implement our own replacement for docker stack: we value using standard components in the ways they were meant to be used.
Therefore we decided to write a tool that specifically addresses the issues we have with docker stack deploy, namely waiting for a stack to finish deploying (be it successfully or with a rollback) and determining whether what is currently deployed matches what we expect to be deployed, signaling failure otherwise.
Enter sure-deploy, a tool we released as free software for everybody to use!
How sure-deploy works is overall very simple: it polls the services in a stack for their UpdateStatus field (a field in the JSON that docker service inspect outputs) and waits for all services to converge into either a finished or a rolled-back status. Thus it emulates the --detach=false option of docker service, but for stacks.
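The polling idea can be sketched in plain shell. This is a hedged approximation, not sure-deploy's code (which is written in OCaml): the helper names and the 5-second interval are hypothetical, and for brevity the sketch folds a success check into the final verdict. The state strings are Docker's `UpdateStatus.State` values; a service without an `UpdateStatus` is counted as a failure, which mirrors the cautious choice the tool makes for freshly created services.

```shell
#!/bin/sh
# Sketch of "wait until every service in a stack has finished updating".
# NOT sure-deploy's implementation; all function names are hypothetical.

# Map one UpdateStatus.State value to a coarse category. An empty value means
# the service has no UpdateStatus at all (e.g. it was only just created).
classify_state() {
  case "$1" in
    completed) echo ok ;;
    rollback_completed) echo rolled_back ;;
    updating|rollback_started) echo pending ;;
    paused|rollback_paused) echo paused ;;
    "") echo no_status ;;
    *) echo unknown ;;
  esac
}

# Read one state per line on stdin and reduce them to a single verdict:
# "pending" if anything is still in progress, else "ok" only if all completed.
stack_verdict() {
  verdict=ok
  while IFS= read -r state; do
    case $(classify_state "$state") in
      pending) echo pending; return 0 ;;
      ok) ;;
      *) verdict=failed ;;
    esac
  done
  echo "$verdict"
}

# Print the UpdateStatus.State of every service in the given stack, one per
# line (an empty line for services without an UpdateStatus).
stack_update_states() {
  for svc in $(docker -H swarmhost service ls -q \
      --filter "label=com.docker.stack.namespace=$1"); do
    docker -H swarmhost service inspect \
      --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$svc"
  done
}

# Poll until nothing is pending, then succeed only if every update completed.
wait_for_stack() {
  while :; do
    result=$(stack_update_states "$1" | stack_verdict)
    [ "$result" = pending ] || break
    sleep 5
  done
  echo "$result"
  [ "$result" = ok ]
}

# Example: docker -H swarmhost stack deploy --prune \
#            --compose-file docker-compose.yml my-stack && wait_for_stack my-stack
```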
This brings us to the point where we know that the deployment is finished, but we still don't know whether it was successful or has been rolled back. This is why sure-deploy has a second flag to verify whether the state of the stack matches the docker-compose.yml, thus detecting whether the preceding deployment was successful or rolled back.
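The core of such a check can be sketched as an image comparison, since a rollback restores the service's previous spec and therefore its previous image. The helper names below are hypothetical, and the sketch assumes the expected image has already been read from docker-compose.yml; Swarm usually pins the deployed image to a digest (`image:tag@sha256:...`), so the digest is stripped before comparing.

```shell
#!/bin/sh
# Strip the "@sha256:..." digest Swarm appends when it pins an image.
strip_digest() {
  printf '%s\n' "${1%%@*}"
}

# Succeed when the deployed image matches the one the compose file asks for.
image_matches() {
  expected=$1
  deployed=$2
  [ "$(strip_digest "$deployed")" = "$expected" ]
}

# Example (service name and tag are placeholders):
#   deployed=$(docker -H swarmhost service inspect \
#     --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' my-stack_service)
#   image_matches 'image-name:build-number' "$deployed" || echo "rolled back?"
```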
Since the tool would need to run in multiple environments, be it local development machines or build environments, we wanted a tool that is easy to get working and has as few moving parts as possible, preferably a single binary. We also wanted it to start up quickly, so users wouldn't be frustrated by long waits for VMs to launch or dependencies to download.
Fortunately, OCaml, one of the languages our team uses, ticks these boxes: native binaries and fast startup. We also get a great environment for writing parsers (which was useful for implementing the Docker-specific YAML template variables), as well as a powerful static type system that helped us make sure we cover all kinds of failure cases.
To avoid having to install the binaries on all build hosts, and to make it simpler for other teams to use, we also created a small Docker image (11 MB at the time of writing) containing pre-built binaries of sure-deploy, which can be used like so:
docker run \
  --rm \
  --mount type=bind,src=$(pwd)/docker-compose.yml,dst=/home/opam/docker-compose.yml,readonly \
  issuu/sure-deploy:54 --help
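To avoid retyping the mount boilerplate, the command can be wrapped in a shell function. This is a hypothetical convenience, not something shipped with sure-deploy: `sure_deploy` simply forwards its arguments to the image, and any extra setup the container needs to reach your swarm is left out here.

```shell
#!/bin/sh
# Hypothetical wrapper: run the sure-deploy image with ./docker-compose.yml
# mounted read-only where the container expects it, passing all arguments on.
sure_deploy() {
  docker run \
    --rm \
    --mount "type=bind,src=$(pwd)/docker-compose.yml,dst=/home/opam/docker-compose.yml,readonly" \
    issuu/sure-deploy:54 "$@"
}

# Example: sure_deploy --help
```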
The only complicated part of this command is the boilerplate required to pass the
docker-compose.yml file into the container.
While the tool is generally very helpful for knowing that a "green" deployment has in fact been deployed correctly, there are some caveats to be aware of that we do not have much control over:
- The initial creation of a stack creates services which do not have an UpdateStatus field, so we have the choice of either accepting them as successfully created or treating them as failed. We decided to consider this a failure, since we prefer to err on the side of caution rather than inspire false confidence. This is by far the most frustrating issue, but there is no good solution to it unless Docker includes UpdateStatus in initial deployments.
- The check functionality only compares the image in docker-compose.yml with the deployed image. We assume that the update of the service configuration always works correctly. Since each of our deployments also increments the tag of the image, a successful update of the image also means that the configuration has been successfully applied, so this is less of an issue in practice.
- docker stack supports less of the templating syntax than docker-compose; default values are not supported. sure-deploy supports the complete set that docker-compose does. More humblebrag than caveat, but we put a lot of effort into doing the right thing.
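For readers unfamiliar with the default-value syntax: docker-compose's two forms follow POSIX shell parameter expansion of the same spelling, so plain shell can illustrate the distinction a compose-compatible tool has to reproduce. `BUILD_NUMBER` and `show_defaults` are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Illustration only: the same default-value forms docker-compose accepts.
show_defaults() {
  # ${VAR:-default}: default when VAR is unset OR empty
  # ${VAR-default}:  default only when VAR is unset
  printf '%s|%s\n' "${BUILD_NUMBER:-latest}" "${BUILD_NUMBER-latest}"
}

# unset BUILD_NUMBER  -> latest|latest
# BUILD_NUMBER=       -> latest|
# BUILD_NUMBER=54     -> 54|54
```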
This was a long post describing how we went from starting off with a new technology to a solution that we're happy with, so here's what we recommend:
- Write a docker-compose.yml with the configuration of the services you want to deploy. It is perfectly suited to being stored in your source repository.
- Deploy your containers with docker stack deploy.
- Use sure-deploy converge to wait for the deployment to finish.
- Use sure-deploy verify to make sure that your deployment was successful and matches what your