Confident deployments in Docker Swarm
How we filled holes in the Docker ecosystem
Like many companies, Issuu has embraced containerization in our development and deployment process. This area is dominated by Docker, so it is no surprise that we use Docker as well. There is enough discussion on the pros and cons of Docker, but today I want to discuss a different aspect of using Docker containers: Docker Swarm.
Docker Swarm
Docker Swarm can be thought of as the Docker Engine you’re using on your local machine, but scaled to a cluster of machines. This makes Swarm very convenient: the containers you already know how to build need little to no change to run in the cloud.
The terminology is a bit confusing at first so here’s a short rundown of the terms and what they mean:
- Swarm is the entirety of your cloud deployment: the workers, the management nodes etc. You deploy to a swarm and the swarm then decides on which machines your containers will end up running.
- Service is the Swarm equivalent of a container. It represents one single container, running on one or more machines.
- Stack is the Swarm equivalent of Docker Compose. It represents a number of services, linked up to each other and deployed using a single configuration.
Where we started
When we started using Docker Swarm we didn’t yet have the experience on how to best utilize the system, so we started with the simplest solution. Since a service is like a container running on multiple hosts, let’s just start with this.
docker -H swarmhost service create \
--name my-service \
--detach=false \
image-name:build_number
This was quite easy, right? The detach option is very useful, as it allows your calling process to wait until Docker considers the service to be done deploying. After it has finished you can call a command to make sure your deployment worked and your service is running.
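One possible follow-up check, as a minimal sketch: docker service ls can print a running/desired replica count such as 8/8 through its --format option, and a small helper can compare the two numbers. The helper name replicas_converged is hypothetical, and the sketch assumes the plain running/desired form without any suffix.

```shell
# Succeeds when a "running/desired" replica string such as "8/8" shows
# that every desired replica is running. (Hypothetical helper.)
replicas_converged() {
  running=${1%%/*}   # text before the first slash
  desired=${1##*/}   # text after the last slash
  [ -n "$running" ] && [ "$running" = "$desired" ]
}

# Usage against a real swarm (the name filter matches by prefix, so this
# assumes only one service name starts with "my-service"):
#   replicas_converged "$(docker -H swarmhost service ls \
#     --filter name=my-service --format '{{.Replicas}}')"
```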
Unfortunately, reality is a bit messier than that, so you usually end up having to configure a few more things.
docker -H swarmhost service create \
--name my-service \
--mount type=bind,source=/dev/log,target=/dev/log \
--detach=false \
--replicas 8 \
--update-delay 10s \
--update-parallelism 2 \
--update-failure-action rollback \
--restart-max-attempts 1000 \
--restart-window 1h \
--limit-cpu 1 \
--reserve-memory 256m \
--limit-memory 1g \
--log-driver syslog \
--log-opt syslog-facility=local5 \
--log-opt tag=my-service \
image-name:build_number \
my_command
Okay, that required a few more options, but these are set once, right? Right?
Unfortunately not: this is only the create command, so it only works if the service you’re attempting to create does not already exist. A simple workaround would be to delete the service and recreate it, but then you throw away zero-downtime deployments.
Okay, so let’s also write the update statement:
docker -H swarmhost service update \
--detach=false \
--replicas 8 \
--update-delay 10s \
--update-parallelism 2 \
--update-failure-action rollback \
--restart-max-attempts 1000 \
--restart-window 1h \
--limit-cpu 1 \
--reserve-memory 256m \
--limit-memory 1g \
--log-driver syslog \
--log-opt syslog-facility=local5 \
--log-opt tag=my-service \
--image image-name:build_number \
--args my_command \
my-service
Duplication everywhere
What you’ll notice here is that the options are mostly the same, but not quite: you need to specify --args and --image now, and don’t use --name anymore. Another annoying issue is that --mount cannot be set directly; it has to be altered with --mount-add and --mount-rm instead, but you’d need to know the current state of the mounts to know whether to add or remove one. Therefore your deployment step would need to be quite smart about what is currently deployed and what isn’t.
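To make the required "smartness" concrete, here is a minimal sketch of the create-or-update decision. It relies on docker service inspect exiting with a non-zero status when the service does not exist. The function name deploy_service is hypothetical, the -H swarmhost option is left out for brevity, and most of the options from the commands above are omitted.

```shell
# Create the service on the first deploy, update it on later ones.
# $1: service name, $2: image reference. (Hypothetical helper.)
deploy_service() {
  service=$1
  image=$2
  if docker service inspect "$service" >/dev/null 2>&1; then
    # The service exists: the image has to be passed with --image.
    docker service update --detach=false --image "$image" "$service"
  else
    # The service does not exist yet: name it and pass the image positionally.
    docker service create --detach=false --name "$service" "$image"
  fi
}

# Usage: deploy_service my-service image-name:build_number
```

Even this small version shows the problem: every option has to be maintained in two places, and mounts would need additional logic on top.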
Also, what happens if you want to deploy multiple services? Adding them is reasonably easy, but for removing services that you don’t use anymore you can’t just remove the docker service incantations from your deployment script and expect them to be gone. You have to either go in manually and remove services or make your deployment process even smarter to track the set of services that should be deployed versus the set of services that are deployed.
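The bookkeeping this would require can be sketched as a set difference between two lists of names. This is only an illustration of the extra logic, not something we ended up running; the function name stale_services and the example service names are hypothetical.

```shell
# Print every deployed service name that is missing from the desired list.
# $1: newline-separated desired names, $2: newline-separated deployed names.
stale_services() {
  desired=$1
  deployed=$2
  printf '%s\n' "$deployed" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    # grep -Fxq: fixed string, whole-line match, no output
    printf '%s\n' "$desired" | grep -Fxq -- "$name" || printf '%s\n' "$name"
  done
}

# Usage against a real swarm:
#   deployed=$(docker -H swarmhost service ls --format '{{.Name}}')
#   stale_services "$desired" "$deployed" | while IFS= read -r name; do
#     docker -H swarmhost service rm "$name"
#   done
```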
If only there were a way to tell Docker exactly what you want your services to look like. As it turns out, there is!
A potential cure
Imagine our excitement when we saw that all this complexity can just go away because of Docker Stack! With Docker Stack, services can be declared in a docker-compose.yml file: which containers to deploy, what images, which mount points, how many instances, how much memory and how many CPUs to assign. It can also track which services you’ve removed from your docker-compose.yml and automatically remove your unused services.
So our example config looks roughly like this:
version: '3.3'
services:
  service:
    image: 'image-name:build-number'
    command: ['my_command']
    logging:
      driver: syslog
      options:
        syslog-facility: local5
        tag: my_service
    volumes:
      - type: bind
        source: /dev/log
        target: /dev/log
    deploy:
      replicas: 8
      resources:
        limits:
          cpus: "1"
          memory: 1gb
        reservations:
          memory: 256mb
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        max_attempts: 1000
        window: 1h
From here the deployment is pretty simple:
docker -H swarmhost \
stack deploy \
--prune \
--compose-file docker-compose.yml \
my-stack
This defines a stack called my-stack which contains one service. Stacks aren’t magical in Docker: this stack creates a service named my-stack_service (the format is <stack-name>_<service-name>), which can then be inspected with the regular tools that are available for services.
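As a small illustration of that naming scheme, the service name can be derived from the stack and service names and then handed to the usual service commands. The helper name stack_service_name is hypothetical.

```shell
# Build the Swarm service name for a service declared inside a stack.
# $1: stack name, $2: service name from docker-compose.yml.
stack_service_name() {
  printf '%s_%s\n' "$1" "$2"
}

# Usage against a real swarm:
#   docker -H swarmhost stack services my-stack
#   docker -H swarmhost service inspect "$(stack_service_name my-stack service)"
#   docker -H swarmhost service ps "$(stack_service_name my-stack service)"
```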
The biggest advantage of the stack is that stack deploy declares what the stack is supposed to look like; the stack is then created or modified to match the docker-compose.yml. There is no need to instruct Docker on what the changes to the services should be, only on how they should be configured.
A nice side-effect of declaring it this way is that this format is commonly known among Docker users. New hires will be able to understand and modify docker-compose.yml files far more easily than a complicated in-house deployment system with dozens of docker calls.
Confident?
You might have noticed that there is no --detach=false option anymore, so the docker command exits immediately (there exists a bug report to implement it). This poses a problem: if the command exits immediately, it is not possible to check whether the deployment was successful right afterwards, as the deployment may not have finished yet. Checking whether your deployment finished successfully therefore suddenly becomes quite a problem. You don’t want your stack deploy to be rolled back without being notified about it. In the worst case your code might not have been deployed in ages since all deploys were rolled back without you noticing.
Obviously this is not a state that we were happy about, so we decided to bite the bullet and write some tooling to help us with that.
Designing a solution
One of the lessons we have learned at Issuu is that if you don’t specifically need to write a custom wrapper, do not write a wrapper. This is what made us decide against writing a very smart build system (since docker stack can do it for us instead), and this is why we don’t want to implement our own stack deploy: we value using standard components in the ways they were meant to be used.
Therefore we decided to write a tool that addresses only the issues we had with stack deploy, namely waiting for a stack to finish deploying (be it successfully or with a rollback), determining whether what is currently deployed matches what we expect to be deployed, and signaling failure otherwise.
Enter sure-deploy, a tool we released as free software for everybody to use!
Confident deployments
The way sure-deploy works is overall very simple: it polls the services in a stack on their UpdateStatus field (a field in the JSON that Docker Swarm outputs) and waits for all services to converge into either a finished status or a rolled back status. Thus it emulates the --detach=false option of docker service, but for docker stack.
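The polling idea can be sketched in a few lines of shell. This is not how sure-deploy itself is implemented, only an illustration of the approach under some assumptions: the function names are hypothetical, the state names are the ones Swarm reports in UpdateStatus.State, and a service without a usable status is treated as a failure.

```shell
# Map an UpdateStatus state to what a waiting deploy step should do.
classify_update_state() {
  case "$1" in
    completed)                 echo finished ;;
    rollback_completed)        echo rolled_back ;;
    updating|rollback_started) echo pending ;;
    paused|rollback_paused)    echo paused ;;
    *)                         echo unknown ;;  # e.g. no UpdateStatus at all
  esac
}

# Poll all services of a stack until none of them is still updating.
# $1: stack name, $2: seconds between polls (defaults to 2).
# Returns 0 once every service is finished or rolled back, and 1 if any
# service is paused or has no usable status.
wait_for_stack() {
  stack=$1
  interval=${2:-2}
  while :; do
    pending=0
    for svc in $(docker stack services --format '{{.Name}}' "$stack"); do
      state=$(docker service inspect \
        --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$svc")
      case "$(classify_update_state "$state")" in
        pending) pending=1 ;;
        paused|unknown) return 1 ;;
      esac
    done
    if [ "$pending" -eq 0 ]; then
      return 0
    fi
    sleep "$interval"
  done
}

# Usage: wait_for_stack my-stack
```

Note that a return value of 0 only says the deployment has settled, not whether it settled on the new version or on a rollback.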
This brings us to the point where we know that the deployment is finished, but we still don’t know whether the deployment was successful or has been rolled back. This is why sure-deploy has a second command that verifies whether the state of the stack matches the docker-compose.yml, thus detecting whether the preceding deployment was successful or rolled back.
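A comparison of this kind can be sketched as follows. It assumes that the deployed image reference may carry an @sha256:... digest suffix that is absent from docker-compose.yml, so the digest is stripped before comparing. The function names are hypothetical and this is not sure-deploy's actual code.

```shell
# Drop a trailing "@sha256:..." digest from an image reference, if present.
strip_digest() {
  printf '%s\n' "${1%%@*}"
}

# Succeeds when the deployed image matches the expected "name:tag".
# $1: image from docker-compose.yml, $2: image reported by the service.
image_matches() {
  [ "$(strip_digest "$2")" = "$1" ]
}

# Usage against a real swarm:
#   deployed=$(docker -H swarmhost service inspect \
#     --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' my-stack_service)
#   image_matches 'image-name:build-number' "$deployed" || echo 'rolled back?'
```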
Since the tool would need to run in multiple environments, be it local development machines or build environments, we wanted a tool that is easy to get working and has as few moving parts as possible, preferably a single binary. We also wanted it to start up quickly, so users wouldn’t be frustrated by long waiting times to launch VMs or download dependencies.
Fortunately one of the languages we’re using in our team, OCaml, ticks these boxes: native binaries, fast startup. We also get a great environment to write parsers in (a feature which was useful to implement the Docker-specific YAML template variables) as well as a powerful static type system that helped us make sure we’re covering all kinds of failure cases.
To avoid having to install the binaries on all build hosts and make it simpler to use for other teams, we also created a small Docker image (11 MB at the moment of writing) containing pre-built binaries of sure-deploy which can be used like so:
docker run \
--rm \
--mount type=bind,src=$(pwd)/docker-compose.yml,dst=/home/opam/docker-compose.yml,readonly \
issuu/sure-deploy:54 --help
The only complicated part of this command is the boilerplate required to pass the docker-compose.yml file into the container.
Caveats
In general the tool gives us confidence that a “green” deployment is in fact correctly deployed, but there are some caveats to be aware of that we do not have much control over:
- The initial creation of a stack makes services which do not have an UpdateStatus field, so we have the choice of treating this as either successfully created or failed. We decided to consider this a failure since we prefer to err on the side of caution rather than inspire false confidence. This is by far the most frustrating issue, but there is no good solution unless Docker starts including UpdateStatus in initial deployments.
- The check functionality only compares the image fields of docker-compose.yml and the deployed image. We assume that the update of the service configuration always works correctly. Since each of our deployments also increases the tag of the image, a successful update of the image also means that the configuration has been successfully applied, so it is less of an issue in practice.
- docker stack supports less of the templating syntax than docker-compose: default values are not supported. sure-deploy supports the complete set that docker-compose does. More humblebrag than caveat, but we put a lot of effort into doing the right thing.
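For readers unfamiliar with default values in templates: docker-compose borrows the ${VARIABLE:-default} form from shell parameter expansion, so its behaviour can be shown in plain shell. The variable TAG and the function name image_reference below are hypothetical, and the snippet only demonstrates the shell semantics, not sure-deploy's parser.

```shell
# Expand an image reference the way a template with a default value would:
# use $TAG when it is set and non-empty, otherwise fall back to "latest".
image_reference() {
  printf '%s\n' "image-name:${TAG:-latest}"
}

# Usage:
#   TAG=54 image_reference
```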
Conclusions
This was a long post describing how we went from starting off with new tech to a solution that we’re happy with, so here’s what we recommend:
- Write a docker-compose.yml with the configuration of the services you want to deploy. This is perfect to be stored in your source repository.
- Deploy your containers with docker stack.
- Use sure-deploy converge to wait for the deployment to finish.
- Use sure-deploy verify to make sure that your deployment was successful and matches what your docker-compose.yml describes.