Rolling Updates with Docker and Ansible

Wealthsimple's production environment is powered by hundreds of Docker containers. We have lots of reasons to like containers beyond just their popularity:

  • Containers give us a homogeneous way to deploy heterogeneous apps
  • Our servers can all be configured uniformly since there are no app dependencies to install
  • We can choose (or even change) our Linux distro without breaking the application's runtime environment since the latter ships with the container

Missing from the equation is a container scheduler (coming in 2018). What we do have is an orchestration Swiss Army Knife in the form of Ansible, and we’ve gotten pretty good at using it to deploy containers! In this post we’ll describe how we use Ansible’s docker_container module to perform zero downtime deployments of our apps.

Docker as an API Server

It may come as a surprise to Docker newcomers that the docker CLI’s real job is to make API calls against a daemon! If you’re comfortable speaking JSON all day, you can even forgo the CLI and roll your own tools (this is foreshadowing). Try using curl or wget to ask the daemon about itself via its socket:

# curl -Ss --unix-socket /var/run/docker.sock http:/version | python -m json.tool
{
    "ApiVersion": "1.29",
    "Arch": "amd64",
    "BuildTime": "2017-05-04T22:06:25.279181930+00:00",
    "GitCommit": "89658be",
    "GoVersion": "go1.7.5",
    "KernelVersion": "3.10.0-514.26.2.el7.x86_64",
    "MinAPIVersion": "1.12",
    "Os": "linux",
    "Version": "17.05.0-ce"
}
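If you'd rather roll your own tooling (as foreshadowed), the same /version call can be made from the standard library of most languages. Here's a minimal Python sketch; only the socket path and the /version endpoint come from the example above, and the class and function names are our own:

```python
import http.client
import json
import socket

class DockerSocketConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to the Docker daemon's Unix socket."""

    def __init__(self, sock_path="/var/run/docker.sock"):
        super().__init__("localhost")  # host is ignored; we dial the socket
        self.sock_path = sock_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.sock_path)

def daemon_version(conn):
    """GET /version and decode the JSON body, like the curl call above."""
    conn.request("GET", "/version")
    return json.loads(conn.getresponse().read())

# Usage (requires a running Docker daemon):
#   print(daemon_version(DockerSocketConnection())["ApiVersion"])
```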

The daemonized approach is sometimes held up as a criticism of Docker (fairly, in our opinion), but it sure does allow for some great integrations. For example, instead of using Ansible’s command module to execute docker run commands, the docker_container module is a first-class citizen that can submit (and, more importantly, receive) very detailed information about containers through the daemon’s API.

Hello World

What kind of very detailed information? Let’s use Ansible to start a Docker container with the command module so we have a baseline to compare against. Here’s a barebones playbook to do just that:

---

- hosts: all
  tasks:
    - name: 'Start a container with docker run'
      command: 'docker run -d --name my-redis redis:4-alpine'
      register: results

    - name: 'Output container id'
      debug:
        var: results.stdout

And the output:

# ansible-playbook -i "localhost," -c local containers.yml

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Start a container with docker run] ***************************************
changed: [localhost]

TASK [Output container id] *****************************************************
ok: [localhost] => {
    "results.stdout": "1d8e7621d47db2491de6ea7e4d84cb166fc86bb25fd77e65ec4f5c159f637023"
}

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=1    unreachable=0    failed=0

Ok, not bad! Armed with the container ID, we could run docker inspect and parse the results to learn more. We could do that, but it would be even better to capture that information the moment we start the container. Here’s another playbook that starts a container, this time using the docker_container module:

---

- hosts: all
  tasks:
    - name: 'Start a container with docker_container module'
      docker_container:
        name: 'my-redis'
        image: 'redis:4-alpine'
        detach: True
      register: results

    - name: 'Output container results'
      debug:
        msg: "{{ item }}"
      with_items:
        - "{{ results.ansible_facts.docker_container.Id }}"
        - "{{ results.ansible_facts.docker_container.State.Status }}"
        - "{{ results.ansible_facts.docker_container.NetworkSettings.Networks.bridge.IPAddress }}"

Here's the output:

# ansible-playbook -i "localhost," -c local containers.yml

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Start a container with docker_container module] **************************
ok: [localhost]

TASK [Output container results] ************************************************
ok: [localhost] => (item=f4f460e4a345a40222f0a3b3b859ae451e6f69e7b17e3752955f442b1b9cf928) => {
    "item": "f4f460e4a345a40222f0a3b3b859ae451e6f69e7b17e3752955f442b1b9cf928",
    "msg": "f4f460e4a345a40222f0a3b3b859ae451e6f69e7b17e3752955f442b1b9cf928"
}
ok: [localhost] => (item=running) => {
    "item": "running",
    "msg": "running"
}
ok: [localhost] => (item=172.17.0.2) => {
    "item": "172.17.0.2",
    "msg": "172.17.0.2"
}

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0

If you want to blot out your screen, output the entire results variable to see just how much information the daemon returns! It’s everything you could ever want (and maybe some that you don't) and you can extract it all from the registered variable.
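To give a sense of its shape without blotting out your screen: the registered variable is one big nested structure mirroring docker inspect output, and the playbook's Jinja expressions are just walking that tree. A heavily trimmed Python sketch (values are illustrative):

```python
# A trimmed-down sketch of what docker_container registers. Values here are
# illustrative; the real structure mirrors `docker inspect` output.
results = {
    "ansible_facts": {
        "docker_container": {
            "Id": "f4f460e4a345...",
            "State": {"Status": "running"},
            "NetworkSettings": {
                "Networks": {"bridge": {"IPAddress": "172.17.0.2"}},
                "Ports": {"6379/tcp": [{"HostIp": "0.0.0.0",
                                        "HostPort": "32768"}]},
            },
        }
    }
}

# The playbook's Jinja expressions are equivalent to plain dict lookups:
container = results["ansible_facts"]["docker_container"]
print(container["State"]["Status"])  # -> running
```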

There’s a price of admission for using Ansible’s docker_container module, though: every container must be named. If you use docker run without a name, the Docker CLI picks a random one for you to avoid duplicates, but Ansible’s docker_container module makes no effort to avoid conflicts. If you deploy a container with a name that’s already in use (say, to deploy a new version of an app), Ansible will unceremoniously whack the old container. This playbook will give you downtime the second time you run it with a different app_version:

---

- hosts: all
  tasks:
    - name: 'Start or replace a container with docker_container module'
      docker_container:
        name: 'web-container'
        image: "my-app:{{ app_version }}"
        detach: True
        ports:
          - '8080:8080'

Another problem is that two containers running on the same host cannot bind to the same port! No problem though: with the above play, your old container will be turfed before the new one starts anyway. This is a charitable definition of a “solved” problem, as you’ll be serving 5xx errors for a little while until the replacement container starts (if it starts at all).

A common pattern in AWS is to bind a container to a static port, proxy to that port with Nginx, and then put the whole instance (and its friends) behind an Elastic Load Balancer or Application Load Balancer. How else would you get traffic to the container, especially if you cut Nginx out of the equation altogether? In truth we don’t need a static port; what we really need is a way to know which port the container will bind to. If we knew the port number, it wouldn’t matter whether it was static or not.

A Practical Example

Here’s an approximation of how we use the docker_container module inside an Ansible role (I’ve removed and simplified a few details to keep things focused). We pass two dictionary parameters: params, a global namespace for the microservice regardless of whether we’re running a web container, a worker, or a cron-like container; and app, a container-specific namespace:

- name: "Start web container with {{ app.cmd }}"
  docker_container:
    name: "{{ params.consts.app_name }}-{{ ansible_date_time.epoch }}"
    image: "{{ params.consts.image_name }}:{{ params.consts.image_tag }}"
    labels:
      app: "{{ params.consts.app_name }}"
      role: 'web'
    command: "{{ app.cmd }}"
    state: 'started'
    restart_policy: 'on-failure'
    ports:
      - "{{ app.port }}"
  register: docker_results

Lots going on here. The first thing we do is ensure that each container has a unique name on its host machine by bolting on a Unix timestamp. This means that if my-app-1512692884 was already running when engineering pushes a new version, the new container will be named my-app-1512696443, preventing Ansible from knocking over the old container (which is still servicing traffic, incidentally). We also attach some labels, which will be useful later on.

When we pass the application’s port to docker_container, we’re only specifying what port the microservice inside the container is listening on. We don’t specify what port the host should bind to, so at runtime Docker will helpfully pick an unused ephemeral port that won’t conflict with the previously-running container. Next we retrieve the container ID and ephemeral port and load them into more manageable variable names:

- name: 'Retrieve container ID and ephemeral port'
  set_fact:
    container_id: "{{ docker_results.ansible_facts.docker_container.Id }}"
    ephemeral_port: "{{ docker_results.ansible_facts.docker_container.NetworkSettings.Ports[app.port + '/tcp'][0].HostPort }}"

You may notice a shortcoming of this: we’re only trying to keep track of one port. If your container exposes multiple ports then you’ll have to get creative. We like running one service on one port per container, so this works out well for us.
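If you do need multiple ports, one way to get creative is to walk the whole Ports structure rather than plucking a single key. A Python sketch (the sample data is assumed, shaped like docker inspect output):

```python
# Sample NetworkSettings.Ports data, shaped like `docker inspect` output.
ports = {
    "6379/tcp": [{"HostIp": "0.0.0.0", "HostPort": "32768"}],
    "8080/tcp": [{"HostIp": "0.0.0.0", "HostPort": "32769"}],
    "9090/tcp": None,  # exposed by the image but not published to the host
}

# Map each published container port to the ephemeral host port Docker chose.
bindings = {
    exposed.split("/")[0]: mappings[0]["HostPort"]
    for exposed, mappings in ports.items()
    if mappings
}
print(bindings)  # -> {'6379': '32768', '8080': '32769'}
```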

Next we wait for the app inside the container to reach a healthy status:

- block:
    - name: 'Verify that the new container is serving traffic'
      uri:
        url: "http://localhost:{{ ephemeral_port }}{{ app.health_check_uri }}"
        status_code: 200
      register: container_status
      retries: "{{ app.health_retries | default('20') }}"
      delay: "{{ app.health_retry_delay | default('2') }}"
      until: "container_status.status == 200"
  rescue:
    - name: 'Terminate failed container'
      shell: "docker stop {{ container_id }} ; docker rm --force {{ container_id }}"
      ignore_errors: 'yes'

    - fail:
        msg: 'New container failed to return HTTP 200 and has been terminated'

Once we receive HTTP 200 we move on. If a bug in our app or playbook means that the container will never be healthy, then we terminate it and halt the deployment (some other cleanup actions happen, but that’s outside the scope of today’s post).
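The retries/delay/until combination in the uri task boils down to a simple polling loop. In Python terms (a sketch of the idea, not our actual tooling):

```python
import time

def wait_until_healthy(check, retries=20, delay=2.0):
    """Poll `check` until it returns HTTP 200, mirroring the uri task's
    retries/delay/until combination. Raises if it never becomes healthy,
    which is the cue to terminate the container and fail the deploy."""
    for _ in range(retries):
        if check() == 200:
            return True
        time.sleep(delay)
    raise RuntimeError("New container failed to return HTTP 200")

# Simulate a container that becomes healthy on the third probe.
responses = iter([503, 503, 200])
assert wait_until_healthy(lambda: next(responses), retries=5, delay=0)
```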

Now we have two useful things: a healthy container ready to receive traffic, and ephemeral_port! When we initially began our containerization project, we used this value to re-write an Nginx configuration to proxy traffic from a known port (the ELB) to the container’s ephemeral only-just-learned-about-it port. More recently we use this to register the service with Hashicorp’s Consul so that we no longer have to install Nginx everywhere or use a million ELBs. With a port in hand, you now have lots of ways to solve this problem!
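For the curious, registering with Consul from this point is just one more HTTP call. Here's a minimal sketch against Consul's /v1/agent/service/register endpoint; the payload is simplified and the function name and health-check interval are our own illustration, not our production code:

```python
import json
import urllib.request

def consul_registration(app_name, ephemeral_port, health_check_uri,
                        agent="http://localhost:8500"):
    """Build the PUT request that registers a service (and an HTTP health
    check against its ephemeral port) with the local Consul agent."""
    payload = {
        "Name": app_name,
        "Port": int(ephemeral_port),
        "Check": {
            "HTTP": f"http://localhost:{ephemeral_port}{health_check_uri}",
            "Interval": "10s",
        },
    }
    return urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Usage (requires a running Consul agent):
#   urllib.request.urlopen(consul_registration("my-app", "32768", "/health"))
```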

Cleanup

PagerDuty’s SRE team helpfully shared lessons from their own container journey at DevOpsDays Toronto this year, just as Wealthsimple was beginning its own containerization work. Their talk has heavily influenced our roadmap, including the decision to roll out containers and container scheduling as separate projects. After the presentation I asked Mia how they went about deploying containers without downtime, which was something we didn’t have a good answer for at the time. Her team’s solution (paraphrased) was “We start the new containers and then shut off the old ones two minutes later.”

Wealthsimple microservices have a much shorter timeout, so it’s a fairly safe assumption that traffic will drain from the old container within a minute of reloading Nginx or republishing to Consul (as an aside, most assumptions are unsafe, including this one, so this is really about what we consider an acceptable risk). This brings us to the next step in our Ansible role: finding the old containers that are now on the chopping block:

- name: 'Retrieve container IDs of previous web versions'
  command: "docker ps -qa --no-trunc --filter 'label=app={{ params.consts.app_name }}' --filter 'label=role=web'"
  register: previous_containers

Those labels we applied earlier come in handy now! This gives us a newline-delimited list of container IDs matching the microservice we’re deploying. On worker instances in particular, this helps us filter out cron workers so that we’re not terminating them mid-deploy. Note that the list will include the container we just deployed, but we already have that registered as container_id, so we can use a when condition to avoid terminating it:

- name: 'Queue previous containers to terminate in 2 minutes'
  at:
    command: "docker rm --force {{ item }}"
    count: 2
    units: 'minutes'
  with_items: "{{ previous_containers.stdout_lines }}"
  when: (item.find(container_id) == -1)

Our when condition will cause us to skip terminating the newly-deployed container, while forcefully terminating the old ones asynchronously. Ansible’s at module uses the atd package to schedule a command in the near future. The above task seems to schedule it 2 minutes in advance, but this is not completely true; atd rounds to the nearest minute, so scheduling something to happen in “1” minute at 20:34:55 means that the command will execute at 20:35:00 — a bit too soon. By setting count: 2 we ensure that the container will terminate in at least 1 minute, and at most 2.
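That rounding arithmetic is easy to sanity-check. A Python sketch of atd's minute-boundary behaviour (times are illustrative):

```python
def atd_fire_time(now_epoch, count_minutes):
    """When `at now + N minutes` actually fires: atd truncates 'now' to the
    minute boundary before adding the offset."""
    return (now_epoch // 60 + count_minutes) * 60

# 20:34:55 expressed as seconds since midnight.
now = 20 * 3600 + 34 * 60 + 55

# "now + 1 minutes" fires at 20:35:00 -- only 5 seconds away.
assert atd_fire_time(now, 1) - now == 5

# count: 2 guarantees the old container survives at least 60s, at most 120s.
assert 60 <= atd_fire_time(now, 2) - now <= 120
```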

While atd isn’t the solution to all of your problems, we use it in this case so that Ansible isn’t blocked for 1-2 minutes in the middle of a deploy while waiting for traffic to drain. Instead we schedule the container's termination and move on.

Recap

We covered quite a bit!

  1. How the docker_container module can be more powerful than the command module for containers
  2. A couple of gotchas for the docker_container module
  3. How you can use data returned by docker_container to gracefully cutover traffic

If your organization is deploying apps directly to servers and you've heard that Kubernetes is all the rage, why not start by packaging and deploying containers? Once you get the hang of it, watch some Kelsey Hightower videos and try out some schedulers. Join us again in 2018 when Wealthsimple's infrastructure team will have some thoughts on that too!