How to Deploy to Several Orchestrators and Be Happy (or Not)

Tarasov Aleksandr
5 min read · Oct 1, 2019


In the article “The Power of Abstraction,” we discussed why we needed to make our own deployment manifest for applications.

I remember the moment when I started advising ANNA Money. At that point, there was already a process to deploy applications to Mesos: an Ansible playbook that templated manifests for Marathon and Chronos. Using templates instead of writing manifests for each environment is a cool idea. However, something was broken in that process. Let’s look at the standard manifest:

sanction_app:
  id: "{{ sanction_id | d('sanction') }}"
  cmd: python /app/sanction/app.py --with-swagger
  container:
    docker:
      image: "{{ sanction_image }}"
      forcePullImage: true
      network: HOST
    type: DOCKER
  requirePorts: "{{ require_ports }}"
  ports:
    - "{{ sanction_host_port }}"
  env:
    POSTGRES_PORT: "{{ postgres_port }}"
    POSTGRES_HOST: "{{ postgres_host }}"
    POSTGRES_USER: "{{ sanction_postgres_user }}"
    POSTGRES_DBNAME: "{{ sanction_postgres_dbname }}"
    LOGSTASH_HOST: "{{ logstash_host }}"
    LOGSTASH_PORT: "{{ logstash_port }}"
    LOGSTASH_TAG: "{{ sanction_logstash_tag }}"
  cpus: "{{ sanction_cpu }}"
  mem: "{{ sanction_mem }}"
  instances: "{{ sanction_instances }}"
  healthChecks:
    - gracePeriodSeconds: 300
      intervalSeconds: 20
      timeoutSeconds: 20
      maxConsecutiveFailures: 3
      portIndex: 0
      path: /healthcheck
      protocol: HTTP
  labels:
    HAPROXY_GROUP: internal
  upgradeStrategy:
    minimumHealthCapacity: "{{ minimum_health_capacity }}"
    maximumOverCapacity: "{{ maximum_over_capacity }}"

At first sight, everything is OK. We have a template. We have some variables that are passed in via the Ansible engine. If you have one, two, or maybe three services, you can manage each template manually and change nothing.

On the other hand, we had several dozen services, and we had problems with manifest management, boilerplate, and standardization.

Moreover, this was not the end. There were also manifests for database migrations and for batch jobs, and they looked like identical twins.

When I saw these templates, I realized that such a sprawling approach would bury us all alive if we wanted to make a significant infrastructure shift. We needed to simplify the service definition, and that is how the OneYAML concept was born.

So, OneYAML is a simple definition in YAML format that contains all the properties necessary for templating a deployment manifest or any other artifact manifest. Moreover, any developer can control the deployment flow via this definition. We decided that the manifest should be as small as possible while still giving developers many options to control deployment.

It should be simple but powerful. It should contain all the necessary things inside, yet not be very long.

Around 20 lines to describe an application, instead of three or more different files of 20+ lines each:

application:
  app_name: sanction
  app_port: 8019
  cmd_app_line_internal: python /app/sanction/app.py --with-swagger

  test:
    cpu: 0.1
    mem: 64.0
    instances: 2
  prod:
    cpu: 0.2
    mem: 128.0
    instances: 4

  #db_specific
  postgres: True
  migration_task_name: sanction_DB_MIGRATION

How did we do that?

First of all, we picked Ansible to template the different Mesos artifacts from the OneYAML definition.
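
To give the idea, a shared template might look roughly like this (a hypothetical marathon-app.yml.j2; the variable names mirror the OneYAML example above, and app_image is an illustrative variable, not part of our real templates):

{{ app_name }}_app:
  id: "{{ app_name }}"
  cmd: "{{ cmd_app_line_internal }}"
  container:
    docker:
      image: "{{ app_image }}"  # illustrative: resolved from the app name and version
      forcePullImage: true
    type: DOCKER
  cpus: "{{ cpu }}"             # cpu/mem/instances come from the per-environment section
  mem: "{{ mem }}"
  instances: "{{ instances }}"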

We could make our manifest concise, short, and elegant because Ansible lets you override a variable at various stages and use one variable’s value to construct another.

For example, for the Logstash tag, we use the value {{ app_name }}-{{ env_name }} by default. However, for the production environment, we override it to {{ app_name }}, because we have only one production environment but multiple test environments. We still leave developers an opportunity to break the rules and provide their own value in OneYAML if there is a serious reason for it.
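
A minimal sketch of how this layering can look with Ansible variable precedence (the file layout here is illustrative, not our actual repository structure):

# group_vars/all.yml — the default for every environment
logstash_tag: "{{ app_name }}-{{ env_name }}"

# group_vars/prod.yml — the override for the single production environment
logstash_tag: "{{ app_name }}"

# anna-hcn.yml — a developer can still break the rules in OneYAML
# logstash_tag: some-special-tag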

This principle is often called Convention Over Configuration, and we use it to standardize deployment, reduce copy-paste, and improve maintainability.

As you can see, a developer does not have to care about things that do not matter.

The next step was to wrap the ansible-playbook command in a shorter and simpler shell script:

./deploy-app.sh -e test hcn:1.0.0

# the real command
ansible-playbook anna-deploy-playbook.yml --limit "intra" -u deploy -e "{ env: test }" --extra-vars "@anna-hcn.yml" -e app_version=1.0.0 --diff

So, we got the following scheme: a OneYAML definition goes through Ansible templating and becomes the Mesos deployment artifacts.

At that moment, we started to think of Mesos as just a backend, only one replaceable piece, and realized we could swap it for Kubernetes.

Kubernetes artifacts are just derivatives of OneYAML, as the Mesos ones are, so we could quickly generate and apply them.
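
Conceptually, the playbook needs only two steps per artifact. A simplified sketch, assuming one Jinja2 template per artifact type (the task wording and file paths are mine, not the real playbook):

- name: Render a Deployment manifest from the OneYAML variables
  template:
    src: k8s-deployment.yml.j2
    dest: "/tmp/{{ app_name }}-deployment.yml"

- name: Apply the rendered manifest to the cluster
  command: "kubectl apply -f /tmp/{{ app_name }}-deployment.yml"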

To deploy to Kubernetes, we made another shell script that wraps the ansible-playbook command and adds an engine variable to the launch string:

./deploy-k8s-app.sh -e test hcn:1.0.0

# the real command
ansible-playbook anna-deploy-playbook.yml --limit "intra" -u deploy -e "{ env: test }" --extra-vars "@anna-hcn.yml" -e app_version=1.0.0 --diff -e engine=k8s
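
Inside the playbook, the engine variable can then select which set of tasks to run. A minimal sketch of the idea (the task file names are assumptions):

- name: Deploy to Mesos by default
  include_tasks: deploy-mesos.yml
  when: engine | default('mesos') == 'mesos'

- name: Deploy to Kubernetes when engine=k8s is passed
  include_tasks: deploy-k8s.yml
  when: engine | default('mesos') == 'k8s'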

The most common features of the Mesos ecosystem have robust analogs in K8s, so making another implementation was not a problem.

But the devil is always in the details.

Still, we faced some problems supporting two different ecosystems at the same time.

The first problem was service discovery. In Mesos, we used marathon-lb and bound applications to ports, so we had the app_port option in our definition to control it. In K8s, however, we use Services in front of Deployments and can discover them by hostname, like hcn-service:8000. We passed these values via environment variables and had to keep them up to date for both implementations.
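
For example, a consumer of the hcn service would get the same logical variable rendered differently per engine (HCN_URL and marathon_lb_host are illustrative names, not our real configuration):

# Mesos: marathon-lb routes by host port, so the address is a balancer host plus app_port
HCN_URL: "http://{{ marathon_lb_host }}:8019"

# K8s: the Service is resolvable by its DNS name
HCN_URL: "http://hcn-service:8000"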

The second problem was similar. To make a service accessible from the outside world, we had to use marathon-lb with HAProxy in the Mesos ecosystem and an Ingress with Nginx in K8s. They are different, so we duplicated the values for both of them during the migration:

application:
  ...
  haproxy_group: external
  haproxy_0_path: "-i /api"

  ...
  ingress: True
  ingress_path: "/api"

I think we could have tried to eliminate these duplications, but we would have paid a higher price for it. We did not do it because we planned to tear down the Mesos infrastructure after the migration.

The third problem was batch jobs. Chronos tasks have their own schedule format (ISO 8601 repeating intervals instead of cron expressions) and less strict restrictions on job names than CronJobs in K8s, whose names must be valid DNS labels. So the definition carries both variants per engine:

application:
  ...
  chronos: True
  chronos_tasks:
    - name: "{{ 'sync-customers' if k8s else 'ACCOUNT_SYNC_CUSTOMERS_TO_XXX' }}"
      cron: "30 0/1 * * *"
      schedule: R//PT30M
      command: python /app/account/sync.py

For half a year, during the migration from Mesos to K8s, we deployed applications to several orchestrators at once. We could do that because we had our own deployment manifest format and our own Ansible-based deployment engine. So we can easily do it again when the moment comes.


Tarasov Aleksandr

Principal Platform Engineer @ Cenomi. All thoughts are my own.