When I first proposed using Docker for development and began doing my own work that way, there were some doubts:

  • Doesn’t it seem like a lot of trouble to set up Docker to get my work done?
  • Isn’t it easier to use Homebrew to install the services and database servers I need?

At Artsy, our main API (known internally as Gravity) uses MongoDB, Solr, Elasticsearch, and memcached. In development, we also run MailCatcher so we can view outgoing emails. When a new software engineer starts, they study a long Getting Started document and spend most of a day getting everything installed and configured. Beyond installing the software itself, figuring out all of the environment variables that need to be set can take time too. While we have good documentation, it is still a tedious, repetitive process that consumes the new employee's time, as well as that of more experienced developers who field questions.

Now that Gravity has been dockerized, getting set up consists of a one-time install of Docker Toolbox followed by running

docker-compose build && docker-compose up

in the root directory of the checked-out repo. Here is a simplified version of our docker-compose setup. Because we run both a web server and a delayed_job process, docker-compose.yml extends a common.yml file for shared configuration:

gravity:
  build: .
  environment:
    MEMCACHE_SERVERS: memcached
    SOLR_URL: http://solr4:8983/solr/gravity
    MONGO_HOST: mongodb
    ELASTICSEARCH_URL: elasticsearch:9200
    SMTP_PORT: 1025
    SMTP_ADDRESS: mailcatcher
  env_file: .env

The .env file is used for secrets such as Amazon Web Services credentials we don’t want to put into the git repository.
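A minimal .env might look like the sketch below. The variable names are the standard AWS ones, but the values here are placeholders; the real file is listed in .gitignore and never committed:

```shell
# .env — illustrative only; keep the real file out of version control
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```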

Our docker-compose.yml looks like this:

mongodb:
  image: mongo:2.4
  command: bash -c "rm -f /data/db/mongod.lock; mongod --smallfiles --quiet --logpath=/dev/null"
  ports: 
  - "27017:27017"

solr4:
  image: artsy/solr4

memcached:
  image: memcached

elasticsearch:
  image: artsy/elasticsearch
  ports:
  - "9200:9200"
  - "9300:9300"

web:
  extends:
    file: common.yml
    service: gravity
  command: script/rails s -b 0.0.0.0 -p 80
  ports:
  - "80:80"
  volumes:
  - .:/app
  links:
  - elasticsearch
  - mongodb
  - memcached
  - solr4
  - mailcatcher

dj:
  extends:
    file: common.yml
    service: gravity
  command: bundle exec rake jobs:work
  volumes:
  - .:/app
  links:
  - elasticsearch
  - mongodb
  - memcached
  - solr4

mailcatcher:
  image: zolweb/docker-mailcatcher
  ports:
  - "1080:1080"

The command: for the MongoDB service removes a lock file that can sometimes be left behind when the container is killed. Do not use that in production! We mount the local directory into the container with a volumes: entry, so that local code changes are picked up by the running containers.
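With the services defined above, one-off tasks can be run in throwaway containers that share the web service's configuration and links. A sketch, assuming the service names from our docker-compose.yml:

```shell
# Run the test suite in a fresh container linked to the same services:
docker-compose run web bundle exec rspec

# Or open a Rails console against the dockerized MongoDB:
docker-compose run web script/rails console
```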

Recently, Ashkan Nasseri began to move our delayed jobs from delayed_job_mongoid to sidekiq, which brings in Redis and another process that needs to run during development. Since we are using Docker, all we have to do is add a couple of new sections to our docker-compose.yml file:

redis:
  image: redis
  ports:
  - "6379:6379"

sidekiq:
  extends: 
    file: common.yml
    service: gravity
  command: bundle exec sidekiq
  volumes:
  - .:/app
  links:
  - elasticsearch
  - mongodb
  - memcached
  - solr4
  - redis

and add this line to common.yml:

REDIS_URL: redis://redis

The next time someone runs docker-compose up, Docker downloads the redis image once and then brings up the additional sidekiq service alongside everything else.
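A few standard docker-compose commands are handy for checking on the new services; the service names below are the ones from our configuration above:

```shell
docker-compose ps             # list services, their state, and mapped ports
docker-compose logs sidekiq   # inspect one service's output
docker-compose up -d redis sidekiq   # (re)start just the new services, detached
```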

For development that involves multiple applications in separate git repositories, we use Dusty, created by GameChanger. Dusty's advantages include NFS-mounted volumes (which perform much better than VirtualBox shared folders) and a built-in nginx proxy, along with modifications to your /etc/hosts file, so you can connect to your applications by hostname.

With Dusty, you set up services, apps, and bundles of apps with YAML files. Here is a repo with sample Dusty specs.

Our MongoDB service is defined as:

# services/mongo2.yml
image: mongo:2.4
volumes:
- /persist/persistentMongo:/data/db
entrypoint: ["sh", "-c", "rm -f /data/db/mongod.lock; mongod --smallfiles --quiet --logpath=/dev/null"]
ports:
- "27017:27017"

Exposing the ports isn't strictly necessary, but it lets us connect directly to the MongoDB instance with the mongo shell, via the Docker VM's IP address, without shelling into a container.
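For example, with docker-machine you can look up the VM's IP and point the mongo shell at the exposed port. The VM name (default here) and database name are placeholders that depend on your setup:

```shell
# Find the Docker VM's IP address:
docker-machine ip default

# Connect to the dockerized MongoDB on the exposed port:
mongo $(docker-machine ip default):27017/gravity_development
```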

Our Gravity app’s Dusty YAML file is:

# apps/gravity.yml
repo: github.com/artsy/gravity
mount: /app
build: .

depends:
  services:
  - mongo2
  - memcached
  - solr4
  - es15
  - mailcatcher
  - redis
  apps:
  - radiation

host_forwarding:
- host_name: gravity
  host_port: 80
  container_port: 80

compose:
  environment:
    RADIATION_URL: http://radiation
    MONGO_HOST: mongo2
    MEMCACHE_SERVERS: memcached
    SOLR_URL: http://solr4:8983/solr/gravity
    ELASTICSEARCH_URL: es15:9200
    SMTP_ADDRESS: mailcatcher
    SMTP_PORT: 1025
    REDIS_URL: redis://redis

commands:
  once:
  - bundle install -j 10
  - bundle exec rake db:client_applications:create_all
  - bundle exec rake db:admin:create
  always:
  - rails s -b 0.0.0.0 -p 80

The depends: configuration is similar to docker-compose's links functionality. It ensures that the listed services and apps (defined in services/*.yml and apps/*.yml) are running, and sets up /etc/hosts inside the containers so your applications can refer to one another by hostname.

For now, Dusty doesn’t have a way of sharing common setup like common.yml above, so there are similar configurations for our Sidekiq and Delayed Job workers.

Dusty uses bundles for clusters of applications that need to work together. An example bundle, for a CMS application that needs many APIs, is:

# apps/volt.yml
description: Volt
apps:
  - tangentApi
  - radiation
  - superposition
  - gravity
  - volt

We bring up that cluster of applications with

dusty bundles activate volt
dusty up

As we have added new services over time, using Docker and Dusty to bring clusters of apps together has made it much easier for developers to work on projects without spending time on installation and configuration. Having the Docker configuration in the repo also serves as good (and up-to-date) documentation of how a given application is configured and what it depends on. It is also much less resource-intensive than virtual machines configured with Vagrant or another provisioning tool: all of our Docker applications and services can run in a single VM. If you are developing on Linux, you don't even need a VM!

We are also starting to use Docker to run integrated testing across multiple applications using Selenium. That will be covered in a future blog post.

Categories: Docker, development