Why do so many companies write a homegrown pageviews tracking system? Between Google Analytics, Kissmetrics and many others, isn’t that a completely solved problem?

These popular solutions lack domain knowledge. They can easily segment users by region or browser, but they fail to recognize rules core to your business. So tracking pageviews with a homegrown system becomes your next sprint’s goal.

Implementing a hit counter service is quite tricky. It’s a write-heavy, asynchronous problem that must minimize impact on page rendering time, while dealing with rapidly growing amounts of data. Is there a middle ground between using Google Analytics and rolling our own implementation? How can we use Google Analytics for data collection and inject domain knowledge into the gathered data, incrementally, without writing our own service?

Let’s write a Rake task that pulls data from Google Analytics. We can run it daily. Start with a Ruby gem called Garb.

gem "garb", "0.9.1"

Garb requires Google Analytics credentials. Those can go into a YAML configuration file, which will use environment settings in production (it’s an ERB template, too). We can hardcode the test account values.

``` yaml config/google_analytics.yml
defaults: &defaults
  email: "ga@example.com"
  password: "password"
  ua: "UA-12345678-1"

development:
  <<: *defaults

test:
  <<: *defaults

production:
  <<: *defaults
  email: <%= ENV['GOOGLE_ANALYTICS_EMAIL'] %>
  password: <%= ENV['GOOGLE_ANALYTICS_PASSWORD'] %>
  ua: <%= ENV['GOOGLE_ANALYTICS_UA'] %>
```

Establish a Google Analytics session and fetch the profile corresponding to the Google user account with Garb.

``` ruby
config = YAML.load(ERB.new(File.new(Rails.root.join("config/google_analytics.yml")).read).result)[Rails.env].symbolize_keys
Garb::Session.login(config[:email], config[:password])
profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == config[:ua] }
raise "missing profile #{config[:ua]} in #{Garb::Management::Profile.all.map(&:web_property_id)}" unless profile
```

Garb needs a data model to collect pageviews: a class that extends Garb::Model and defines a set of “metrics” and “dimensions”.

``` ruby app/models/google_analytics_pageviews.rb
class GoogleAnalyticsPageviews
  extend Garb::Model

  metrics :pageviews
  dimensions :page_path
end
```

You can play with the [Google Analytics Query Explorer](http://ga-dev-tools.appspot.com/explorer/) to see the many possible metrics (such as pageviews) and dimensions (such as requested page path).
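
Before wiring this into a task, you can sanity-check the model against live data with Garb’s standard single-request query, `results`. A quick sketch (the path and count in the comments are, of course, illustrative):

``` ruby
# Fetch one page (up to 1,000 rows) of yesterday's pageviews,
# reusing the `profile` established above.
rows = GoogleAnalyticsPageviews.results(profile,
  :start_date => Date.today - 1, :end_date => Date.today - 1)
row = rows.first
row.page_path # => e.g. "/artwork/leonardo-mona-lisa"
row.pageviews # => e.g. "42"
```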

By default, Google Analytics lets clients retrieve 1000 records in a single request. To get all records we can add an iterator, called `all`, that will keep making requests until the server runs out of data. The code for *config/initializers/garb_model.rb* is [in this gist](https://gist.github.com/2265877) and I made a [pull request](https://github.com/vigetlabs/garb/pull/116) into Garb if you'd rather merge that onto your fork.
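
Conceptually, the iterator just repeats Garb’s single-request `results` call with a sliding offset until a page comes back short. A simplified sketch of the idea (the gist’s actual code differs in details):

``` ruby
# Simplified sketch of the `all` iterator from the gist. Garb's :limit
# and :offset options page through results 1000 records at a time;
# we stop as soon as a page comes back smaller than the limit.
module Garb
  module Model
    def all(profile, options = {}, &block)
      limit = 1000
      offset = 1
      loop do
        rows = results(profile, options.merge(:limit => limit, :offset => offset))
        rows.each(&block)
        break if rows.count < limit
        offset += limit
      end
    end
  end
end
```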

The majority of our pages are in the form of "/model/id", for example "/artwork/leonardo-mona-lisa". We're interested in all pageviews for a given artwork and in pageviews for a given artist, at a given date. We'll store selected Google Analytics data in a `GoogleAnalyticsPageviewsRecord` model, described below.

``` ruby
t = Date.today - 1
GoogleAnalyticsPageviews.all(profile, { :start_date => t, :end_date => t }) do |row|
  model = /^\/#!\/(?<type>[a-z-]+)\/(?<id>[a-z-]+)$/.match(row.page_path)
  next unless model && (model[:type] == "artwork" || model[:type] == "artist")
  GoogleAnalyticsPageviewsRecord.create!({
    :model_type => model[:type],
    :model_id => model[:id],
    :pageviews => row.pageviews,
    :dt => t.strftime("%Y-%m-%d")
  })
end
```

Each GoogleAnalyticsPageviewsRecord contains the total pageviews for a given model ID at a given date. We now have a record for each artwork and artist. We can roll existing data up into a set of collections, incrementally. For example, `google_analytics_artworks_monthly` will contain the monthly hits for each artwork.

``` ruby app/models/google_analytics_pageviews_record.rb
class GoogleAnalyticsPageviewsRecord
  include Mongoid::Document

  field :model_type, type: String
  field :model_id, type: String
  field :pageviews, type: Integer
  field :dt, type: Date

  index [
    [:model_type, Mongo::ASCENDING],
    [:model_id, Mongo::ASCENDING],
    [:dt, Mongo::DESCENDING]
  ], :unique => true

  after_create :rollup

  def rollup
    Mongoid.master.collection("google_analytics_#{self.model_type}s_total").update(
      { :model_id => self.model_id },
      { "$inc" => { "count" => self.pageviews }}, { :upsert => true })
    {
      :daily => self.dt.strftime("%Y-%m-%d"),
      :weekly => self.dt.beginning_of_week.strftime("%Y-%W"),
      :monthly => self.dt.beginning_of_month.strftime("%Y-%m"),
      :yearly => self.dt.beginning_of_year.strftime("%Y")
    }.each_pair do |t, dt|
      Mongoid.master.collection("google_analytics_#{self.model_type}s_#{t}").update(
        { :model_id => self.model_id, :dt => dt },
        { "$inc" => { "count" => self.pageviews }}, { :upsert => true })
    end
  end

end
```

The rollup lets us query these collections directly. For example, the following query returns a record with the pageviews for Leonardo’s “Mona Lisa” in January 2012.

``` ruby
Mongoid.master.collection("google_analytics_artworks_monthly").find_one({
  :model_id => "leonardo-mona-lisa", :dt => "2012-01"
})
```

One of the obvious advantages of pulling Google Analytics data is the low volume of requests and offline processing. We’re letting Google Analytics do the hard work of collecting data for us in real time and are consuming its API without the performance or time pressures.
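
Finally, here’s what the daily Rake task promised above might look like, tying the pieces together. A hedged sketch: the file and task names are assumptions, and scheduling (cron, Heroku scheduler, etc.) is left out.

``` ruby lib/tasks/google_analytics.rake
namespace :google_analytics do
  desc "Pull yesterday's pageviews from Google Analytics"
  task :pull => :environment do
    config = YAML.load(ERB.new(File.new(Rails.root.join("config/google_analytics.yml")).read).result)[Rails.env].symbolize_keys
    Garb::Session.login(config[:email], config[:password])
    profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == config[:ua] }
    raise "missing profile #{config[:ua]}" unless profile
    t = Date.today - 1
    GoogleAnalyticsPageviews.all(profile, { :start_date => t, :end_date => t }) do |row|
      # create GoogleAnalyticsPageviewsRecord rows as shown above
    end
  end
end
```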

Oftentimes people will use border-bottom: 1px solid instead of text-decoration: underline to give their links some breathing room. But what if you’re giving them too much breathing room and want to adjust the height of that underline? With Adobe Garamond that happened to be the case, so we’ve come up with this little CSS trick:

``` css
a {
  display: inline-block;
  position: relative;
}
a::after {
  content: '';
  position: absolute;
  left: 0;
  display: inline-block;
  height: 1em;
  width: 100%;
  border-bottom: 1px solid;
  margin-top: 5px;
}
```

This overlays a CSS pseudo element with a border-bottom that can be adjusted by changing margin-top.

For handling browsers that don’t support pseudo-elements, I recommend targeting them with Paul Irish’s class-on-html trick.

Let your links breathe!

Did you know that Netflix has hundreds of API versions, one for each device? Daniel Jacobson’s Techniques for Scaling the Netflix API at QConSF 2011 explained why they chose this model. And while we don’t all build distributed services that supply custom-tailored data to thousands of heterogeneous TVs and set-top boxes, we do have to pay close attention to API versioning from day one.

Versioning is hard. Your data models evolve, but you must maintain backward-compatibility for your public interfaces. While many strategies exist to deal with this problem, we’d like to propose one that requires very little programming effort and that is more declarative in nature.

At Artsy we use Grape and implement the “path” versioning strategy from the frontier branch. Our initial v1 API is consumed by our own website and services and lives at https://artsyapi.com/api/v1. We’ve also prototyped v2 and by the time v1 is frozen, it should already be in production.

Grape takes care of version-based routing and has a system that lets you split the version-based presentation of a model from the model implementation. I find that separation to be forced by the unnecessary implementation complexity of returning different JSON depending on the requested API version. What if implementing versioning in as_json were super simple?

Consider a Person model returned from a v1 API.

``` ruby
class API < Grape::API
  prefix :api
  version :v1

  namespace :person do
    get ":id" do
      Person.find(params[:id]).as_json
    end
  end
end

class Person
  include Mongoid::Document

  field :name

  def as_json
    {
      name: name
    }
  end

end
```

In v2 the model split :name into a :first and :last name, and in v3 :name has finally been deprecated. A v3 Person model would look as follows.

``` ruby
class Person
  include Mongoid::Document

  field :first
  field :last

  def as_json
    {
      first: first,
      last: last
    }
  end

end
```

How can we combine these two implementations and write `Person.find(params[:id]).as_json({ :version => ? })`?

In mongoid-cached-json we’ve introduced a declarative way of versioning JSON. Here’s the code for Person v3.

``` ruby
class Person
  include Mongoid::Document
  include Mongoid::CachedJson

  field :first
  field :last

  def name
    [ first, last ].join(" ")
  end

  json_fields \
    name: { :versions => [ :v1, :v2 ] },
    first: { :versions => [ :v2, :v3 ] },
    last: { :versions => [ :v2, :v3 ] }

end
```
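
Requesting a version then becomes an option to as_json. A quick sketch of the behavior implied by the field declarations above (the return values are illustrative):

``` ruby
person = Person.find(params[:id])
person.as_json({ :version => :v1 }) # => { :name => "John Doe" }
person.as_json({ :version => :v2 }) # => { :name => "John Doe", :first => "John", :last => "Doe" }
person.as_json({ :version => :v3 }) # => { :first => "John", :last => "Doe" }
```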

With the mongoid-cached-json gem you also get caching that respects JSON versioning, for free. Read about it here.

Sometimes you type a hash-bang URL too fast, bang first.

Consider https://artsy.net/!#/log_in. Rails will receive /! as the file path, resulting in a 404 File Not Found error. The part of the URL after the hash is a position within the page and is never sent to the web server.

It’s actually pretty easy to handle this scenario and redirect to the corresponding hash-bang URL.

The most straightforward way is to create a file called !.html in your public folder and use JavaScript to rewrite the URL with the hash-bang.

``` html public/!.html
<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript">
      window.location = '/#!' + window.location.hash.substring(1);
    </script>
  </head>
  <body>
    <a href="/">Click here if you're not redirected ...</a>
  </body>
</html>
```

You can also do this inside a controller with a view or layout. Start by trapping the URL in your `ApplicationController`.

``` ruby app/controllers/application_controller.rb
# e.g. inside a before_filter
if request.env['PATH_INFO'] == '/!'
  render layout: "bang_hash"
  return
end
```

The layout can have the piece of JavaScript that redirects to the corresponding hash-bang URL.

``` haml app/views/layouts/bang_hash.html.haml
!!!
= ie_tag(:html) do
  %body
    :javascript
      window.location = '/#!' + window.location.hash.substring(1)
```

You can quickly reduce the amount of data transferred by your Rack or Rails application with Rack::Deflater. Anecdotal evidence shows a 50KB JSON response shrinking to about 6KB. That can be a huge deal for your mobile clients.

For a Rails application, modify config/application.rb or config/environment.rb.

``` ruby config/application.rb
Acme::Application.configure do
  config.middleware.use Rack::Deflater
end
```

For a Rack application, add the middleware in config.ru.

``` ruby config.ru
use Rack::Deflater
run Acme::Instance
```
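
A quick way to confirm the middleware is doing its job is a Rack::Test sketch along these lines (the endpoint name is illustrative):

``` ruby
require 'rack/test'
require 'zlib'
require 'stringio'

include Rack::Test::Methods

def app
  Acme::Application # or whatever config.ru runs
end

# Ask for gzip like a browser would; Rack::Deflater only compresses
# when the client advertises support for it.
get '/widgets', {}, { 'HTTP_ACCEPT_ENCODING' => 'gzip' }
last_response.headers['Content-Encoding'] # => "gzip"
Zlib::GzipReader.new(StringIO.new(last_response.body)).read # the original JSON
```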


Consider the following two Mongoid domain models, Widget and Gadget.

``` ruby widget.rb
class Widget
  include Mongoid::Document

  field :name
  has_many :gadgets
end
```

``` ruby gadget.rb
class Gadget
  include Mongoid::Document

  field :name
  field :extras

  belongs_to :widget
end
```

And an API call that returns a collection of widgets.

``` ruby
get 'widgets' do
  Widget.all.as_json
end
```

Given many widgets, the API makes a subquery for each widget to fetch its gadgets.

Introducing mongoid-cached-json. This library mitigates several frequent problems with such code.

* Adds a declarative way of specifying a subset of fields to be returned as part of as_json.
* Avoids a large number of subqueries by caching document JSONs participating in the parent-child relationship.
* Provides a consistent strategy for restricting child documents’ fields from being returned via the parent JSON.

Using Mongoid::CachedJson we were able to cut our JSON API average response time by about a factor of 10. Find it on GitHub.
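
For illustration, here’s a hedged sketch of what the declarative version of Widget and Gadget might look like with the gem (the exact field options chosen here are assumptions):

``` ruby
class Gadget
  include Mongoid::Document
  include Mongoid::CachedJson

  field :name
  field :extras

  belongs_to :widget

  # Only :name is returned; :extras stays out of the parent's JSON.
  json_fields \
    name: {}
end

class Widget
  include Mongoid::Document
  include Mongoid::CachedJson

  field :name
  has_many :gadgets

  # Referenced child JSONs are cached, so serializing many widgets
  # avoids re-querying and re-serializing their gadgets every time.
  json_fields \
    name: {},
    gadgets: { :type => :reference }
end
```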

tl;dr - You can write 632 rock-solid UI tests with Capybara and RSpec, too.

[Image: Jenkins CI]

We have exactly 231 integration tests and 401 view tests out of a total of 3086 in our core application today. This adds up to 632 tests that exercise UI. The vast majority use RSpec with Capybara and Selenium. This means that every time the suite runs we set up real data in a local MongoDB and use a real browser to hit a fully running local application, 632 times. The suite currently takes 45 minutes to run headless on a slow Linode, UI tests taking more than half the time.

While the site is in private beta, you can get a glimpse of the complexity of the UI from the splash page. It’s a rich client-side JavaScript application that talks to an API. You can open your browser’s developer tools and watch a combination of API calls and many asynchronous events.

Keeping the UI tests reliable is notoriously difficult. For the longest time we felt depressed under the Pacific Northwest-like weather of our Jenkins CI and blamed every possible combination of code and infrastructure for the many intermittent failures. We’ve gone on sprees of marking many such tests “pending,” too.

We’ve learned a lot and stabilized our test suite. This is how we do UI testing.
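
For a flavor of what these look like, here’s a minimal sketch of such a test (the model, fields and page content are all made up; the real suite’s techniques are in the full post):

``` ruby
describe "login", :js => true do
  it "logs a user in and renders their name" do
    # real data in a local MongoDB, hit by a real browser via Selenium
    user = User.create!(:name => "Joe", :email => "joe@example.com", :password => "secret")
    visit "/log_in"
    fill_in "email", :with => user.email
    fill_in "password", :with => "secret"
    click_button "Log In"
    # Capybara's matchers poll, waiting out asynchronous rendering
    page.should have_content(user.name)
  end
end
```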


[TL;DR: To supplement Heroku-managed app servers, we launched custom EC2 instances to host Delayed Job worker processes. See the satellite_setup GitHub repo for Rake tasks and Chef recipes that make it easy.]

Artsy engineers are big users and abusers of Heroku. It’s a neat abstraction of server resources, so we were conflicted when parts of our application started to bump into Heroku’s limitations. While we weren’t eager to start managing additional infrastructure, we found that, with a few good tools, we could migrate some components away from Heroku without fragmenting the codebase or over-complicating our development environments.

There are a number of reasons your app might need to go beyond Heroku. It might rely on a locally installed tool (not possible on Heroku’s locked-down servers), or require heavy file-system usage (limited to tmp/ and log/, and not permanent or shared). In our case, the culprit was Heroku’s 512 MB RAM limit: reasonable for most web processes, but quickly exceeded by the image-processing tasks of our delayed_job workers. We considered building a specialized image-processing service, but decided instead to supplement our web apps with a custom EC2 instance dedicated to processing background tasks. We call these servers “satellites.”

We’ll walk through the pertinent sections here, but you can find Rake tasks that correspond with these scripts, plus all of the necessary cookbooks, in the satellite_setup github repo. Now, on to the code!

First, generate a key-pair from Amazon’s AWS Management Console. Then we’ll use Fog to spawn the EC2 instance.

``` ruby
require 'fog'

# Update these values according to your environment...
S3_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
S3_SECRET_ACCESS_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
KEY_NAME = 'satellite_keypair'
KEY_PATH = "#{ENV['HOME']}/.ssh/#{KEY_NAME}.pem"
IMAGE_ID = 'ami-c162a9a8'  # 64-bit Ubuntu 11.10
FLAVOR_ID = 'm1.large'

connection = Fog::Compute.new(provider: 'AWS',
  aws_access_key_id: S3_ACCESS_KEY_ID,
  aws_secret_access_key: S3_SECRET_ACCESS_KEY)

server = connection.servers.bootstrap(
  key_name: KEY_NAME,
  private_key_path: KEY_PATH,
  image_id: IMAGE_ID,
  flavor_id: FLAVOR_ID)
```

Next, we’ll do some basic server prep and install our preferred Ruby version.
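
The gist of that prep, sketched with Fog’s built-in SSH support (the package list is a placeholder; the real steps live in the satellite_setup repo’s recipes):

``` ruby
# Wait for the instance to come up, then run basic provisioning
# commands over SSH as the bootstrapped user.
server.wait_for { ready? }
server.ssh [
  'sudo apt-get update',
  'sudo apt-get -y install build-essential git-core'
]
```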


We do a lot of image processing at Artsy. We have tens of thousands of beautiful original high resolution images from our partners and treat them with care. The files mostly come from professional art photographers, include embedded color profiles and other complicated features that make image processing a big deal.

Once uploaded, these images are converted to JPG, resized into many versions and often resampled. We use CarrierWave for this process; our typical image uploader starts like a usual CarrierWave implementation, with a few additional features.
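
A minimal sketch of what such an uploader’s skeleton might look like (the version names and sizes are made up; our actual uploader has more going on):

``` ruby
class ImageUploader < CarrierWave::Uploader::Base
  include CarrierWave::RMagick

  # Convert everything to JPG up front, then cut resized versions.
  process :convert => 'jpg'

  version :large do
    process :resize_to_limit => [1024, 1024]
  end

  version :medium do
    process :resize_to_limit => [512, 512]
  end
end
```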


Zach Holman gave a good talk at RubyConf on How GitHub uses GitHub to build GitHub. It was great to hear how similar our own processes are at Artsy, with a few notable differences.

Artsy engineers store almost everything on GitHub. We use GitHub Wikis, but don’t use GitHub Issues much. We work in 3-week sprints with Pivotal Tracker instead. This blog is on GitHub. And, of course, we have our own Hubot, which posts funny animated GIFs to our IRC channel after each successful deploy.

The most interesting part for me was around these two slides.

[Slides: “Pull” and “Fork”]

Zach emphasized that you don’t need forks to make pull requests. While technically true, I find forks particularly useful to keep things clean.

At Artsy we use personal forks to work on features, create topical branches and make pull requests into the master from there. This is the workflow of the vast majority of open-source projects, too. Now, Zach is right, you don’t want to create any second-class developers: our entire team has write access to the master. We use pull requests from forks to do peer code reviews, even for trivial things. I typically make a pull request and mention the person I’d like to code review my changes in the title. Here’s an example.

[Screenshot: a targeted pull request]

(Notice the use of hash rocket. Zach, Ruby has transcended our lives too.)

Working on forks keeps developer branches away from “master”. The main repository only has three branches: “master”, “staging” and “production” and each developer can make up whatever branching strategy they like in individual forks.

Code reviews have nothing to do with hierarchy or organization, any developer will code review any other developer’s work. We tend to avoid using the same person for two subsequent code reviews to prevent excessive buddying. Zach called his pull requests “collective experiments” - a place for active discussions, rejections and praise. I really like that. Each of my rejected pull requests has been a great learning experience.