Rosalind and The Art Genome Project

We recently bade a fond farewell to two of the champions of our “open source by default” ethos, but this spirit lives on at Artsy.

Today we open-source one of our key metadata apps, and explain how it fits into Artsy’s ecosystem.

Open source by default… or at least eventually

Even when we don’t start a project by building in the open, we encourage ourselves to question why that is so. Can we go ahead and open things up? If the answer is no (and it might be), are we clear on why not?

We’ve even added a light bit of process to promote this kind of questioning. Our robots will periodically trawl our GitHub org and file an issue against any closed-source repo that doesn’t include a rationale for its closed nature in the project README.

Peril issue — Robot would like to have a word with you

One of our repos got this treatment recently. I’m the point person on this project, and while I considered adding a rationale of “primary author will die of acute impostor syndrome if this repo is open-sourced, he will just stop living 😅,” instead we’ve gone ahead and made this repo public.

Meet Rosalind, an admin app for large-batch genoming operations on Artsy’s database of artworks.

The Art Genome Project, and the Genome Team

Rosalind is a close cousin of Helix, our original dedicated genoming app. (That project is private for now, but was described by Sarah in an earlier blog post.)

What’s genoming, you ask, and what’s it got to do with art?

Artsy’s discovery and recommendation capabilities are powered in large part by The Art Genome Project, a comprehensive system of classification that uses our homegrown art-historical controlled vocabulary to describe the artists and artworks in our database. (Here’s an explainer and our full list of categories. You can even view a structured data export of the “genes” which make up our controlled vocabulary.)

For a long time, our team of art historians — the Genome Team — bore sole responsibility for applying this vocabulary, using Helix, to the artworks that entered our platform. This was always a daunting task, and only became more so as our network of partner galleries and institutions continued to grow and upload more artworks.

Artsy is now home to over 1,000,000 artworks by over 100,000 artists, described by over 1,000 genes. A few years ago, it became clear to us that in order to continue applying high-quality metadata at scale, we were going to need some new processes and some new tools.

On the process side, we decided to share a simplified genoming interface with our gallery partners, so that they could start contributing the metadata that would be most relevant to our audience of collectors. We called it “Partner Applied Categories.”

Partner applied categories — Partner Applied Categories interface from Artsy’s partner CMS, showing choices for works of photography

Note that this is a tiny subset of our full genome vocabulary, and that values are applied as on/off, versus the more nuanced 0-100 score that our own Genome Team would apply.

This was a good step, a fundamental building block in scaling artwork metadata on our platform, but it created new problems as well as new opportunities.

This is where Rosalind came in.

About Rosalind

We started work on this tool in earnest in early 2017, at a time when we felt an internal need for the ability to, among other things:

Perform boolean searches against our database of artworks using The Art Genome Project’s vocabulary – a general purpose superpower that would be useful for our art historians, our editorial team, and our collector relations and marketing teams, among others
Make large-batch modifications to artworks’ genomes (whether genomed by us, or by our gallery partners via Partner Applied Categories), in order to maintain metadata quality
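
To make the first of these concrete, here is a minimal sketch of how a boolean gene search could be expressed as an Elasticsearch `bool` query. It is an illustration rather than Rosalind’s actual code: the `genes` field name, the `build_gene_query` helper, and the sample gene names are all assumptions, and the real index mapping may look quite different.

```python
import json


def build_gene_query(include, exclude=(), size=100):
    """Build an Elasticsearch bool query body for a boolean gene search.

    `genes` is a hypothetical keyword field holding the names of the genes
    applied to an artwork; the real mapping may differ.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Every included gene must be present (logical AND).
                "filter": [{"term": {"genes": name}} for name in include],
                # None of the excluded genes may be present (logical NOT).
                "must_not": [{"term": {"genes": name}} for name in exclude],
            }
        },
    }


# Illustrative gene names only.
query = build_gene_query(
    include=["Photography", "Portrait"],
    exclude=["Black and White"],
)
print(json.dumps(query, indent=2))
```

The sketch uses `filter` clauses rather than `must` because an admin search like this only needs to know whether an artwork matches, not how relevant it is, so scoring can be skipped.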

The user experience we built looks something like this:

Rosalind session — A sample admin interaction in Rosalind

Under the hood this is a Rails 5 application that talks to our core API server and our Elasticsearch cluster on the backend, and serves up a React single page app on the frontend.

Rosalind architecture — Bird’s eye view of Rosalind’s architecture

Among the tactical goals of this project were to support heavy-duty admin workflows by:

offering a featureful interface tailored to power users
adding a useful complement of keyboard navigation capabilities
making it as blazingly fast as possible

That last goal led to the somewhat quirky decision to have Rosalind talk to the Elasticsearch cluster directly, instead of making a search request to our core API server.

This entails some risk of drifting from the search best practices we have accumulated and encapsulated in our API, but so far that has not been a problem. Rosalind’s search needs are fairly straightforward, and using Elasticsearch’s REST API has been working out just fine. And it is fast as heck.
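
For readers unfamiliar with that REST API, the sketch below shows roughly what talking to Elasticsearch directly involves: a JSON query body sent to an index’s `_search` endpoint. The cluster URL, the `artworks` index name, and both helper functions are hypothetical, chosen only for illustration; Rosalind itself is a Rails app and its real client code will differ.

```python
import json
import urllib.request

# Hypothetical values: the real cluster URL and index name are not given in this post.
ELASTICSEARCH_URL = "http://localhost:9200"
INDEX_NAME = "artworks"


def build_search_request(query_body, base_url=ELASTICSEARCH_URL, index=INDEX_NAME):
    """Prepare (but do not send) a request to Elasticsearch's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query_body):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(query_body)) as response:
        return json.load(response)


# Build a request without sending it, so this runs without a live cluster.
request = build_search_request({"query": {"match_all": {}}, "size": 1})
print(request.get_method(), request.full_url)
```

Skipping the core API server removes one network hop and one layer of serialization from every search, which is where the speed comes from. The cost is that any query conventions living in that API have to be reproduced by hand.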

Although the project was mostly built out in early 2017, we’ve periodically revisited it for upkeep and maintenance. Along the way we’ve ridden some of the larger trends in the React+Rails ecosystem, as well as internal trends.

We started out on webpack-rails, an early pioneer in nudging Rails away from the Asset Pipeline provided by Sprockets, and toward Webpack
We migrated over to the Webpacker gem once that became a core Rails concern
We migrated from RSpec feature specs to Rails system tests, after that was rolled out
We experimented with adopting Reaction, our shared internal React component library
We more recently switched to using Palette, our nascent design system library

After a couple of years with the project in maintenance mode, we’re excited to start up new feature work again, and plan to bring even more bits of the Artsy Omakase into this project. Maybe it will even be a good proving ground for prettier-ruby.

Whatever happens, Rosalind will continue to play an important role in maintaining Artsy’s high-quality metadata and in making Artsy the best online destination for art.