A lot of the user-facing focus is on being able to present interfaces with a quality worthy of art.
What you see today when you go to www.artsy.net is a website built with Ezel.js, which is a boilerplate for Backbone projects running on Node and using Express and Browserify. We used to have separate projects for mobile and desktop web, but they are now merged. The combined app is hosted on Heroku and uses Redis for caching. Assets, including artwork images, are served from Amazon S3 via the CloudFront CDN. This code is open-source.
What you see today when you open the Artsy iOS app is a mix of Objective-C, Swift and React Native. Objective-C and Swift continue to provide a lot of over-arching cross-View Controller code. While individual representations of Artsy resources tend to be built in React Native. All of our React Native code uses Relay to handle API integration. This code is open-source.
You can also find Artsy on Alexa and Google Home, which are both open-source Node.js applications. There is also an open-source Apple TV app built in Swift.
Our core API serves the public facets of our product, many of our own internal applications, and even some of your own projects. It's built with Ruby, Rack, Rails, and Grape serving primarily JSON. The API is hosted on AWS OpsWorks and retrieves data from several MongoDB databases hosted with Compose. It also uses Memcached for caching and Redis for background queues with Sidekiq. It runs background jobs with delayed_job. We used to employ Apache Solr and even Google Custom Search for the many search functions, but have since consolidated on Elasticsearch.
Most modern code for both the website and the iOS app use an orchestration layer which is powered by GraphQL to streamline their data fetching and reduce front-end complexity. Our GraphQL server is an Express app, using express-graphql to provide a single API end-point. The API does not access our data directly, but forwards requests to the core API or other services. We have been migrating shared display logic into the GraphQL server, to make it easier to build consistent clients. This code is open-source.
We continue to have a public HAL+JSON API for external developers. This API is in active use for contemporary production services inside Artsy and the website is open-source, too.
The vast customer-facing business is powered by a Content Management System (CMS) for gallery and institutional partners. This CMS lets them upload and manage gallery shows, fair booths, create artists, and edit artwork metadata. All CMS components talk to our core API. We also have a number of CMS-like internal applications to manage partners, auctions, art genomes, configuring fairs or performing recurrent billing (we use Stripe for storing and charging credit cards and ACH) with invoicing.
CMS applications are based on stable, mature technologies like Rails, Bootstrap, Turbolinks and CoffeeScript, and gradually adopts modern client-side technologies like React and Browserify. They share a lot of common infrastructure.
We have a generic image-processing service in-house, which uses Rails, Sidekiq, Redis, and RMagick with ImageMagick. It receives image processing requests from many Artsy applications and generates thumbnails, tiles and watermarks images on S3.
Collectors inquire on artworks and engage in conversations with partners. For this purpose we have built a generic messaging system that manages communications between different parties. It receives messages via API or e-mail, finds or creates a conversation based on the recipients and forwards them to the proper addresses in that conversation. Its doesn't assume anything about the contents of the messages, which makes it a generic system for any type of conversation. The conversations surface to our partners via CMS.
The Auctions business began with doing the occasional benefit auctions for charities. Most of these auctions are online-only, timed sales. The initial version of our auction systems came together before we began our move to microservices, and so it is baked into our core API. Last year, we launched a live auction integration product to allow users to bid on works at commercial sales at the actual auction house sale rooms. The real-time requirements of this system required a rethinking of how we process our bids.
A more detailed overview of the Auctions stack can be found in The Tech Behind Live Auction Integration.
Our in-house editorial team and partners use an open-source platform called "Writer" (which we've built) to publish rich content across the web. Writer is split in two parts: the editorial-focused CMS and a JSON API that stores and distributes content separately from the rest of Artsy's stack.
Writer's frontend is built with Ezel.js, which is a boilerplate for Backbone projects running on Node and using Express and Browserify. We also heavily use React and write in CoffeeScript. Writer's backend exposes REST-based and GraphQL APIs that are consumed by our applications.
You can see Writer being put to work when you see articles on www.artsy.net, Facebook Instant Articles, Google AMP, RSS, Apple News, and email. We handle the distribution and display in all of these channels. We also support brand sponsorship deals and produce front-end heavy projects such as Year in Art 2016, and Year in Art 2015.
Data generally flows from consumer applications and services into AWS RedShift. We use a set of rake tasks run on Jenkins to move data from our several MongoDB and PostgreSQL databases to Redshift via S3. These rake tasks shell out to psql or mongo-export to generate CSV files for a list of services and upload them to an S3 bucket, then load those CSV files plus others found in that bucket (placed there by other services) into Redshift. If a Redshift copy fails due to data changes we sample the CSV and generate a working schema from its contents.
We also store application usage data provided by Segment Warehouses as well as data from vendors such as Salesforce and Sailthru.
For production data processing (such as recommendations), large-scale machine learning or even simpler parallel processing such as generating website sitemaps, we have our own Hadoop cluster configured and managed by Cloudera Manager and running on EC2. We leverage Apache Spark and Hadoop with some Ooozie workflow scheduling. The same data pipeline that writes data to S3 also pumps data to HDFS with either Ruby code or Sqoop and is read by Spark jobs written in Scala using Hive. Spark has improved performance and capacity tenfold over our older in-house systems and we will be moving all lengthy processing implemented in Ruby to this system gradually.
For general data access and dashboards we rely on Looker. This system empowers all non-engineers to access all of our data. At the time of writing, there are 50 users running 3,500 queries a day against Redshift via Looker. We've found it expedient to pre-compute common denormalized views, and to create our own session rollups from raw pageviews and events for the additional flexibility it gives us in understanding user behavior.
For more in-depth work, we use Jupyter Notebooks to connect to our Redshift cluster and by default import pandas, sci-kit learn, and pyplot for data analysis.
We completed our full migration from Solr to Elasticsearch in the last 18 months, and now use Elasticsearch across all front-ends. This ranges from our artwork filter interfaces through to our real-time artwork similarity features. Elasticsearch gives us high availability clustering features out of the box and easy horizontal scaling on demand.
As Artsy's business has grown more complex, so has the data and concepts handled by its core API. We've begun supporting certain product areas with separate, dedicated API services, and even extracting existing API domains into separate services when practical. These services tend to expose simple REST-ful HTTP APIs, maintain separate data sources, and even do their own authentication. This has certain advantages:
- Each system can be deployed and scaled independently.
- Each chooses the best-suited languages and technologies for its purpose.
- Code bases remain more focused and developers' cognitive overhead is minimized.
Balancing these out are some very real disadvantages:
- Development must sometimes touch multiple systems.
- Some data is copied between services. These can become out-of-sync, though we always try to have a single authoritative source in such cases.
- Deploys must be coordinated.
At our size and complexity, a single code base is simply impractical. So, we've tried to be consistent in the coding, deployment, monitoring, and logging practices of these services. The more repeatable and disciplined our process, the less overhead is introduced by additional systems.
We've also explored alternate communication patterns, so systems aren't as dependent on each other's APIs. Recently we've begun publishing a stream of data events from our core systems that other systems can consume. Other systems can simply subscribe to the notifications they care about, so the source system doesn't need to be concerned about integrating with one more destination. After experimenting with Kafka but finding it hard to manage, we switched to RabbitMQ for this purpose. To provide consistency when publishing events we have our own gem.
All our recent AWS infrastructure is configured in code using Terraform. This approach has allowed us to quickly replicate entire deployments along with their dependencies and has increased visibility into the state of our infrastructure across our teams. We started developing an open source Docker workflow toolkit named Hokusai in order to manage a containerized workflow, CI and deployment to Kubernetes. Our Kubernetes clusters are managed using Kops and similarly provisioned using Terraform. This new workflow is reducing our dependence on Heroku, giving us more flexibility in our deployments and a more efficient use of server resources.
Like any attempts at mapping something as large as the daily work for a thirty-ish person engineering team, the map is not the territory. However, the exploration is worth the time it takes to keep notes for reading again in the next two years.
If you're interested in helping us make this an even longer post in two more years, or more interestingly shorter - we nearly always have a position open for engineers.