Testing is a critical part of our workflow at Artsy. It gives us confidence to make regular, aggressive enhancements. But anyone who has worked with a large, complex test suite has struggled with occasional failures that are difficult to reproduce or fix.
These failures might be due to slight timing differences or a lack of proper isolation between tests. Integration tests are particularly thorny, since problems can originate not only in application code but also in the browser, testing tools (e.g., Selenium), the database, the network, or external APIs and dependencies.
The Quarantine
We’ve been automatically retrying failed tests, with some success. However, these problems tend to get worse as the suite grows. (If you have 10 tests that each have a 1% chance of failing, roughly 1 in 10 builds will fail. If you have 50, roughly 4 in 10 builds will fail.)
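The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes each flaky test fails independently of the others, and the function name `build_failure_probability` is made up for illustration. A build fails if at least one test fails, so the failure rate is one minus the chance that every test passes.

```python
def build_failure_probability(num_flaky_tests: int, per_test_failure_rate: float) -> float:
    """Chance that at least one flaky test fails in a build.

    Assumes the tests fail independently of one another, which is a
    simplification; real flaky tests often share a root cause.
    """
    chance_all_pass = (1.0 - per_test_failure_rate) ** num_flaky_tests
    return 1.0 - chance_all_pass


if __name__ == "__main__":
    for count in (10, 50):
        rate = build_failure_probability(count, 0.01)
        print(f"{count} flaky tests at 1% each: {rate:.1%} of builds fail")
    # 10 flaky tests at 1% each: 9.6% of builds fail
    # 50 flaky tests at 1% each: 39.5% of builds fail
```

Under that independence assumption, the build failure rate climbs quickly with the number of flaky tests, which is why retries alone tend to lose ground as a suite grows.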
Martin Fowler offers the most compelling thoughts on this topic in Eradicating Non-Determinism in Tests. (Read it, really.) He suggests quarantining problematic tests in a separate suite, so they don’t block the build pipeline.