A mean bug made it into our production environment. It wasn't caught by our extensive test suite and caused thousands of emails to be sent to a handful of people. The root cause was an unfortunate combination of Devise, DelayedJob and, of course, our own code. It was an easy fix, but nobody ever wants this to happen again.
tl;dr DelayedJob says it's possible to set
Delayed::Worker.delay_jobs = false for your tests. Don't do it.
Consider the following
User model that implements various Devise strategies which support some kind of notification.
1 2 3 4 5 6 7 8 9 10 11 12 13
We are overriding a black box
notify! method and updating an attribute with a timestamp of the last notification.
Let's write a test.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
All green. But once this code hit production,
notify! was called in an infinite loop. How is that possible?
The call to
notify! is delayed using DelayedJob in production and is not delayed in test. It does not work under DelayedJob and will create as many delayed notifications as it possibly can until it runs out of stack space.
As a common pattern in Devise, the implementation of
notify! relies on an instance variable to signal that a notification has been sent. Setting the instance variable avoids sending the notification twice for multiple calls to
after_save callback invokes
update_attributes!, which causes another
notify! call unless
false. In a test, the call to
notify! will execute the notification (setting the instance variable), but will create a delayed job in production (without setting it).
We'll start by bringing our tests closer to a real production environment by leaving
Delayed::Worker.delay_jobs = true and making sure our problem is reproduced with a spec. We could call
Delayed::Worker.new.work_off for every test that needs to execute a delayed job, but that would be rather tedious. A better approach may be to register an observer that will execute a delayed job every time one is created. This is similar to a production environment where having enough delayed workers almost guarantees a job is picked up immediately after being scheduled.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
The complete code, which handles a few more cases, including enabling and disabling the observer, and counting successful runs and errors can be found in this gist. Please help us improve it.
We can now test our notification without compromising on the delayed nature of the job and add a test making sure we create a single delayed job from a call to
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
This test will also run for a long time before failing with a stack overflow error. Our fix was not to call
notify! from an
We've suggested that immediate execution using an observer becomes a feature in DelayedJob in #423. Please add your comments.