A few days ago we started seeing the Heroku deployments of one of our applications randomly hang during
bundle install. The problem worsened over time, and we were unable to deploy for days.
Seeing bundler hang on "Fetching gem metadata from http://rubygems.org/", my immediate reaction was to blame the RubyGems Dependency API for its poor performance and attempt the recommended workaround of switching to http://bundler-api.herokuapp.com. That didn't work.
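For reference, that workaround is just a change of gem source; a minimal Gemfile sketch (the gem and version shown are illustrative, not my actual dependencies):

```ruby
# Gemfile -- point Bundler at the standalone Dependency API mirror
# instead of rubygems.org (the recommended workaround, which did not
# help in this case).
source "http://bundler-api.herokuapp.com"

gem "rails", "~> 3.2" # your gems as before; version is illustrative
```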
I also tried to reproduce the issue in a local environment, including on (what I thought was) a completely clean machine, to no avail. My
bundle install would always succeed.
Finally, everything pointed at an infrastructure problem with Heroku itself, so I opened a ticket (#72648), tweeted endlessly to Heroku devs, pinged a contact at Heroku on Skype and generally annoyed people for 5 straight days. It was a frustrating problem and I was getting no useful help.
Fast forward, this turned out to be an issue in Bundler. Narrowing it down would have been relatively easy if I had known where to look.
I hope this post helps you with similar issues.
Heroku Slug Compiler
Heroku provides small Ubuntu virtual machines on-demand, called "dynos", that look very much like any other Linux box. You can
heroku run bash and examine the file system of a running dyno. You can delete the bundler cache, rerun
bundle install, etc. But deployment does not happen in a running dyno - every time you push to Heroku, deployment happens inside a compiler dyno. Heroku attaches the dyno to your slug filesystem (your code), which may include a cache from previous runs. It then executes the code inside heroku-buildpack-ruby, specifically this method.
A lot of these functions invoke
IO.open and transmit
$stderr back to you. You see everything Heroku sees, and while you cannot get access to the compiler dyno, there's really no mystery to this process. The Heroku slug compiler will time out after 15 minutes and produce a stack trace with
Utils::TimeoutError. And everything Heroku does should be reproducible locally.
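To make that failure mode concrete, here is a rough sketch (not Heroku's actual code; run_with_timeout and the streaming details are my own) of how a build command can be streamed back to the user and killed after a time limit:

```ruby
require 'timeout'

# Sketch of a build step the way the slug compiler runs one: the child's
# output is streamed back on $stderr as it arrives, and the whole thing
# is killed with Timeout::Error once the limit (15 minutes on Heroku)
# is reached.
BUILD_TIMEOUT = 15 * 60 # seconds

def run_with_timeout(command, limit = BUILD_TIMEOUT)
  output = +""
  Timeout.timeout(limit) do
    # Merge the child's stderr into stdout so nothing is lost.
    IO.popen(command, err: [:child, :out]) do |io|
      io.each_line do |line|
        $stderr.puts line # stream everything back, as the compiler does
        output << line
      end
    end
  end
  output
end
```

If the command outlives the limit, the Timeout::Error stack trace is all you get, which matches what a hung bundle install looks like from the outside.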
The key to getting a repro of my issue locally was to use the Bundler Troubleshooting section. I had previously missed one of the steps in cleaning the local Bundler cache.
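Those troubleshooting steps boil down to wiping every cache Bundler and RubyGems keep. A sketch of that step in Ruby (the home and app parameters exist so the sketch is testable; in practice they are your home directory and your application root):

```ruby
require 'fileutils'

# Remove the user-level and app-local Bundler/RubyGems caches so the
# next `bundle install` starts from a genuinely clean slate. Directory
# list follows the Bundler troubleshooting steps; adjust if your
# GEM_HOME lives elsewhere.
def wipe_bundler_caches(home, app)
  user_dirs = %w[.bundle .gem].map { |d| File.join(home, d) }
  app_dirs  = %w[.bundle vendor/cache].map { |d| File.join(app, d) }
  (user_dirs + app_dirs).each { |dir| FileUtils.rm_rf(dir) }
end
```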
With a truly clean cache, bundle install hung with my Gemfile the same way as on Heroku.
Bundler Dependency Resolver
So what is bundler doing?
Bundler runs the gem dependency resolver, which is described in detail in Pat Shaughnessy's blog post. The post suggests running
DEBUG_RESOLVER=1 bundle install, which produced a mountain of output that wasn't very helpful.
I made a pull request with a similar setting called
DEBUG_RESOLVER_TREE, which reduces the output to just the gems being resolved. This helped me narrow the problem down to a small repro. I was also able to make some sense of what Bundler was doing: backtracking in an attempt to find a combination of gems matching the versions declared in the
Gemfile, trying every combination involving
builder above version 3.2, and failing to find a compatible version of
builder every single time. That's a lot of versions to try.
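The shape of the problem can be sketched with a contrived Gemfile (not my actual one; the gem names other than builder are hypothetical): one dependency effectively caps builder below 3.2 while others admit many builder >= 3.2 candidates, each of which the resolver must try and reject.

```ruby
# Gemfile -- contrived sketch of the constraint clash that sends the
# resolver backtracking through every builder >= 3.2 candidate.
source "http://rubygems.org"

gem "gem_a" # hypothetical: transitively requires builder ~> 3.0.0
gem "gem_b" # hypothetical: transitively allows builder >= 3.2
```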
Adding an explicit entry for
builder to the Gemfile fixed the issue.
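The fix amounts to constraining the search space directly; a sketch (the exact version is whatever the rest of your dependencies allow):

```ruby
# Gemfile -- pin builder explicitly so the resolver stops exploring
# incompatible newer candidates (version shown is illustrative).
gem "builder", "~> 3.0.0"
```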
If you remember anything from this post, next time you have a hang inside
bundle install on or off Heroku, start with Bundler Troubleshooting.