Last year, I wrote about the process of fully automating our weekly engineering-wide standup. One of the benefits of turning a meeting run by a single person into a meeting run by everyone is that we removed a single point of failure. However, I may have fibbed just slightly when I called our standups fully automated.
This blog post is going to cover how (and more importantly, why) I finally automated the last 5% of our weekly standups. Let's go!
Our weekly standup process is a finely tuned machine. The meeting is run every Monday morning by a different pair of engineers, based on our on-call rotation. The process is documented in the open, and we improve it over time. I'm really proud of it! But there's just one problem... someone needs to make sure that the people responsible for the meeting know about that responsibility.
So for the past 8 months, I've begun every week by sending Slack DMs to the responsible engineers to remind them to run the standup, including a link to the docs. This made me a single point of failure: when I was out of the office, I always made sure to ask someone else to remind them about the meeting. What if I had forgotten? Or I was sick that day? What would happen to our finely tuned machine?!
Okay, so what would probably happen is that people would remember anyway or someone would post to Slack "hey who is running standup today?" Automating this reminder was a pretty small priority, but it was a gap in our process, and I wanted to patch it.
When I discussed all of this with my colleagues, it wasn't long before someone brought up the xkcd comic on automation. Oh, you know the one.
The comic observes that the work necessary to automate a task often exceeds the work necessary to just do the task manually. Pretty funny! You could be forgiven for taking the logical leap to say that automating tasks isn't worth it, generally, based on this observation. But that analysis would be incomplete because it focuses entirely on saving time. In my experience, automating a task often yields far more value than it costs in time.
Let's take the task of sending the on-call engineers their Monday morning standup reminder. How would we even automate that?
Well, first I think about how I do this task by hand. I start by looking at the on-call schedule, shared in Google Calendar. Then I open a DM in Slack with the engineers. I copy the pre-composed message from my recurring OmniFocus task and send it in the DM.
Okay so how would I automate that? Artsy uses Peril already to automate reminders about open RFCs, so I piggy-backed on that existing automation. This is key: I'm not starting from scratch, I'm building upon the existing automation infrastructure that we've already built.
Next, I find out how to access the Google Calendar API using a Google service account, which is purpose-built for server-to-server communication and perfect for our needs. I write some code to pick the correct calendar events based on the current time, extract the email addresses of those events' attendees, and handle an edge case. Then I look up the Slack API for Peril's platform, learn how to authenticate with it properly from a server, and look up Slack user IDs based on those email addresses. Finally, I compose the message and use some previously written code to post it to our #dev channel.
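To make those steps a little more concrete, here is a minimal TypeScript sketch of the pure logic: picking the events that cover the current time, collecting attendee emails, and composing the message. It is not the code that runs in Peril. The `CalendarEvent` type only mirrors a few fields of a Google Calendar event, and the function names, the email-to-Slack-ID map (standing in for a real Slack lookup), the example addresses, the docs URL, and the message wording are all assumptions made up for illustration. The fallback for a missing Slack ID is also just one plausible choice, not necessarily the edge case mentioned above.

```typescript
// Minimal shape of a calendar event. The field names mirror the Google
// Calendar event resource, but this type is an illustrative assumption.
interface CalendarEvent {
  summary: string;
  start: { dateTime?: string; date?: string };
  end: { dateTime?: string; date?: string };
  attendees?: { email: string }[];
}

// Parse a timed (dateTime) or all-day (date) boundary into epoch milliseconds.
// A boundary with neither field yields NaN, which fails every comparison below.
function toMillis(boundary: { dateTime?: string; date?: string }): number {
  return Date.parse(boundary.dateTime ?? boundary.date ?? "");
}

// Keep only events whose [start, end) window contains `now`.
function eventsActiveAt(events: CalendarEvent[], now: Date): CalendarEvent[] {
  const t = now.getTime();
  return events.filter((e) => toMillis(e.start) <= t && t < toMillis(e.end));
}

// Collect unique, lowercased attendee emails across the given events.
function attendeeEmails(events: CalendarEvent[]): string[] {
  const seen = new Set<string>();
  for (const event of events) {
    for (const attendee of event.attendees ?? []) {
      seen.add(attendee.email.toLowerCase());
    }
  }
  return Array.from(seen);
}

// Build the reminder. Engineers with a known Slack ID get a mention;
// anyone else falls back to their plain email address (an assumption).
function composeReminder(
  emails: string[],
  slackIdsByEmail: Map<string, string>,
  docsUrl: string
): string {
  const mentions = emails.map((email) => {
    const id = slackIdsByEmail.get(email);
    return id ? `<@${id}>` : email;
  });
  return `${mentions.join(" and ")}, you're running standup today! Docs: ${docsUrl}`;
}

// Hypothetical data: two consecutive on-call shifts and a stubbed ID lookup.
const events: CalendarEvent[] = [
  {
    summary: "On-call",
    start: { dateTime: "2019-08-12T00:00:00Z" },
    end: { dateTime: "2019-08-19T00:00:00Z" },
    attendees: [{ email: "Ash@example.com" }, { email: "sam@example.com" }],
  },
  {
    summary: "On-call",
    start: { dateTime: "2019-08-19T00:00:00Z" },
    end: { dateTime: "2019-08-26T00:00:00Z" },
    attendees: [{ email: "kit@example.com" }],
  },
];
const slackIds = new Map<string, string>([
  ["ash@example.com", "U111"],
  ["sam@example.com", "U222"],
]);

const onCall = eventsActiveAt(events, new Date("2019-08-12T14:00:00Z"));
const message = composeReminder(
  attendeeEmails(onCall),
  slackIds,
  "https://example.com/standup-docs"
);
console.log(message);
```

Keeping these functions free of network calls is a deliberate choice in this sketch: the calendar fetch and the Slack lookup can be swapped in around them, and the date and message logic stays easy to unit test.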
Boom. Open a PR. Add some unit tests. Done.
I spent about four hours automating this and, by my calculations, I'll recoup that time by... July 2020. But like I said, there's more value to this than the time I saved.
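For anyone who wants to run this kind of break-even math themselves, here is a rough TypeScript sketch. The four hours come from above; the five minutes saved per week is a purely illustrative assumption, not a measured figure, and `weeksToBreakEven` is a hypothetical helper name.

```typescript
// Number of weekly runs needed before a one-time automation cost is paid back.
function weeksToBreakEven(
  hoursSpentAutomating: number,
  minutesSavedPerWeek: number
): number {
  if (minutesSavedPerWeek <= 0) {
    throw new RangeError("minutesSavedPerWeek must be positive");
  }
  // Convert the one-time cost to minutes, then round up to whole weeks.
  return Math.ceil((hoursSpentAutomating * 60) / minutesSavedPerWeek);
}

// 4 hours = 240 minutes; at an assumed 5 minutes per week that is 48 weeks.
const weeks = weeksToBreakEven(4, 5);
console.log(`Break-even after ${weeks} weeks`);
```

Measured purely in time saved, a small weekly task takes most of a year to pay back, which is exactly why time alone is an incomplete way to judge it.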
In the process of automating this, I learned how to use two new APIs and I created infrastructure in our Peril installation to access them. Not only did I build upon the existing automation framework, but I contributed to it so it's easier for the next person. I even fixed a Peril bug in the process.
Automation encourages automation. Every time you automate a task, it gets easier to automate the next one. With sufficient infrastructure, a sort of exponential takeoff happens: all of a sudden you're not just automating existing tasks, you're using that infrastructure for new tasks. Tasks that add value to your team, like merge-on-green or notifying engineers of recent API changes.
As a consequence of the nature of engineering, we often consider ideas only in terms of constraints. We define what's possible by what we can already accomplish. Automation is a way to hack around that habit; it encourages engineers to think outside the box by giving us a larger box. Simple, but effective!
So. Four hours of work. Was it worth it?
Well, let's evaluate this in terms of impact. Those four hours could have kept our standups running until next July, or they could have automated that task and further enhanced our automation infrastructure. And, personally, it was very satisfying.
I would say that's definitely worth it.