A former chaos engineer offers 5 tips for handling online disasters remotely


Kolton Andrus Contributor
I recently had a scheduled video conference call with a Fortune 100 company.
Everything on my end was ready to go; my presentation was prepared and well-practiced. I was set to talk to 30 business leaders who were ready to learn more about how they could become more resilient to major outages.
Unfortunately, their side hadn’t set up the proper permissions in Zoom to add new people to a trusted domain, so I wasn’t able to share my slides. We scrambled to find a workaround at the last minute while the assembled VPs and CTOs sat around waiting. I ended up emailing my presentation to their coordinator, calling in from my mobile and verbally indicating to the coordinator when the next slide needed to be brought up. Needless to say, it wasted a lot of time and wasn’t the most effective way to present.
At the end of the meeting, I said pointedly that if there was one thing they should walk away with, it’s that they had a vital need to run an online fire drill with their engineering team as soon as possible. Because if a team is used to working together in an office — with access to tools and proper permissions in place — it can be quite a shock to find out in the middle of a major outage that they can’t respond quickly and adequately. Issues like these can turn a brief outage into one that lasts for hours.
Quick context about me: I carried a pager for a decade at Amazon and Netflix, and what I can tell you is that when either of these services went down, a lot of people were unhappy. There were many nights where I had to spring out of bed at 2 a.m., rub the sleep from my eyes and work with my team to quickly identify the problem. I can also tell you that working remotely makes the entire process more complicated if teams are not accustomed to it.
There are many articles about best practices aimed at a general audience, but engineering teams have specific challenges as the ones responsible for keeping online services up and running. And while leading tech companies already have sophisticated IT teams and operations in place, what about financial institutions and hospitals and other industries where IT is a tool, but not a primary focus? It’s often the small things that can make all the difference when working remotely; things that seem obvious in the moment, but may have been overlooked.
So here are some tips for managing incidents remotely:
There were many nights where I had to spring out of bed at 2 a.m., rub the sleep from my eyes and work with my team to quickly identify the problem… working remotely makes the entire process more complicated if teams are not accustomed to it.
Kolton Andrus Contributor Kolton is co-founder and CEO of Gremlin, the chaos engineering company helping the world build a more reliable internet. I recently had a scheduled video conference call with a Fortune 100 company. Everything on my end was ready to go; my presentation was prepared and well-practiced. I…
Recent Posts
- Meta’s AI chatbot will soon have a standalone app
- CRKD teamed up with Gibson to make new guitar controllers
- Amazon CEO says ‘beautiful’ new Alexa hardware is coming this fall
- Apple will let parents share their kids’ ages to limit app access
- Perplexity’s voice mode gets a futuristic makeover on your iPhone
Archives
- February 2025
- January 2025
- December 2024
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- August 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- July 2020
- June 2020
- May 2020
- April 2020
- March 2020
- February 2020
- January 2020
- December 2019
- November 2019
- September 2018
- October 2017
- December 2011
- August 2010