This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover:
🦅 The recent CrowdStrike outage and their public post-mortem
🚑 When do we do a blameless post-mortem?
😕 How do we do a blameless post-mortem?
✅ How do we make sure action items are followed through?
📰 The power of learning from post-mortems created by other teams and orgs
...and much more.
You can find Karanveer on LinkedIn: https://www.linkedin.com/in/karanveer/
You can find CrowdStrike's preliminary post-incident report here: https://www.crowdstrike.com/blog/falcon-content-update-preliminary-post-incident-report/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more, head over to https://squaredup.com/ to sign up for your free account.