A veteran of early Twitter's fail whale wars, Dmitriy joins the show to chat about the time when 70% of the Hadoop cluster got accidentally deleted, the financial reality of writing a book, and how to navigate acquisitions.
Segments:
(00:00:00) The Infamous Hadoop Outage
(00:02:36) War Stories from Twitter's Early Days
(00:04:47) The Fail Whale Era
(00:06:48) The Hadoop Cluster Shutdown
(00:12:20) “First Restore the Service Then Fix the Problem. Not the Other Way Around.”
(00:14:10) War Rooms and Organic Decision-Making
(00:16:16) The Importance of Communication in Incident Management
(00:19:07) That Time When the Data Center Caught Fire
(00:21:45) The "Best Email Ever" at Twitter
(00:25:34) The Importance of Failing
(00:27:17) Distributed Systems and Error Handling
(00:29:49) The Missing README
(00:33:13) Agile and Scrum
(00:38:44) The Financial Reality of Writing a Book
(00:43:23) Collaborative Writing Is Like Open-Source Coding
(00:44:41) Finding a Publisher and the Role of Editors
(00:50:33) Defining the Tone and Voice of the Book
(00:54:23) Acquisitions from an Engineer's Perspective
(00:56:00) Integrating Acquired Teams
(01:02:47) Technical Due Diligence
(01:04:31) The Reality of System Implementation
(01:06:11) Integration Challenges and Gotchas
Show Notes:
- Dmitriy Ryaboy on Twitter: https://x.com/squarecog
- The Missing README: https://www.amazon.com/Missing-README-Guide-Software-Engineer/dp/1718501838
- Chris Riccomini on how to write a technical book: https://cnr.sh/essays/how-to-write-a-technical-book
Stay in touch:
- Make Ronak's day by signing up for our newsletter to get our favorite parts of the convo straight to your inbox every week :D https://softwaremisadventures.com/
Music: Vlad Gluschenko, "Forest". License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en