Keep Alert Chaos in Check


Jan 26 2025 · 41 mins

Today we talk with Matvey Kukuy and Tal Borenstein, co-founders of Keep, a startup that helps companies manage and make sense of their alerts. The conversation comes three years after Matvey's previous appearance (https://shipit.show/36), where he talked about Grafana Labs' acquisition of his previous startup, Amixr (now Grafana OnCall).

Keep tackles a significant challenge in modern tech infrastructure: managing the overwhelming volume of alerts that companies receive from their various monitoring systems. Some enterprises deal with up to 70,000 alerts daily, making it crucial to identify which ones represent actual incidents requiring attention.

We explore real-world examples of major incidents, including the CrowdStrike outage in July 2024, which caused widespread system crashes and an estimated $10 billion in damages worldwide. That incident showed how critical it is to spot and respond to a serious issue quickly when it is buried among countless other alerts. Matvey also tells us about his own biggest black swan incident.

The episode concludes with a hint that some of Keep's AI features may eventually be released as open source once they're sufficiently polished.


EPISODE CHAPTERS


  • (00:00) - What is new after three years?

  • (02:58) - Take us through the last memorable incident

  • (07:16) - My biggest black swan

  • (08:50) - How would Keep have made the CrowdStrike experience different?

  • (12:38) - How do companies end up in that place?

  • (15:29) - Keep name origin

  • (17:40) - Why would someone pick Keep?

  • (23:22) - Let's think about our use case

  • (25:03) - Demo ends

  • (28:21) - Reporting capabilities?

  • (30:25) - Deploying & running Keep

  • (33:12) - 2025 for Keep

  • (38:50) - Until next time