Real-World SRE Perspectives


Mar 28 2019, 35 mins
SHOW: 391

DESCRIPTION: Brian talks with Gustavo Franco (@stratus, Customer Reliability Engineer at Google) about his real-world experience as an SRE/SRE Manager and CRE Manager, how to measure SRE success, and how to onboard new teams to SRE/CRE concepts and processes.

SHOW SPONSOR LINKS:
MongoDB Atlas - Automated cloud MongoDB service. Visit mongodb.com/cloudcast to learn more. MongoDB Atlas handles all the costly database operations and admin tasks that you’d rather not spend time on, like security, high availability, data recovery, monitoring, and elastic scaling. Try MongoDB Atlas today!
Datadog Homepage - Modern Monitoring and Analytics. Try Datadog yourself by starting a free, 14-day trial today. Listeners of this podcast will also receive a free Datadog T-shirt.
Get 20% off VelocityConf passes using discount code CLOUD

CLOUD NEWS OF THE WEEK:
The Continuous Delivery Foundation was announced by the Linux Foundation
Kubernetes v1.14 released - Adds Windows Container support
Google introduces Cloud-based (streaming) Gaming Service called Stadia
UPS To Send Nurses For In-Home Vaccinations

SHOW INTERVIEW LINKS:
Gustavo's Background: https://conferences.oreilly.com/velocity/vl-ca/public/schedule/speaker/150125
“Scaling SRE, the Journey from 1 to Many Teams” (Gustavo’s talk at Velocity)
DevOps and SRE
Tuning up SLIs

SHOW NOTES:
Topic 1 - Welcome to the show. Tell us about your background, and some of the things you work on today as it relates to SRE and CRE teams.
Topic 2 - Let's talk about what SRE is intended to do, and maybe how it differs (or is the same) from existing teams that might be labeled "Ops" or "DevOps". Maybe we can also talk about some of the types of skills that highlight what SRE does.
Topic 3 - What are some of the ways to avoid an SRE (or CRE) team just becoming the band-aid team to fix all the things that developers don't want to put into code because they are under deadlines (security, bug fixes, scalability, etc.)?
Topic 4 - We're hearing more about these terms "AIOps" and "ChaosEngineering". How much can SRE/CRE teams augment applications through tools that either bring deeper insight (e.g. AIOps) or create scenarios that developers can't emulate (e.g. Chaos)?
Topic 5 - You've been around SRE/CRE for a while now. What are some of the positive and negative lessons you've learned and could share with the audience?

FEEDBACK?
Email: show at thecloudcast dot net
Twitter: @thecloudcastnet and @ServerlessCast