- 20 Years of SRE: Highs and Lows
- Scam or Savings? A Cloud vs. On-Prem Economic Slapfight
- Is It Already Time To Version Observability? (Signs Point To Yes.)
- Capacity Constraints Unveiled: Navigating Cloud Scaling Realities
- Sharding: Growing Systems from Node-scale to Planet-scale
- Product Reliability for Google Maps
- Build vs. Buy in the Midst of Armageddon
- Compliance & Regulatory Standards Are NOT Incompatible with Modern Development...
- The Ticking Time Bomb of Observability Expectations
- Synthesizing Sanity with, and in Spite of, Synthetic Monitoring
- Migrating a Large Scale Search Dataset in Production in a Highly Available...
- OIDC and CICD: Why Your CI Pipeline Is Your Greatest Security Threat
- When Your Open Source Turns to the Dark Side
- The Sins of High Cardinality
- Optimizing Resilience and Availability by Migrating from JupyterHub to the...
- 99.99% of Your Traces Are (Probably) Trash
- Meeting the Challenge of Burnout
- What We Want Is 90% the Same: Using Your Relationship with Security for Fun...
- Thawing the Great Code Slush
- Resilience in Action
- Navigating the Kubernetes Odyssey: Lessons from Early Adoption and Sustained...
- "Logs Told Us It Was Kernel – It Wasn't"
- What Is Incident Severity, but a Lie Agreed Upon?
- Hard Choices, Tight Timelines: A Closer Look at Skip-level Tradeoff Decisions...
- Triage with Mental Models
- Defence at the Boundary of Acceptable Performance
- Lightning Talks
- System Performance and Queuing Theory - Concepts and Application
- It Is OK to Be Metastable
- The Art of SRE: Building People Networks to Amplify Impact
- Teaching SRE
- Cross-System Interaction Failures: Don't Fail through the Cracks
- Gray Failure: The Achilles’ Heel of Cloud-Scale Systems
- The Invisible Door: Reliability Gaps in the Front End
- Automating Disaster Recovery: The Ultimate Reliability Challenge
- From Chaos to Clarity: Deciphering Cache Inconsistencies in a Distributed...
- Patching Your Way to Compliance with a Small Team and a Pile of Technical Debt
- Strengthening Apache Pinot's Query Processing Engine with Adaptive Server...
- Taming the Linux Distribution Sprawl: A Journey to Standardization and...
- Frontend Design in SRE
- Measuring Reliability Culture to Optimize Tradeoffs: Perspectives from an...
- Storytelling as an Incident Management Skill
- Real Talk: What We Think We Know — That Just Ain’t So
- What Can You See from Here?
- Dude, You Forgot the Feedback: How Your Open Loop Control...
- You Depend on Time, This Is How It Works and You Won’t...
- SRE Saga: The Song of Heroes and Villains
- The Frontiers of Reliability Engineering
- I Can OIDC You Clearly Now: How We Made Static Credentials a...
- OMG WTF SSO: A Beginner’s Guide to Single Sign-On...
- Sailing the Database Seas: Applying SRE Principles at Scale
- Survivor: MySQL Island – Outwit, Outplay, Outlast Metadata...
- Fixing Your Noisy Pager in 500 Easy Steps
- Achieving Excellence: SLO Thresholds That Transform Service...
- Selective Reliability Engineering: There Is No Single Source...
- Why You’re (Probably) Doing Service Catalogs Wrong
- Exploring the Unintended Consequences of Automation in Software
- Rock around the Clock (Synchronization): Improve Performance...
- Mnemonic Rules for Eponymous Laws or: There’s a Law for That!
- SRE Stakeholders: A Spotter’s Guide
- Panel Discussion: Is Reliability a Luxury Good?
- Enhancing Elasticsearch Performance: Innovative Reindexing...
- Lessons from Unix History
- Treat Your Code as a Crime Scene
- Finding the Capacity to Grieve Once More
- Incident Groundhog Day
- Anomaly Detection in Time Series from Scratch Using...
- Generative AI: Beyond (Just) Hype
- From PIDs to Pods: The Life Cycle of an eBPF-Autoinstrumented...
- Scheduling at Scale: eBPF Schedulers with Sched_ext
- When Your SaaS Provider Goes out of Business – Lessons from...
- Configuration Languages Are the Bane of Our Existence
- Just Buy the Printer: Resilience in Action
- Noisy Neighbors, through Networking
- Taming Noisy Benchmark Results Using Change Point Detection
- Enabling Product Scalability through Load Testing
- NVMe/TCP Makes iSCSI Look like Fortran
- The Silent Performance Killers: BIOS and Firmware Updates
- How a Single API Endpoint Saved Us 3000 CPU
- Managing the Risk of Software Supply Chain Attacks
- When SRE and Security Teams Meet to Face a Crisis
- How to Host a (Very) Popular Website for 30 Altairian...
- How Snowflake Migrated All Alerts and Dashboards to a...
- What If We Ask Linux to Do Cryptography for Us?
- Synthetic Monitoring and E2E Testing: 2 Sides of the Same Coin
- Lightning Talks
- Monitoring Systems as a Service – Walking the Line between...
- An Exploration in Storing Telemetry in Cloud Object Storage
- Opening the Box: Diagnosing Operating-System Task-Scheduler...
- Embrace Fleet Reboots and Make Them Boring
- A Brief History of Release Engineering
- Red Tide Revert
- Riot Games: Evolution of Observability at the Gaming Company
- A Powerful Logs Management Solution We All Have and Use but...
- Blast Radius Reduction for Large-Scale Distributed Systems
- AppStack: An Open Source Cloud Native Platform for Running...
- Science Reliability Engineering for High Performance Computing
- Get Your Non-SREs Oncall Ready!
- Transforming Production Readiness
- Energy Consumption of Datacenters
- Are We Really Engineers?