Post as a thread on the @avivl account.
I taught my AI assistant to do SRE.
Not just "summarize logs", but actual incident response, alert analysis, and engineering metrics.
Here's what an AI-powered operations toolkit looks like 🧵
1/11
Skill #1: Incident Response
My Clawdbot can now:
• Check production health across 67 Cloud Run services
• Correlate alerts with recent deploys
• Suggest rollbacks or scaling fixes
• Follow actual runbooks, not hallucinated ones
2/11
The key: structured diagnostics.
It knows to check:
- Recent deployments (GitHub Actions)
- Error logs (Cloud Logging)
- Service metrics (latency, memory)
- External dependencies (API quotas)
In that order. Like an actual SRE would.
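A minimal sketch of what "in that order" can look like in code. The check functions here are hypothetical stand-ins (the thread does not show the real ones); a real version would presumably query GitHub Actions, Cloud Logging, and so on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    step: str
    detail: str


# Each check returns a short description of a problem, or None if it looks clean.
Check = Callable[[], Optional[str]]


def run_diagnostics(checks: List[Tuple[str, Check]]) -> List[Finding]:
    """Run the checks in the given order and collect anything suspicious."""
    findings = []
    for name, check in checks:
        detail = check()
        if detail is not None:
            findings.append(Finding(name, detail))
    return findings


# Hypothetical stand-ins with canned results, listed in the order from the thread.
ORDERED_CHECKS: List[Tuple[str, Check]] = [
    ("recent_deployments", lambda: "deploy shortly before the first alert"),
    ("error_logs", lambda: None),
    ("service_metrics", lambda: "latency above baseline"),
    ("external_dependencies", lambda: None),
]

findings = run_diagnostics(ORDERED_CHECKS)
for f in findings:
    print(f"{f.step}: {f.detail}")
```

Keeping the order in a plain list means the sequence is reviewable by a human, the same way a runbook is.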
3/11
Skill #2: Alert Insights
Weekly analysis of production alerts:
• Scans Gmail for alert patterns
• Identifies noisy/flapping alerts
• Cross-references with monitoring config
• Recommends specific threshold changes
Turns alert fatigue into actionable PRs.
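One simple way to flag noisy alerts, as a sketch: count how often each alert fired in the week and surface the ones above a cutoff. The alert names and the `min_fires` threshold are assumptions for illustration, not the actual rules the skill uses.

```python
from collections import Counter
from typing import Iterable, List


def noisy_alerts(alert_names: Iterable[str], min_fires: int = 10) -> List[str]:
    """Return alert names that fired at least min_fires times, sorted by name."""
    counts = Counter(alert_names)
    return sorted(name for name, n in counts.items() if n >= min_fires)


# Hypothetical week of alert subjects, e.g. as extracted from an inbox.
week = ["high-latency"] * 14 + ["disk-full"] * 2 + ["5xx-rate"] * 11
print(noisy_alerts(week))  # ['5xx-rate', 'high-latency']
```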
4/11
The magic: it reads our actual infra code.
Points to specific files: "Adjust error threshold in src/core/services/monitoring/error-reporting.ts line 47"
Not generic advice. Real code changes.
5/11
Skill #3: DORA Metrics
Tracks the 4 key DevOps metrics:
• Deployment Frequency
• Lead Time for Changes
• Change Failure Rate
• MTTR
Weekly reports with trends and per-service breakdowns.
6/11
Data sources it pulls from:
• GitHub Actions → deploy frequency, failure rate
• GitHub PRs → lead time (created → merged)
• Gmail alerts → MTTR (alert → resolved)
All automated. No manual spreadsheets.
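For reference, the arithmetic behind those four numbers is small. A sketch with made-up sample data, assuming timestamps have already been pulled from the sources above (the fetching itself is not shown, and the function names are illustrative):

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Span = Tuple[datetime, datetime]  # (start, end)


def mean_duration(spans: List[Span]) -> timedelta:
    """Average of end - start across all spans."""
    durations = [end - start for start, end in spans]
    return sum(durations, timedelta()) / len(durations)


def deployment_frequency(deploy_count: int, days: int) -> float:
    """Deploys per day over the reporting window."""
    return deploy_count / days


def change_failure_rate(failed: int, total: int) -> float:
    """Share of deploys that failed."""
    return failed / total


# Made-up sample data for illustration only.
prs = [  # (created, merged)
    (datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 13)),
    (datetime(2025, 1, 2, 9), datetime(2025, 1, 2, 11)),
]
incidents = [  # (alert fired, resolved)
    (datetime(2025, 1, 3, 2, 0), datetime(2025, 1, 3, 2, 30)),
    (datetime(2025, 1, 4, 10, 0), datetime(2025, 1, 4, 10, 10)),
]

lead_time = mean_duration(prs)       # 3 hours
mttr = mean_duration(incidents)      # 20 minutes
freq = deployment_frequency(10, 5)   # 2.0 deploys/day
cfr = change_failure_rate(1, 10)     # 0.1
print(lead_time, mttr, freq, cfr)
```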
7/11
The pattern: Skills = Runbooks as Code
Each skill is:
• A markdown file with procedures
• CLI commands it can run
• Context about our specific infra
AI follows the runbook. Humans review the output.
8/11
What changed for us:
Before: Wake up to 47 alerts, spend 30min triaging
After: "Golem, what happened overnight?" → 2min summary
Before: Monthly DORA review (if we remembered)
After: Weekly automated report in my inbox
9/11
The meta insight:
SRE is mostly pattern matching + executing known procedures.
That's exactly what AI is good at.
Humans should design the runbooks and make judgment calls. AI should execute the checklist.
10/11
All of this runs on a $34/month GCP VM.
Skills are just markdown files. No fancy infra needed.
Your AI assistant can be your junior SRE, if you teach it how.
#SRE #DevOps #AI #Clawdbot #PlatformEngineering
11/11