Alex from Ramp Labs describes an experiment in making Ramp Sheets more self-maintaining by connecting background coding agents to production observability. The core shift is that AI has reduced the effort of writing code, so the bottleneck moves to maintenance, monitoring, triage, and directing agents at the right problems. Ramp's answer is not merely "an agent that codes", but a system where agents inspect live behaviour, create monitors, respond to alerts, tune noisy signals, and generate pull requests for real issues.
The talk begins with Ramp Labs' broader AI work, including Ramp Sheets, Latent Briefing, and steering vectors, but quickly centres on a practical problem: once AI can handle much of the coding, engineers spend more time deciding what needs attention.
"code maintenance"
Alex frames this as a change in the software development lifecycle:
- Engineers are increasingly watching dashboards, logs, metrics, and alerts.
- The key task becomes finding the right signal: what should the coding agent work on next?
- The ambition is to automate more than implementation:
  - identify bugs,
  - reproduce them,
  - fix them,
  - open pull requests,
  - and keep monitoring aligned with the codebase.
The foundation is Ramp Inspect, an internal background coding agent. Its key advantage is that it runs in a sandbox where the repository is actually built and exercised, so issues can be found through real execution rather than static reading of the code alone.
Important properties of Inspect:
- Runs live code in a sandbox, enabling behavioural testing.
- Scales horizontally, so many sessions can run in parallel.
- Comes pre-connected to internal systems such as GitHub, Datadog, Sentry, Notion, and Google.
- Acts as a reusable unit of engineering work that can be wired into larger workflows.
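Since Inspect is positioned as a callable unit of work, the wiring in the rest of the talk reduces to invoking it with different prompts. A minimal sketch, assuming a hypothetical internal HTTP endpoint and payload shape (the talk confirms only that Inspect is API-callable):

```python
import requests

# Hypothetical endpoint -- the talk only establishes that Inspect
# sessions can be started programmatically.
INSPECT_URL = "https://inspect.internal.example/api/sessions"

def start_inspect_session(prompt: str, repo: str) -> str:
    """Start a sandboxed Inspect session and return its id.

    The payload shape and auth are assumptions; real callers would
    pass credentials and richer session configuration.
    """
    resp = requests.post(INSPECT_URL, json={"repo": repo, "prompt": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()["session_id"]
```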
The first implementation was a simple nightly cron job. Each night, an Inspect session received a standing prompt to search for problems and produce fixes.
Its recurring duties, sketched in code after the list, included:
- scan for security issues;
- sanity-test core functionality;
- stress-test recently merged pull requests;
- search for latent bugs that had not yet surfaced;
- open pull requests for reproduced bugs.
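A minimal sketch of that nightly job, reusing the hypothetical `start_inspect_session` helper above; the prompt paraphrases the duties listed in the talk:

```python
STANDING_PROMPT = """\
You maintain Ramp Sheets. In your sandbox, with the repo built and running:
1. Scan for security issues.
2. Sanity-test core functionality with real tests and real API requests.
3. Stress-test recently merged pull requests.
4. Search for latent bugs that have not yet surfaced.
5. For every bug you can reproduce, write a fix that passes the
   reproduction case, then open a pull request.
"""

def nightly_maintenance() -> None:
    # One fresh session per night: stateless by design, which is
    # exactly the weakness discussed below.
    session_id = start_inspect_session(STANDING_PROMPT, repo="ramp-sheets")
    print(f"started nightly maintenance session {session_id}")

# Scheduling itself would live in cron or similar, e.g.
#   0 2 * * *  python nightly_maintenance.py
```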
"real tests, real API requests"
The sandbox made this much more useful than a purely speculative agent. If Inspect found an error, it had usually reproduced it against running code. If it proposed a fix, the fix had to pass the original reproduction case. This reduced noise and improved confidence.
But the nightly approach had two major weaknesses:
- Statelessness: each run started fresh, so the agent tended to check the same paths repeatedly.
- Diminishing returns: once recently merged PRs were covered, the automation kept rediscovering familiar terrain.
It also collided with the realities of production observability:
- Overload: production systems emit huge volumes of logs, spans, metrics, and traces.
- Subtlety: not every production issue appears as a clear exception or red alert.
- Correlations: some issues only appear through relationships between signals (a toy illustration follows this list), such as:
  - unusually high P90/P99 latency in one function;
  - two metrics moving together in a suspicious way;
  - behaviour that is technically successful but operationally wrong.
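As a toy illustration of the correlation point (not Ramp's code; the metric series are assumed to be pre-fetched NumPy arrays):

```python
import numpy as np

def p99_ms(latency_samples_ms: np.ndarray) -> float:
    """P99 latency for one function's samples."""
    return float(np.percentile(latency_samples_ms, 99))

def move_together(metric_a: np.ndarray, metric_b: np.ndarray,
                  threshold: float = 0.9) -> bool:
    """Flag two metric series that track each other suspiciously closely.

    Real detection would control for seasonality, traffic volume, and
    known couplings; a raw correlation is only the crudest version.
    """
    r = np.corrcoef(metric_a, metric_b)[0, 1]
    return abs(r) > threshold
```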
The lesson was that a useful maintenance agent needs state and focus. Alex did not want to build a general memory system, so he looked for an existing abstraction that already had persistence and scope.
The key abstraction was Datadog monitors. A monitor has a persistent description, is hosted externally, and is tied to a specific behaviour or failure mode. That made it useful as both a memory object and a focusing device.
The new workflow looked like this (glue code is sketched after the list):
- A pull request merges into a Ramp Sheets code path.
- Automation calls Inspect.
- Inspect creates monitors for the new or changed behaviour.
- If a monitor fires, Inspect investigates.
- If the issue is real, Inspect proposes a fix.
- The fix is surfaced as a pull request.
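In glue-code terms, the workflow amounts to two hooks around Inspect. The handler names and payload fields below are hypothetical; only the flow itself comes from the talk:

```python
def on_pull_request_merged(pr: dict) -> None:
    """CI hook: have Inspect create monitors for the changed code paths."""
    prompt = (
        f"PR #{pr['number']} just merged, touching {pr['files']}. "
        "Create or update Datadog monitors for the new or changed behaviour, "
        "and write enough context into each monitor description to guide "
        "a future investigation."
    )
    start_inspect_session(prompt, repo=pr["repo"])

def on_monitor_alert(alert: dict) -> None:
    """Alert webhook: have Inspect investigate, fixing only what it can reproduce."""
    prompt = (
        f"Monitor '{alert['monitor_name']}' fired. Monitor description: "
        f"{alert['monitor_description']!r}. Reproduce the issue against "
        "running code; if it is real, fix it and open a pull request."
    )
    start_inspect_session(prompt, repo=alert["repo"])
```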
This produces a dynamic observability system:
- monitoring evolves alongside the code;
- changed code receives fresh attention automatically;
- agents do not need to rediscover the whole system each day;
- every alert has enough context to guide investigation;
- the cost of maintaining many specific monitors becomes manageable because response is AI-assisted.
Alex gives an example involving a company whose employees had been blocked from benchmarking Ramp Sheets. The original block was incomplete, and weeks later they were still able to log in. The monitor caught this, alerted Alex, and produced a fix that could be merged.
The deeper claim is that AI changes the economics of observability. Historically, teams apply deep instrumentation only to a few critical areas, leaving other parts of the codebase relatively unwatched. With agent-assisted monitoring, Ramp can apply more granular monitoring across much more of the system.
The strongest objection to the system was alert quality. Alex says this objection was justified: when first turned on, the system produced a lot of low-value agent output.
The main sources of noise were:
- uncalibrated monitors that fired when nothing was actually broken;
- duplicate notifications when the same known issue repeatedly triggered alerts;
- hot paths where a single broken behaviour could generate many redundant messages;
- alerts that reached humans before the system had enough confidence.
"signal to noise ratio"
Ramp addressed this with a triage pattern. Instead of treating every monitor fire as a human-worthy incident, Inspect first evaluates whether the alert has merit.
For noisy monitors, Inspect can (see the sketch after this list):
- inspect historical data;
- assess the scope and recurrence of the signal;
- decide whether the alert is real or noise;
- tune the monitor;
- or delete it if it is not useful.
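The decision structure, reduced to a sketch. In Ramp's system the judgment is made by an Inspect session with access to historical data; the thresholds and field names here are illustrative assumptions:

```python
from enum import Enum

class TriageOutcome(Enum):
    ESCALATE = "escalate"  # high confidence it is real: involve a human
    TUNE = "tune"          # fires on healthy behaviour: recalibrate
    DELETE = "delete"      # pure noise over a long window: remove it
    SUPPRESS = "suppress"  # known issue with a fix in flight: stay quiet

def triage(alert: dict, history: list[dict]) -> TriageOutcome:
    # A fix already exists for this exact issue (see the dedup step below).
    if "PR:" in alert.get("monitor_description", ""):
        return TriageOutcome.SUPPRESS
    if history:
        false_rate = sum(h["was_false_positive"] for h in history) / len(history)
        if false_rate == 1.0 and len(history) >= 20:
            return TriageOutcome.DELETE
        if false_rate > 0.8:
            return TriageOutcome.TUNE
    return TriageOutcome.ESCALATE
```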
For duplicate alerts, Inspect updates the monitor description after creating a fix. If the same issue fires again, the next Inspect session sees that a pull request already exists and stays quiet.
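The writer side of that dedup step might look like this; persisting the change would go through whatever monitor API is in use (Datadog's, in Ramp's case), whose exact call the talk does not specify:

```python
def record_fix_on_monitor(monitor: dict, pr_url: str) -> dict:
    """Append PR context to a monitor description so the next triage
    session recognises the issue as already-fixed and suppresses it."""
    note = f"\nPR: {pr_url} (fix in review; suppress duplicate alerts)"
    monitor["description"] = monitor.get("description", "") + note
    return monitor
```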
This is central to the system's viability. Alex emphasises that teams already ignore noisy monitors; adding AI does not make noisy alerts more valuable. The system must only involve humans when there is high confidence that the issue is real and worth attention.
The final section widens the frame. Alex argues that the most powerful gains may come from wiring agents together, not only improving individual agents.
"wiring is a lot of where the alpha is"
Inspect can be treated as a unit of work callable through an API. The next productivity unlock is chaining those units together into larger systems: monitors call agents, agents produce fixes, fixes update monitors, and context moves between sessions.
This points toward the idea of a software factory:
- a system that does not merely write code;
- a pipeline that ships, monitors, repairs, and improves software;
- a process that can scale beyond the throughput of an individual engineer or team.
Ramp is also exploring self-improvement, not just self-maintenance. That means using product signals to identify what users struggle with and what they want next.
Potential input sources include:
- logs;
- feedback forms;
- session replays;
- Slack;
- Gong;
- LogRocket;
- support and usage data.
The goal is to move from "keep the app reliable" to "make the app better" through agentic systems that can observe user pain, infer product opportunities, and implement improvements.
Ramp's experiment shows that self-maintaining software is less about a single coding agent and more about connecting agents to observability, state, triage, and pull-request workflows. The crucial constraint is alert quality: the system only works if humans see high-confidence, actionable issues rather than noise.
The practical takeaways:
- Treat background coding agents as workflow primitives, not standalone assistants.
- Use production observability to direct agents at real, reproduced problems.
- Add persistent state through existing systems such as monitor descriptions before building custom memory.
- Require agents to reproduce bugs before proposing fixes.
- Build triage into the alert path so noisy monitors are tuned, deleted, or suppressed.
- Track duplicate fixes by writing PR context back into the monitor or alert source.
- Focus first on signal quality, because noisy AI maintenance systems will be ignored just like noisy human dashboards.