Harness Engineering: How to Build Software When Humans Steer, Agents Execute

In this AI Engineer talk, Ryan Lopopolo argues that modern coding agents change the scarce resource in software engineering. The bottleneck is no longer typing implementation code, but designing the systems, context, guardrails, review loops, and operating practices that let agents reliably do the full job. His core claim is that engineering teams should reorganise around delegation, systems thinking, context management, and automation of acceptance criteria, rather than treating agents as autocomplete or code-writing assistants.

Code Is Abundant; Attention Is Scarce

"Code is free."

Lopopolo frames coding agents as a major shift in engineering economics: implementation is becoming abundant, while human time, attention (both human and model), and model context windows remain scarce. Engineers now have access to the equivalent of many parallel implementers, limited mainly by GPU capacity, token budgets, and the ability to direct them productively.

Key implications:

  • Implementation is no longer the central constraint.

    • Code can be produced, refactored, deleted, and migrated more cheaply.
    • Work that previously stayed in the backlog as low-priority "P3" items can now be started in parallel.
    • Internal tools can afford better quality from day one, such as localisation and internationalisation.
  • The engineer's role shifts upward.

    • More emphasis on system design, delegation, prioritisation, and acceptance criteria.
    • Less time should be spent manually typing code or repeatedly reviewing the same classes of mistakes.
    • More time should be spent identifying where humans are blocking throughput.
  • Every engineer behaves more like a staff engineer.

"Every one of you is a staff engineer."

This means engineers must think in terms of leverage: what structures, documentation, reusable tools, and feedback systems will allow many agent trajectories to succeed without constant human intervention?

Harnesses, Context, And Guardrails Become The Engineering Surface

The talk's central concept is harness engineering: building the surrounding system that enables agents to perform reliable software work. A harness is not just a prompt; it is the total environment of skills, repository structure, docs, tests, lints, review agents, CI checks, and tools that inject the right instructions at the right time.

"All the harness should do is surface instructions to the model at the right time."

The practical harness elements Lopopolo highlights include:

  • Skills and repository instructions

    • Keep a small set of high-leverage skills rather than thousands of brittle ones.
    • Use skills to teach agents how to launch the app, inspect logs, run local observability, use Chrome DevTools, and complete tasks to acceptance.
  • Durable documentation

    • Capture what "good" means in the repo: ADRs, persona-specific guidance, QA expectations, review norms, and historical lessons.
    • Turn expert judgement into reusable context so every agent benefits from the best practices of the team.
  • Custom lints and source-code tests

    • Use static checks to encode recurring requirements, such as retries and timeouts around network calls.
    • Write tests about the shape of the code, not only runtime behaviour.
    • Example: limiting file length to improve context efficiency, or enforcing canonical utilities instead of repeated local implementations.
  • High-quality error messages

    • A lint failure should not merely say something failed.
    • It should tell the agent how to fix the issue and why the team's architecture expects that pattern (see the lint sketch after this list).
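
A minimal sketch of such a lint, assuming the team standardises on Python's requests library for network calls. The rule name, message wording, and severity are illustrative, not taken from the talk:

```python
import ast
import sys

MESSAGE = (
    "network call without an explicit timeout.\n"
    "  Fix: pass timeout=..., e.g. requests.get(url, timeout=10).\n"
    "  Why: our services fan out to many backends; one hung call without a\n"
    "  timeout ties up a worker and can cascade into an outage."
)

def check_file(path: str) -> list[str]:
    """Flag requests.get/post/... calls that omit the timeout keyword."""
    tree = ast.parse(open(path).read(), filename=path)
    errors = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "requests"
                and node.func.attr in {"get", "post", "put", "delete"}
                and not any(kw.arg == "timeout" for kw in node.keywords)):
            errors.append(f"{path}:{node.lineno}: {MESSAGE}")
    return errors

if __name__ == "__main__":
    problems = [e for f in sys.argv[1:] for e in check_file(f)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Note how the error message does all three jobs: it says what failed, how to fix it, and why the architecture expects the pattern.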

The deeper point is that models can make many plausible choices, so teams must specify their non-functional requirements: maintainability, reliability, security, testability, API boundaries, and style conventions.

Making Agent Work Legible And Reviewable

Lopopolo argues that good agent output depends on making the codebase and process legible to agents. Since code in the file system is also text, repository structure becomes a form of prompting. A consistent codebase gives agents transferable context, making the next correct token easier to predict.

Important practices:

  • Make things the same.

    • One way to write CI scripts.
    • One way to add lint rules.
    • One bounded-concurrency helper (sketched after this list).
    • One canonical schema or utility for repeated patterns.
    • One clear package structure and ownership model.
  • Structure repos for locality.

    • Organise code so most changes are confined to a subtree.
    • Use packages or domain boundaries to make public/private APIs clear.
    • Reduce merge conflicts by reducing overlapping work areas.
  • Treat PRs as collaboration hubs.

    • Humans and agents can both review, comment, and propose changes.
    • The implementation agent can accept, defer, or reject feedback.
    • The goal is not perfect compliance with every comment, but moving acceptable work through the system.
  • Avoid making humans the synchronous blocker.

    • If every PR needs deep human review, velocity collapses.
    • Instead, encode repeated review feedback into documentation, lints, tests, or reviewer agents.
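
As one illustration of "make things the same", the canonical bounded-concurrency helper might look like the following sketch. The name and the asyncio-based design are assumptions, not from the talk:

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def bounded_map(
    fn: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    limit: int = 8,
) -> list[R]:
    """Run fn over items with at most `limit` calls in flight at once.
    The single canonical helper: every fan-out in the codebase uses this."""
    sem = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with sem:
            return await fn(item)

    return await asyncio.gather(*(run(i) for i in items))
```

Once every fan-out goes through one helper like this, an agent that has seen a single call site has effectively seen them all; that is the transferable-context argument in practice.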

A notable team practice is "garbage collection day": engineers spend dedicated time identifying recurring "slop" from the week and building systematic ways to prevent it. The loop is:

  1. Notice repeated human review comments.
  2. Identify the underlying missing context or constraint.
  3. Encode that lesson in docs, tests, lints, or reviewer agents (sketched after this list).
  4. Let future agents self-correct before humans are pulled in.
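
A minimal sketch of step 3 as a source-code test, assuming the recurring review comment was "use the shared retry helper instead of ad-hoc retry loops". The paths, regex, and helper name are hypothetical:

```python
# test_no_adhoc_retries.py: a source-code test that encodes a review lesson.
import pathlib
import re

AD_HOC_RETRY = re.compile(r"for\s+attempt\s+in\s+range\(")  # hand-rolled loop signature
ALLOWED = {"libs/retry.py"}  # the one canonical implementation

def test_services_use_canonical_retry_helper():
    offenders = [
        str(path)
        for path in pathlib.Path("services").rglob("*.py")
        if str(path) not in ALLOWED and AD_HOC_RETRY.search(path.read_text())
    ]
    assert not offenders, (
        f"Ad-hoc retry loops in {offenders}. "
        "Use libs.retry.with_retries() so backoff and jitter stay consistent."
    )
```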

Getting Started With Coding Agents

For teams moving from manual coding to agent-led development, Lopopolo recommends starting where agents improve confidence and reduce repeated human work.

Good first uses:

  • Add or improve tests

    • Agents are strong at reading existing code and writing tests that capture intended behaviour.
    • Better tests also make later agent work safer, because agents receive faster feedback.
  • Automate time sinks

    • Look for where engineers wait: slow CI, flaky tests, manual review, repeated local setup, brittle deployments.
    • Use agents to remove these bottlenecks incrementally.
  • Move review knowledge into the repo

    • Ask team members to document what they look for as front-end architects, reliability engineers, scalability reviewers, or product-minded engineers.
    • Spin up reviewer agents around those personas.
    • Have those agents surface only meaningful blockers, such as P2-level issues or above (see the reviewer sketch after this list).
  • Use plans carefully

    • If a plan is going to drive a large rollout, it should be reviewed seriously.
    • Approving unread plans can encode bad instructions into the work.
    • For important work, Lopopolo suggests reviewing plan-only PRs before execution.
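
A sketch of the persona-reviewer idea as a plain prompt-plus-filter pipeline. The persona file, severity scheme, and output contract are illustrative assumptions, not any specific product's API:

```python
from pathlib import Path

SEVERITY_FLOOR = 2  # surface P0-P2 only; lower-priority nits stay out of human review

def build_review_prompt(persona_file: str, diff: str) -> str:
    """Combine a documented persona (what this reviewer looks for)
    with the PR diff and a strict output contract."""
    persona = Path(persona_file).read_text()  # e.g. docs/reviewers/reliability.md
    return (
        f"{persona}\n\n"
        f"Review the diff below. Report ONLY issues of priority P{SEVERITY_FLOOR} "
        "or higher, one per line, formatted 'P<n>: <issue>'. "
        "If nothing meets that bar, reply exactly 'LGTM'.\n\n"
        f"{diff}"
    )

def filter_blockers(model_output: str) -> list[str]:
    """Keep only the lines the model marked at or above the severity floor."""
    return [
        line for line in model_output.splitlines()
        if line[:1] == "P" and line[1:2].isdigit()
        and int(line[1]) <= SEVERITY_FLOOR
    ]
```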

He also notes that token usage is not only for implementation. In his workflow, token spend is spread across planning and ticket curation, documentation, implementation, and CI/review processes. The value comes from getting code accepted and advancing the product, not merely producing code.

The Future: Agents Across The Whole Engineering Lifecycle

Lopopolo's long-term vision is not just agents writing patches. It is a system where humans provide priorities, success metrics, reliability metrics, and token budgets, and machines continuously advance the product with minimal hands-on steering.

The scope expands beyond coding:

  • QA and smoke testing

    • Agents should download built artefacts, launch them, and validate critical user journeys.
    • If they lack tools to do that, teams should help agents build those tools (a smoke-test sketch follows this list).
  • Operations and feedback loops

    • Agents can triage user feedback, production pages, logging issues, and runbook gaps.
    • They can help detect PII leaks, improve user operations workflows, and convert recurring support issues into product fixes.
  • Context as durable infrastructure

    • Context will not disappear as models improve.
    • Requirements, guardrails, acceptance criteria, and team-specific standards still need to be surfaced.
    • Better models make the harness more valuable because they can use that context more effectively.
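
A sketch of the kind of smoke-test tool an agent might build for itself. The artefact path, port, and endpoints are all assumptions:

```python
import subprocess
import time
import urllib.request

def smoke_test(artifact: str = "./dist/app", base: str = "http://localhost:8080"):
    """Launch a built artefact and walk one critical user journey:
    the service comes up, and the health and login pages respond."""
    proc = subprocess.Popen([artifact, "--port", "8080"])
    try:
        time.sleep(3)  # crude readiness wait; a real harness would poll
        for path in ("/healthz", "/login"):
            with urllib.request.urlopen(base + path, timeout=10) as resp:
                assert resp.status == 200, f"{path} returned {resp.status}"
        print("smoke test passed")
    finally:
        proc.terminate()
        proc.wait(timeout=10)

if __name__ == "__main__":
    smoke_test()
```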

The provocative mental model is that code may become a disposable build artefact of a richer specification: documentation, constraints, tests, review expectations, and product goals. In that world, switching models resembles swapping a compiler backend, while the surrounding constraints define what acceptable output means.

TL;DR

Ryan Lopopolo argues that software engineering is shifting from manual implementation to harness engineering: building the context, guardrails, repository structure, tests, lints, and review agents that let coding agents do complete work reliably. The scarce resource is no longer code, but human attention and well-specified judgement.

Actionable Insights

  • Write down what "good" means in your codebase: QA plans, architecture rules, reliability expectations, and review criteria.
  • Convert repeated code review feedback into docs, lints, tests, or reviewer agents.
  • Start by using agents to add tests and automate the parts of development where humans wait.
  • Keep skills few but high-leverage, and make them the entry point for local dev workflows.
  • Structure repos so agents can work locally, follow consistent patterns, and avoid unnecessary merge conflicts.