@arun-gupta
Last active March 26, 2026 19:21
Spec-Driven Development with SpecKit and Claude Code

The problem is not that we don’t write specs. It’s that specs are not the system of record. Code is. In practice, the spec lives in a ticket, gets closed, and drifts. With AI generating code faster than teams can review it, that gap becomes expensive fast.

Specification-Driven Development flips that model. The spec comes first and stays the source of truth. It is a precise, versioned definition of intent that drives architecture, implementation, tests, and docs. The idea has existed in fragments: TDD, BDD, and API-first design all push toward defining intent earlier. What’s different here is making the spec the primary artifact throughout, not just at the start.

Tools like SpecKit, AgentOS, Tessl, SuperClaude, BMAD, and Kiro are all exploring this space, each from a different angle.

I’ve been experimenting with this for a while. See Vibe Coding to Spec-Driven Development and Practical Tips for Spec-Driven Development with Agents. The idea made sense but the structure was ad hoc. So I tried it end to end using SpecKit, building a Strava dashboard across four features: OAuth login, athlete profile, recent activities, and client-side filtering and search.


Getting Started

SpecKit installation is simple. I used the 0.4.1 release:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@v0.4.1

Once installed, you create a new project with:

specify init strava-dashboard --ai claude

The --ai claude flag sets the tone for everything that follows. SpecKit generates its commands and templates tailored to how the model reasons. Different model, different scaffolding.


How SpecKit Works

The first step is /speckit.constitution. This generates a constitution at .specify/memory/constitution.md, which acts as the architectural DNA of the project. It defines non-negotiable rules: technologies, testing requirements, style guidelines, constraints. The agent checks against it continuously. If it reaches for a forbidden library or skips test coverage, it gets flagged. Getting Claude to actually honor it across sessions requires one more step, covered in the Teaching Claude Code the Methodology section below.
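To make this concrete, here is a sketch of the kind of rules a constitution might contain. All specifics below (stack, coverage policy) are invented for illustration, not copied from this project's constitution:

```markdown
## Non-Negotiable Rules (illustrative sketch)

- Stack: the technologies chosen in the plan are fixed; no new frameworks
  or libraries without amending this constitution first.
- Testing: every task ships with tests written first (TDD); test coverage
  may not decrease on any PR.
- Secrets: API credentials (e.g. the Strava client ID/secret) come from
  environment variables, never from source control.
- Done: a feature is complete only when its manual-testing checklist is
  signed off on a running instance.
```

The value is less in any individual rule than in having a single file the agent can be pointed back to when it drifts.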

From there, every feature follows the same linked chain:

/speckit.specify    ->  spec.md      (what we're building)
/speckit.plan       ->  plan.md      (how we're building it)
/speckit.tasks      ->  tasks.md     (every task, phased and ordered)
/speckit.implement  ->               (code generation, task by task)

Each document feeds the next. The plan references the spec. The tasks reference both.
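Each step is a slash command plus a natural-language description. As a hypothetical example (prompt text invented for illustration, not the project's actual prompt), the first feature might start like this:

```
/speckit.specify Users sign in with their Strava account via OAuth.
Unauthenticated visitors see a login page; after authorizing, they land
on the dashboard and their tokens are stored for later API calls.
```

SpecKit turns that description into a structured spec.md, which the later commands then elaborate into a plan and tasks.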

Each feature gets its own folder under specs/:

specs/
  001-strava-oauth-login/
  002-athlete-profile/
  003-recent-activities/
    spec.md
    plan.md
    tasks.md
    research.md
    data-model.md
    contracts/
    checklists/

This structure makes every feature self-contained and traceable. You can understand what was built, why, and how it was validated without reading the code.

Each feature is developed on its own branch and merged via a pull request, keeping history clean and tied back to the spec.


Human in the Loop

The SDD lifecycle has two points where a human must stop and verify before moving forward.


The first is between phases. SpecKit produces a spec.md, then a plan.md, then a tasks.md. Each builds on the previous. If something is wrong in the spec, it propagates forward. Catching a misaligned assumption in the spec takes a minute; catching it after implementation is a much bigger problem. The habit that made a real difference was reading each document after it was generated, not skimming it, and correcting it before moving to the next phase.

The second is before shipping. Each feature has a checklists/manual-testing.md with concrete verification steps that must be signed off with real data on a running instance. This is where bugs that unit tests cannot catch actually get caught. During development of this project, a redirect loop was only discovered here: the app was silently burning through the Strava API rate limit every time an auth token expired, making hundreds of API calls per minute. No unit test would have found that. The manual checklist did.
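The failure mode generalizes: a 401 triggers a token refresh, the refreshed token is tried, and without a cap the cycle can repeat indefinitely. A minimal sketch of the guard, with all names (`call_api`, `refresh_token`) as hypothetical stand-ins for the real client, not code from this project:

```python
# Guard API calls so a rejected token triggers at most one refresh,
# instead of an unbounded refresh loop that burns through the rate limit.

class AuthLoopError(Exception):
    """Raised when a freshly refreshed token is still rejected."""

def fetch_with_refresh(call_api, refresh_token, max_refreshes=1):
    """call_api(token) -> (status, body); refresh_token() -> new token."""
    token = refresh_token()
    refreshes = 0
    while True:
        status, body = call_api(token)
        if status != 401:          # success (or a non-auth error): stop retrying
            return status, body
        if refreshes >= max_refreshes:
            # A fresh token was already tried and rejected: fail loudly
            # rather than hammering the API.
            raise AuthLoopError("token rejected after refresh; aborting")
        token = refresh_token()    # retry with a fresh token, at most once
        refreshes += 1
```

The point of the sketch is the cap itself: an invariant like "never refresh more than once per request" is exactly the kind of constraint a manual checklist surfaces and a unit test rarely encodes.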

There is a tendency, especially when working with AI code generation, to treat passing tests as sufficient. It is not. Both checkpoints exist for the same reason: AI output is fast, but not infallible.


Teaching Claude Code the Methodology

SpecKit and Claude Code do not connect automatically. That is the part that does not come for free.

SpecKit generates its artifacts (constitution.md, spec.md, plan.md, tasks.md) under .specify/ and specs/. Claude Code has no awareness of this structure out of the box. It does not know what "constitution" means in this project, where specs live, or that TDD is non-negotiable. The first time it was asked to check the constitution before opening a PR, it had no idea where to look.

The bridge is CLAUDE.md. Claude Code reads this file at the start of every session, including after a /clear. It is the right place to encode SpecKit's vocabulary: what the constitution is, where specs live, what the slash commands do, and what the definition of done requires. Once that was in place, Claude behaved consistently across sessions, not because it remembered, but because the project told it what to know.
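A short CLAUDE.md along these lines is enough to bootstrap the vocabulary. The contents below are an invented sketch, not the project's actual file:

```markdown
# CLAUDE.md (illustrative sketch)

- The project constitution lives at .specify/memory/constitution.md.
  Read it before starting any task and again before opening a PR.
- Feature specs live under specs/, one folder per feature, each with
  spec.md, plan.md, and tasks.md.
- Work through tasks.md in order; TDD is non-negotiable.
- Definition of done: all tasks checked off, tests green, and the
  feature's manual-testing checklist signed off on a running instance.
```

Because the file is read at session start, every fresh session begins with the same ground rules instead of relearning them.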

This also exposes a gap in SpecKit: specify init --ai claude does not generate CLAUDE.md. If --ai claude is a first-class flag, generating CLAUDE.md should be a first-class output. Until that is fixed, you have to create it manually; it takes five minutes and the return is immediate. The bug has been filed: specify init --ai claude does not generate CLAUDE.md.


Gaps

SpecKit gets a lot right at the feature level. But after using it end to end across four features, some structural gaps became clear: not blockers, but places where the workflow runs out of guardrails.

  • Project Governance: SpecKit is excellent at the feature level but has no project-level governance layer. There's no document that shows the relationship between all features, their shared dependencies, their collective Definition of Done, or the overall system contract. The constitution.md comes closest but it's more a style guide than a system architecture document.

  • Spec Versioning: Once a spec is written and implementation begins, what happens when requirements change mid-feature? SpecKit has no formal amendment process. There's no diff between spec.md v1 and spec.md v2, no record of why something changed, and no instruction for Claude on how to handle a spec that gets updated while a feature is in flight.

  • Cross-Feature Dependencies: Each spec lives in isolation under specs/. But in reality, features depend on each other. There's no mechanism for expressing "this feature requires feature 003 to be complete" or for Claude to detect when a change in one feature's contracts breaks another's assumptions.

  • Post-Implementation Spec Sync: To keep the spec in sync with the implementation, the loop has to close: spec → implementation → runtime → updated spec. Right now SpecKit covers the first half well.
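None of this exists in SpecKit today, but the cross-feature gap is concrete enough to sketch. If each spec declared its dependencies (a hypothetical depends_on field; the data below is invented), a pre-flight check could order features and reject cycles:

```python
# Hypothetical sketch: order features by declared dependencies.
# SpecKit has no depends_on field today; this illustrates the missing guardrail.
from graphlib import TopologicalSorter

depends_on = {
    "001-strava-oauth-login": [],
    "002-athlete-profile": ["001-strava-oauth-login"],
    "003-recent-activities": ["001-strava-oauth-login"],
    "004-filter-and-search": ["003-recent-activities"],
}

# static_order() yields each feature only after all of its dependencies,
# and raises CycleError if the declared dependencies form a cycle.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

A check like this could run before /speckit.implement, refusing to start a feature whose dependencies have not shipped.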

What Actually Changes

The honest answer is that SDD with SpecKit does not make development faster in the short term. The upfront work is real.

What it changes is predictability. Scope is defined before the first line of code. Edge cases are in the spec before they become bugs. Done has a definition. And when Claude Code is generating implementation, it is working from precise, structured documents rather than trying to infer intent from a vague prompt.

It turns out the discipline of writing clearly before coding is also the discipline that makes AI code generation more reliable. The two reinforce each other in a way that feels obvious in retrospect but takes deliberate setup to realize.

The full source for this project is on GitHub. If you want to try the workflow yourself, SpecKit is at github/spec-kit.
