Stepwise Development Plan
Enterprise Agentic Pipeline for API Change Notes
Step 1 — Formalize scope & contracts (do this first)
Goal: Eliminate ambiguity before engineering starts.
Actions
Define what qualifies as an API change (public endpoints, DTOs, OpenAPI, versioned contracts).
Define the exact structure of the output file (sections, metadata, wording constraints).
Define change severity categories (breaking / non-breaking / informational).
Define confidence thresholds and publication rules.
Deliverables
Written API-change definition
Output contract (schema + example)
Acceptance metrics
Quality Gate
Stakeholders (API owners + business) agree that the output format is correct.
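As a concrete illustration of the output contract above, here is a minimal sketch in Java (the project's stated implementation language). Every field name and the nested enum are assumptions; the real contract is whatever the stakeholders sign off on.

```java
// Illustrative Java record for the output contract; field names and the
// nested enum are assumptions, not the agreed schema.
import java.util.List;

public record ApiChangeNote(
        String releaseId,        // e.g. "2025.4"
        String summary,          // the 1-2 line business-facing text
        Severity severity,       // maps to the agreed severity categories
        double confidence,       // 0.0-1.0, drives the publication rules
        String jiraId,           // traceability back to intent
        List<String> commitShas  // traceability back to code
) {
    public enum Severity { BREAKING, NON_BREAKING, INFORMATIONAL }
}
```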
Step 2 — Normalize historical data into a golden dataset
Goal: Establish ground truth and evaluation baseline.
Actions
Convert the last 50 human-written releases into structured records:
Release ID
API notes
Referenced JIRA IDs
Mentioned API elements (best-effort)
Classify changes by type and severity.
Extract canonical phrasing patterns.
Deliverables
Versioned “golden dataset”
Label definitions and mapping rules
Quality Gate
Domain experts confirm historical normalization reflects reality.
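One possible shape for a golden-dataset entry, mirroring the fields listed above; the names are assumptions, and "apiElements" is best-effort by design.

```java
import java.util.List;

// Hypothetical shape of one golden-dataset entry.
public record GoldenRelease(
        String releaseId,
        List<String> humanNotes,  // the human-written API notes
        List<String> jiraIds,     // JIRA IDs referenced in those notes
        List<String> apiElements  // mentioned API elements (best-effort)
) {}
```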
Step 3 — Define system boundaries & trust model
Goal: Decide where determinism ends and probabilistic reasoning begins.
Actions
Declare deterministic vs non-deterministic components.
Define what the agent is allowed to do.
Define fail-closed behavior (what happens on uncertainty).
Decisions
Agent only summarizes and classifies; it does not fetch data.
Pipeline controls execution order.
Deliverables
Architecture decision record (ADR)
Quality Gate
Architecture review sign-off.
Step 4 — Build deterministic ingestion & filtering pipeline
Goal: Create a reproducible, auditable foundation.
Actions
Scan sprint-bounded commits.
Extract JIRA IDs (assumed reliable).
Filter candidate commits via deterministic heuristics:
Paths, file types
Signature changes
Spec deltas
Produce structured change artifacts.
Deliverables
Commit → API-delta mapping
Deterministic pipeline output
Quality Gate
Same input always produces same output.
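A minimal sketch of the deterministic JIRA-ID extraction this step depends on. The regex assumes the standard PROJECT-123 key shape and should be tuned to your organization's conventions.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public final class JiraIdExtractor {
    // Standard JIRA key shape: PROJECT-123. Tune the project-key part
    // to your organization's actual conventions.
    private static final Pattern JIRA_KEY = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

    public static List<String> extract(String commitMessage) {
        return JIRA_KEY.matcher(commitMessage)
                .results()                 // deterministic: no model involved
                .map(MatchResult::group)
                .distinct()
                .toList();
    }
}
```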
Step 5 — Integrate selective JIRA enrichment
Goal: Add intent and context without overfetching.
Actions
Fetch JIRA summary + description only for filtered commits.
Normalize and sanitize text.
Link JIRA intent to code deltas.
Deliverables
Enriched structured context bundles
Quality Gate
JIRA outages or failures do not break the pipeline.
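One way to satisfy this quality gate is to make enrichment fail open: a JIRA outage degrades the context bundle rather than aborting the run. JiraClient and JiraContext below are hypothetical names.

```java
import java.util.Optional;

public final class JiraEnricher {

    // Hypothetical client interface; the real one wraps the JIRA REST API or MCP.
    public interface JiraClient {
        String fetchSummary(String jiraId);
        String fetchDescription(String jiraId);
    }

    public record JiraContext(String jiraId, String summary, String description) {}

    private final JiraClient jira;

    public JiraEnricher(JiraClient jira) { this.jira = jira; }

    // Enrichment fails open: a JIRA outage degrades the context bundle
    // instead of failing the whole pipeline run.
    public Optional<JiraContext> enrich(String jiraId) {
        try {
            return Optional.of(new JiraContext(
                    jiraId, jira.fetchSummary(jiraId), jira.fetchDescription(jiraId)));
        } catch (Exception e) {
            // Log and continue; the note is produced without intent and
            // flagged for review (see "Safe degradation paths" later on).
            return Optional.empty();
        }
    }
}
```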
Step 6 — Back-test detection against historical releases
Goal: Validate correctness before adding agent logic.
Actions
Replay historical releases through the pipeline.
Compare detected API changes vs historical notes.
Measure precision, recall, and false positives.
Deliverables
Evaluation report
Tuned heuristic rules
Quality Gate
Detection metrics meet agreed thresholds.
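A sketch of the precision/recall computation, assuming detected and historical changes can both be keyed by some shared identity such as a JIRA ID.

```java
import java.util.Set;

public final class DetectionMetrics {
    // Precision/recall over detected vs. historically documented API changes,
    // keyed by whatever identity you choose (e.g. JIRA ID). Illustrative only.
    public static double precision(Set<String> detected, Set<String> truth) {
        if (detected.isEmpty()) return 0.0;
        long tp = detected.stream().filter(truth::contains).count();
        return (double) tp / detected.size();
    }

    public static double recall(Set<String> detected, Set<String> truth) {
        if (truth.isEmpty()) return 0.0;
        long tp = detected.stream().filter(truth::contains).count();
        return (double) tp / truth.size();
    }
}
```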
Step 7 — Introduce bounded agent for summarization
Goal: Generate human-readable API notes safely.
Actions
Feed agent only structured context bundles.
Ground prompts with historical examples.
Enforce strict output schema.
Require confidence scoring and evidence references.
Deliverables
Agent output conforming to schema
Quality Gate
Invalid or low-confidence outputs are rejected automatically.
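This quality gate could be implemented as a fail-closed output gate, sketched below reusing the hypothetical ApiChangeNote record from Step 1; the 0.8 threshold is an assumed policy value, not a recommendation.

```java
// Fail-closed gate: outputs that do not validate or fall below the
// confidence threshold are rejected, never silently published.
public final class OutputGate {
    private static final double MIN_CONFIDENCE = 0.8; // assumed policy value

    public enum Decision { ACCEPT, REJECT_INVALID, REJECT_LOW_CONFIDENCE }

    public static Decision evaluate(ApiChangeNote note) {
        if (note == null || note.summary() == null || note.summary().isBlank()
                || note.jiraId() == null || note.commitShas().isEmpty()) {
            return Decision.REJECT_INVALID;          // schema violation -> fail closed
        }
        if (note.confidence() < MIN_CONFIDENCE) {
            return Decision.REJECT_LOW_CONFIDENCE;   // routed to human review
        }
        return Decision.ACCEPT;
    }
}
```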
Step 8 — Validate agent outputs using historical replay
Goal: Prove agent behaves like human authors.
Actions
Run agent on historical releases.
Compare summaries to golden dataset.
Measure semantic similarity, verbosity, and tone alignment.
Deliverables
Agent evaluation metrics
Approved prompt versions
Quality Gate
Agent meets or exceeds human similarity thresholds.
Step 9 — Add human-in-the-loop workflow
Goal: Maintain trust while capturing feedback.
Actions
Route outputs to reviewers based on confidence.
Allow edit, approve, or reject.
Capture edits as labeled feedback.
Deliverables
Review workflow
Audit logs
Quality Gate
All published notes are traceable to approvals.
Step 10 — Implement governance & auditability
Goal: Make the system enterprise-compliant.
Actions
Store:
Commits, JIRA data, diffs
Agent inputs/outputs
Prompt and model versions
Implement access controls and retention rules.
Deliverables
Audit trail
Compliance documentation
Quality Gate
System passes internal audit review.
Step 11 — Gradual automation rollout
Goal: Reduce human effort safely.
Actions
Auto-publish low-risk, high-confidence changes.
Always require human review for breaking changes.
Monitor drift and error rates.
Deliverables
Automation policy
Monitoring dashboards
Quality Gate
Error rates remain below defined thresholds.
Step 12 — Continuous improvement
Goal: Sustain quality over time.
Actions
Periodic historical replay with new data.
Prompt and heuristic tuning.
Confidence threshold adjustments.
Deliverables
Versioned improvements
Updated evaluation reports
Quality Gate
No regression in quality metrics.
----------------------------------
1. Evidence-first artifacts (treat evidence as a product)
Suggestion:
Persist evidence bundles as first-class, versioned artifacts.
What this means
Every pipeline run produces an immutable “evidence package” containing:
Commit SHAs + diffs
Extracted API deltas
JIRA summary + description snapshot
Pipeline version + ruleset version
Why it matters
You can re-run the agent without touching Git or JIRA again
Auditors and reviewers can inspect facts independently of the AI
Enables deterministic replay and regression testing
Enterprise pattern
Separate fact collection from interpretation.
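One plausible shape for such an evidence package is an immutable Java record; all field names below are assumptions, and the point is that facts are captured once, versioned, and never mutated afterwards.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;

// Immutable evidence package for one pipeline run; field names are assumptions.
public record EvidenceBundle(
        String runId,
        Instant capturedAt,
        List<String> commitShas,
        Map<String, String> diffsBySha,     // raw diffs keyed by commit SHA
        List<String> apiDeltas,             // extracted API deltas
        Map<String, String> jiraSnapshots,  // summary + description per JIRA ID
        String pipelineVersion,
        String rulesetVersion
) {}
```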
2. Two-pass classification before summarization
Suggestion:
Split reasoning into classification → summarization, even if both use the same model.
Pass 1: Classification
Is this an API change?
Change type?
Breaking vs non-breaking?
Confidence score
Pass 2: Summarization
Only runs if pass 1 succeeds
Uses classification outputs as constraints
Why it matters
Reduces hallucinations
Enables partial automation (e.g., auto-publish non-breaking changes)
Makes evaluation easier and more explainable
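A minimal sketch of the two-pass split. Classifier and Summarizer are hypothetical interfaces over your model calls; the structure guarantees summarization only runs when classification succeeds and is constrained by its result.

```java
import java.util.Optional;

public final class TwoPassAgent {

    public record Classification(boolean isApiChange, String changeType,
                                 boolean breaking, double confidence) {}

    public interface Classifier { Classification classify(String contextBundle); }
    public interface Summarizer { String summarize(String contextBundle, Classification c); }

    private final Classifier classifier;
    private final Summarizer summarizer;
    private final double minConfidence;

    public TwoPassAgent(Classifier c, Summarizer s, double minConfidence) {
        this.classifier = c;
        this.summarizer = s;
        this.minConfidence = minConfidence;
    }

    public Optional<String> run(String contextBundle) {
        Classification c = classifier.classify(contextBundle);      // pass 1
        if (!c.isApiChange() || c.confidence() < minConfidence) {
            return Optional.empty();                                 // fail closed
        }
        return Optional.of(summarizer.summarize(contextBundle, c)); // pass 2
    }
}
```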
3. Confidence as a computed value, not a model guess
Suggestion:
Treat confidence as a composite score, not just an LLM output.
Combine
Heuristic confidence (deterministic)
Historical similarity score
Model confidence
JIRA intent clarity score (e.g., explicit “API change” mention)
Why it matters
Prevents over-trusting the model
Enables predictable automation policies
Easier to justify decisions to stakeholders
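One way to compute such a composite score is a weighted blend of the four signals; the weights below are illustrative assumptions to be calibrated against the golden dataset.

```java
public final class CompositeConfidence {
    // Weighted blend of deterministic and model signals, each in 0.0-1.0.
    // Weights are illustrative; calibrate them against the golden dataset.
    public static double score(double heuristic, double historicalSimilarity,
                               double model, double jiraIntentClarity) {
        return 0.35 * heuristic
             + 0.25 * historicalSimilarity
             + 0.20 * model
             + 0.20 * jiraIntentClarity; // weights sum to 1.0
    }
}
```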
4. Negative capability testing (explicit “should NOT happen” cases)
Suggestion:
Create a test suite of known non-API changes.
Examples:
Internal refactors
Logging changes
Performance optimizations
Test-only commits
Why it matters
Enterprise failures often come from false positives, not false negatives
Business teams lose trust faster from noise than from missing items
Pattern
Measure false positives as aggressively as recall.
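A sketch of what such negative tests might look like with JUnit 5. ApiChangeDetector.isCandidate is a hypothetical entry point into the deterministic filter from Step 4.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;

import java.util.List;
import org.junit.jupiter.api.Test;

class NonApiChangeTest {

    @Test
    void internalRefactorIsNotFlagged() {
        // Known non-API change: must never surface in the release notes.
        assertFalse(ApiChangeDetector.isCandidate(
                "refactor: extract helper in OrderServiceImpl",
                List.of("src/main/java/internal/OrderServiceImpl.java")));
    }

    @Test
    void testOnlyCommitIsNotFlagged() {
        assertFalse(ApiChangeDetector.isCandidate(
                "test: add unit tests for retry logic",
                List.of("src/test/java/RetryLogicTest.java")));
    }
}
```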
5. Human language alignment layer
Suggestion:
Introduce a language normalization step before final output.
What it does
Enforces:
Verb tense
Terminology (“consumer” vs “client”)
Severity words (“breaking”, “minor”)
Strips speculative phrasing (“might”, “appears”)
Why it matters
Business users care about consistency more than intelligence
Prevents stylistic drift across releases
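A minimal sketch of the normalization layer; the terminology map and the speculative-word list are illustrative assumptions.

```java
import java.util.Map;

public final class LanguageNormalizer {
    // Illustrative terminology mapping; extend from the style guide.
    private static final Map<String, String> TERMINOLOGY =
            Map.of("client", "consumer", "clients", "consumers");

    public static String normalize(String note) {
        String result = note;
        for (var entry : TERMINOLOGY.entrySet()) {
            result = result.replaceAll("\\b" + entry.getKey() + "\\b", entry.getValue());
        }
        // Speculative phrasing has no place in a published change note.
        result = result.replaceAll("\\b(might|appears to|possibly|seems to)\\b\\s*", "");
        return result.trim();
    }
}
```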
6. Drift detection on process, not just model
Suggestion:
Monitor drift in inputs and behavior, not only outputs.
Track:
Average commits per release
Average API deltas per release
JIRA description length
Agent confidence distribution
Why it matters
Codebase evolution breaks heuristics silently
Organizational process changes (e.g., worse commit messages) degrade quality
Enterprise lesson
Most AI failures are upstream data failures.
7. “What changed since last run?” awareness
Suggestion:
Maintain a release memory.
What it enables
Detect repeated changes to the same API
Collapse noisy updates into a single coherent note
Prevent duplicate reporting across releases
Why it matters
Humans naturally reason across time; pipelines do not unless designed to.
8. Safe degradation paths
Suggestion:
Define explicit downgrade modes.
Examples:
JIRA unavailable → produce notes without intent but flag
Model unavailable → produce structured diff summary only
Heuristics uncertain → force human review
Why it matters
Enterprise systems must degrade gracefully, not fail hard
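Downgrade modes can be made explicit and auditable with something as simple as an enum, so a degraded run is visible in the output rather than implicit; the names below are assumed.

```java
public enum DegradationMode {
    FULL,                 // all sources available
    NO_JIRA_INTENT,       // JIRA unavailable: notes produced without intent, flagged
    DIFF_SUMMARY_ONLY,    // model unavailable: structured diff summary only
    FORCED_HUMAN_REVIEW;  // heuristics uncertain: nothing auto-published

    public boolean allowsAutoPublish() {
        return this == FULL; // anything degraded goes through a human
    }
}
```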
9. Separation of “release assembly” from “change detection”
Suggestion:
Treat “release notes assembly” as a distinct stage.
Why
One API change may span multiple commits
One commit may touch multiple APIs
Release notes need aggregation logic, not just detection
This avoids the trap of “one commit → one note”.
10. Kill switches & feature flags (non-optional)
Suggestion:
Everything agentic should be behind flags:
Auto-publish on/off
Agent on/off
Confidence thresholds
Why it matters
You will need to disable parts quickly
Builds trust with senior stakeholders
11. Idempotent pipeline + replay = enterprise superpower
Combine:
Idempotent pipeline
Evidence bundles
Historical replay
This gives you:
Deterministic debugging
Compliance confidence
Safe iteration on prompts and heuristics
Most “AI failures” happen because teams can’t replay the past.
12. Design principle to remember
Your agent is replaceable.
Your pipeline is the product.
If you design for:
Reproducibility
Evidence-first processing
Controlled autonomy
…you’ll end up with a system senior architects trust.
-------------------------------------
Problem Statement
Build an enterprise-grade autonomous / agentic workflow that generates API change notes for each release by analyzing Git commits and associated JIRA issues.
The output is a file containing 1–2 line summaries of API changes, intended for business and non-technical stakeholders, and distributed to the team.
Functional Requirements
Input Sources
Git repository
Commits belonging to a specific sprint or release window
Commits almost always contain a JIRA issue ID in the commit message
JIRA
JIRA tickets are fetched only for JIRA IDs extracted from commits
It is neither possible nor desirable to fetch all JIRA tickets
Processing Logic
Pull Git repository
Identify commits within the sprint/release
Filter commits that appear related to API changes
Extract code changes introduced by those commits
Fetch corresponding JIRA issues (using extracted JIRA IDs)
Connect code changes with JIRA descriptions
Summarize API changes into concise human-readable text
Aggregate summaries into a single output file
Send the file to the relevant team
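A high-level orchestration sketch of these processing steps in Java. Every interface is a placeholder for a real component, and all names are assumptions; the shape is the point: the pipeline, not the agent, controls execution order.

```java
import java.util.List;

public final class ReleaseNotesPipeline {

    public interface GitSource  { List<Commit> commitsFor(String releaseWindow); }
    public interface ApiFilter  { List<Commit> filterApiCandidates(List<Commit> commits); }
    public interface JiraSource { String contextFor(String jiraId); }
    public interface Summarizer { String summarize(Commit commit, String jiraContext); }
    public interface Publisher  { void publish(String releaseId, List<String> notes); }

    public record Commit(String sha, String message, String diff, List<String> jiraIds) {}

    private final GitSource git;
    private final ApiFilter filter;
    private final JiraSource jira;
    private final Summarizer summarizer;
    private final Publisher publisher;

    public ReleaseNotesPipeline(GitSource g, ApiFilter f, JiraSource j,
                                Summarizer s, Publisher p) {
        git = g; filter = f; jira = j; summarizer = s; publisher = p;
    }

    public void run(String releaseId, String releaseWindow) {
        List<Commit> commits = git.commitsFor(releaseWindow);          // steps 1-2
        List<Commit> candidates = filter.filterApiCandidates(commits); // steps 3-4
        List<String> notes = candidates.stream()                       // steps 5-7
                .map(c -> summarizer.summarize(c,
                        c.jiraIds().isEmpty() ? "" : jira.contextFor(c.jiraIds().get(0))))
                .toList();
        publisher.publish(releaseId, notes);                           // steps 8-9
    }
}
```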
API Change Scope
Focused on public API changes
API changes may include:
Endpoint changes
Request/response contract changes
Public DTO or interface changes
OpenAPI / specification changes
Non-API changes (e.g., refactoring, internal logic, tests) are out of scope unless they affect the public API
Output Requirements
A single file per release
Contains short (1–2 line) summaries of API changes
Written for business and API consumers
Includes traceability information (e.g., linked JIRA ID, commit reference)
Generated automatically but may support human review
Intended for distribution to stakeholders after generation
Historical Data
50 previous releases already exist
API change notes for these releases were written by humans
Historical data can be used as:
Reference behavior
Ground truth for evaluation
Validation baseline for automation
Architectural Constraints
Enterprise-grade quality required
Auditability and traceability are required
Idempotent pipeline behavior is required
Re-running the pipeline with the same inputs should produce the same outputs
Agentic behavior must be controlled and bounded
Deterministic processing preferred where possible
Technology Constraints
Primary implementation language: Java
Cloud environment available: GCP
Preference for cloud-agnostic design
No hard dependency on vendor-specific AI platforms
Integration with:
Git
JIRA (via API or MCP)
Workflow Characteristics
Commit messages are the primary entry point for identifying relevant JIRA issues
JIRA is used for contextual enrichment, not discovery
API change detection occurs before JIRA enrichment
Reasoning and summarization are part of the workflow
Output must be suitable for enterprise consumption and governance
Quality & Governance Constraints
Traceability from output → JIRA → commits → code changes
Ability to replay historical releases
Ability to evaluate system output against historical human-generated releases
Support for human-in-the-loop review
Clear separation between:
Deterministic pipeline stages
Probabilistic / agentic reasoning stages
Non-Goals (Explicit or Implicit)
No requirement to fetch or index all JIRA tickets
No requirement to allow agents to directly access Git or JIRA
No requirement for full autonomous publishing without governance
No requirement for implementation-level detail at this stage
----------------------------------
Step-by-Step PoC Plan (with Explicit Enterprise Path)
Step 1 — Lock problem definition & PoC boundaries
Purpose: Prevent scope creep and ensure results are interpretable.
Do
Write a one-page definition of:
What counts as an API change
Target audience (business/API consumers)
Output format (1–2 line summaries)
Choose:
One repository
1–2 recent releases or sprints
One API surface (e.g., REST controllers)
Skip (for PoC)
Multi-repo support
Multiple API styles
Carries to Enterprise
API-change definition
Output structure
Step 2 — Create a minimal golden dataset
Purpose: Establish objective comparison early.
Do
Select 5–10 historical releases from your 50
Extract:
Human-written API notes
Approximate commit ranges
Normalize notes into a simple structured format
Skip
Perfect commit ↔ note mapping
Full historical ingestion
Carries to Enterprise
Golden dataset format
Evaluation mindset
Step 3 — Implement minimal deterministic commit ingestion
Purpose: Ground everything in real signals.
Do
Pull commits for chosen release window
Extract JIRA IDs from commit messages (assumed reliable)
Store commit metadata and diffs
Skip
Caching
Idempotency guarantees
Retry logic
Carries to Enterprise
Commit ingestion logic
JIRA ID extraction rules
Step 4 — First-pass API-change heuristics
Purpose: Reduce noise before LLM involvement.
Do
Implement simple, explicit heuristics:
File paths (controllers, API packages)
Known annotations
OpenAPI spec file changes
Output a list of candidate API-change commits
Skip
AST diffs
Weighted scoring
Complex rule engines
Carries to Enterprise
Heuristic categories
Observed false positives/negatives
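A PoC-level sketch of these heuristic categories; every pattern below is an assumption to tune against the observed false positives and negatives.

```java
import java.util.List;

public final class ApiChangeHeuristics {
    // Simple, explicit signals: API-ish paths, spec-file changes, and
    // known annotations appearing in the diff. Patterns are assumptions.
    public static boolean isCandidate(List<String> changedPaths, String diff) {
        boolean apiPath = changedPaths.stream().anyMatch(p ->
                p.contains("/controller/") || p.contains("/api/"));
        boolean specChange = changedPaths.stream().anyMatch(p ->
                p.endsWith("openapi.yaml") || p.endsWith("openapi.json"));
        boolean apiAnnotation = diff.contains("@RestController")
                || diff.contains("@RequestMapping");
        return apiPath || specChange || apiAnnotation;
    }
}
```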
Step 5 — Lightweight JIRA enrichment
Purpose: Add intent and context cheaply.
Do
Fetch:
JIRA summary
JIRA description
Attach JIRA context to candidate commits
Skip
Caching
Rate-limit handling beyond basics
Carries to Enterprise
JIRA field selection
Linking strategy
Step 6 — Define structured input & output schemas
Purpose: Prevent PoC chaos and future rewrites.
Do
Define simple schemas:
Commit context
API delta
Agent output (summary, justification, confidence)
Validate outputs against schema
Skip
Versioning
Backward compatibility
Carries to Enterprise
Core data contracts
Step 7 — Single-pass agent summarization
Purpose: Validate LLM usefulness, not autonomy.
Do
One prompt
One agent
Provide:
Structured inputs
2–3 real historical examples
Generate:
1–2 line API summaries
Justification bullets
Skip
Multi-agent orchestration
Tool-calling
Confidence automation logic
Carries to Enterprise
Prompt patterns
Output structure
Step 8 — Manual evaluation against history
Purpose: Answer “Is this good enough?”
Do
Compare agent output to human notes:
Coverage
Accuracy
Tone alignment
Capture:
Missed changes
Hallucinations
Edits needed
Metrics (simple)
% of human changes detected
False positives
Reviewer usefulness rating
Carries to Enterprise
Evaluation criteria
Known failure modes
Step 9 — Iterate fast (tight PoC loop)
Purpose: Maximize learning, not code quality.
Do
Iterate on:
Heuristics
Prompt wording
Output phrasing
Re-run against same historical set
Skip
Refactoring for cleanliness
Performance tuning
Carries to Enterprise
Refined heuristics
Stable prompt templates
Step 10 — Produce PoC artifacts for decision-making
Purpose: Enable an informed “go / no-go”.
Deliver
Example generated API-change files
Side-by-side comparisons with human releases
List of failure modes
Quantified value estimate (time saved)
Carries to Enterprise
Business justification
Architectural confidence
Step 11 — Define explicit enterprise transition criteria
Purpose: Avoid PoC limbo.
Define
Minimum acceptable detection recall
Maximum tolerable false positives
Human approval rate threshold
Decision
Proceed to enterprise hardening only if criteria met
Carries to Enterprise
Quality gates
Step 12 — Transition to enterprise build (after PoC)
What changes
Add idempotent pipeline
Add audit logs
Add confidence-based automation
Add governance & security
Harden heuristics and diffing
What stays
Definitions
Schemas
Prompts
Evaluation framework
Historical dataset
Key principle (to keep yourself honest)
The PoC validates signal and behavior.
The enterprise build hardens what proved valuable.