Reviewer: Claude (subagent)
Date: 2026-02-23
Document: ~/brain/finn/ai-native-engineering-roadmap.md
The research doc is well-sourced and comprehensive as a landscape survey — it successfully synthesizes external references (StrongDM, Anthropic internal data, YC startups, community patterns) into a coherent narrative. However, it has a significant gap between "here's what the industry is doing" and "here's exactly what FINN should do on Monday morning." The roadmap reads more like a research paper with a recommended direction than an implementation plan.
Biggest blind spot: The doc doesn't account for FINN's actual current state. After inventorying the EWA-Services GitHub org, I found FINN is significantly more advanced than the doc assumes:
- They already have centralized CI/CD via EWA-Actions (synced to 70+ repos)
- They already have Claude-powered code review in GitHub Actions (every repo)
- They already have Claude-powered PR metadata validation (pr-metadata-gate)
- They already have an agent-resources repo with skills (Ralph Loop, design doc reviewer)
- They use conventional commits enforced by `semantic-pr.yaml`
- They use Policy Bot with team-specific approval rules
- They use Bulldozer for auto-merge with squash strategy
- They use release-please for automated releases
FINN is not at "Stage 2" — they're at Stage 2.5–3, already with some Stage 4 infrastructure in place. The roadmap needs to be calibrated to this reality.
| Action Item | Concreteness (1-5) | Assessment |
|---|---|---|
| Create AGENTS.md | 3 | Mentions it should exist but doesn't define structure, sections, or content. No guidance on hierarchical files (root vs per-repo). FINN already has AGENTS.md + CLAUDE.md in EWA-Actions — the doc doesn't know this. |
| Establish PLAN.md workflow | 2 | Vague. Says "every feature branch gets a PLAN.md" but doesn't explain: Who creates it? What template? How does it integrate with Linear tickets? How does it get consolidated on merge? |
| Mandate TDD with AI | 2 | The right idea, but no implementation plan. FINN has 83 active repos across Python, TypeScript, PHP, Go, and HCL — each with different test frameworks (Jest, pytest, Go test, PHPUnit). "Mandate TDD" for whom? For agents only? For human+agent workflows? How do you enforce it? |
| Set up Claude Code organization account | 4 | Concrete, but already partially done — FINN has ANTHROPIC_API_KEY as an org secret powering Claude code review in CI. The question is about individual engineer tooling, not just CI. |
| Measure baselines | 3 | Says "Track PRs merged per engineer per day" but doesn't mention tooling. FINN uses Linear for project management — tracking should integrate with that. Also doesn't define what "defect rates" means or how to measure them. |
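To make the baseline-metrics item concrete: the core of the tooling is a small aggregation over merged-PR records. A minimal sketch, where the function name and the input shape are assumptions (in practice the records would come from the GitHub API's `merged_at` field or from Linear):

```python
from collections import defaultdict
from datetime import date

def prs_per_engineer_per_day(merged_prs):
    """Aggregate merged-PR records into per-engineer daily counts.

    merged_prs: iterable of (author, merged_date) tuples, e.g. derived
    from the GitHub API's merged_at timestamps (shape is an assumption).
    Returns {author: {date: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for author, merged in merged_prs:
        counts[author][merged] += 1
    return {a: dict(days) for a, days in counts.items()}

# Fabricated sample data, for illustration only:
sample = [
    ("alice", date(2026, 2, 20)),
    ("alice", date(2026, 2, 20)),
    ("bob", date(2026, 2, 21)),
]
baseline = prs_per_engineer_per_day(sample)
```

Even this much forces the definitional questions the doc skips: what counts as "merged" (squash via Bulldozer?), and whether bot-authored PRs (release-please) are excluded.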
What's missing from Phase 1:
- Audit of existing AI context: Before creating AGENTS.md, inventory what already exists. FINN already has `EWA-Actions/AGENTS.md` and `CLAUDE.md`. The first step should be understanding and extending these, not starting from scratch.
- How to distribute AGENTS.md: FINN's sync infrastructure (`sync-workflow-files.yml`, `sync-other-files.yml`) already distributes files to 70+ repos. The doc should recommend using this existing mechanism.
- Per-language guidance: FINN has Python (19 repos), TypeScript (17), HCL (11), PHP (9), Go (2). Each needs different AGENTS.md content (different linters, test runners, build tools).
- Integration with existing code review: FINN already has Claude reviewing PRs via `code-review.yaml`. The AGENTS.md should feed into this workflow — the doc doesn't connect these dots.
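The audit step can start as small as a script that walks local checkouts of the org and reports which repos already carry AI context or language tooling files worth folding into AGENTS.md. A sketch; the marker-file list is illustrative, not a complete inventory of FINN's stack:

```python
from pathlib import Path

# Files whose presence tells you what an AGENTS.md section must cover.
# Illustrative list — extend per FINN's actual tooling.
MARKERS = {
    "AGENTS.md": "existing AI context",
    "CLAUDE.md": "existing AI context",
    "pyproject.toml": "Python tooling",
    "package.json": "TypeScript/JS tooling",
    "composer.json": "PHP tooling",
    "go.mod": "Go tooling",
}

def inventory(checkouts_root: Path) -> dict[str, list[str]]:
    """Map each repo directory under checkouts_root to the marker
    files found at its top level (repos with no hits are omitted)."""
    found = {}
    for repo in sorted(p for p in checkouts_root.iterdir() if p.is_dir()):
        hits = [name for name in MARKERS if (repo / name).exists()]
        if hits:
            found[repo.name] = hits
    return found
```

The output doubles as the seed for per-language AGENTS.md sections: every repo that hits `pyproject.toml` gets the Python guidance, and repos that already have `AGENTS.md` are flagged for merge rather than overwrite.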
Concreteness score: 2.6/5 — Directionally correct but assumes a green field that doesn't exist.
| Action Item | Concreteness (1-5) | Assessment |
|---|---|---|
| Deploy OpenClaw multi-instance | 2 | References an enterprise deployment blog post but gives no specifics: what hardware/cloud resources? How does it integrate with GitHub Actions? What about secrets management for a fintech? The doc hand-waves at "three instances — dev, QA, ops" without saying what each instance actually does. |
| Implement Ralph Wiggum loops | 3 | Good concept, and FINN already has this as a skill in agent-resources/skills-src/setup-ralph-loop/. But the doc doesn't mention how to integrate with their existing CI/CD or how to handle the "agent runs for 30 minutes" billing/resource concern. |
| Create role-specific agents | 2 | "Planner, implementer, verifier" — but what models? What prompts? How do they communicate? Where do they run? No architecture diagram, no concrete example. |
| Build context layer | 2 | Mentions "hierarchical agents.md files" but no structure proposed. What goes at root vs service vs module level? How do you avoid contradictions? How do you keep 83 repos' context files in sync? |
| Introduce MCP integrations | 1 | One sentence. Which MCP servers? What monitoring tools does FINN use? How do you connect them? This is the vaguest item in the entire doc. |
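One concrete answer to the root-vs-service-vs-module question in the context-layer row is nearest-last-wins layering: concatenate every AGENTS.md from the repo root down to the working directory, so deeper files refine (and, on conflict, override by appearing later in the context). A sketch; the override policy is an assumption, not an established AGENTS.md convention:

```python
from pathlib import Path

def effective_context(repo_root: Path, work_dir: Path) -> str:
    """Concatenate AGENTS.md files from the repo root down to work_dir.

    Deeper files come last, so an agent reading the combined context
    treats them as the most specific guidance. This nearest-last-wins
    merge policy is an assumption for illustration.
    """
    dirs = [Path(repo_root)]
    for part in Path(work_dir).relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)
    parts = []
    for d in dirs:
        candidate = d / "AGENTS.md"
        if candidate.exists():
            parts.append(candidate.read_text())
    return "\n\n".join(parts)
```

Making the merge rule explicit is also how you avoid contradictions: a linter over the concatenated output can flag a module file that restates (rather than refines) a root-level rule.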
What's missing from Phase 2:
- Cost modeling: The doc mentions "$400–1,000/month per engineer" but doesn't break down what drives this. Token costs for Claude Opus vs Sonnet? Infrastructure hosting costs? FINN needs a specific budget proposal, not a range.
- Security architecture: FINN handles financial data. Agents need access to code but MUST NOT have access to production secrets, customer data, or payment credentials. Zero guidance on secrets isolation.
- Pilot project selection: Which of FINN's 83 repos should be the first target for orchestrated AI? The answer matters enormously. A greenfield microservice is very different from the FINN-Web-App monorepo.
- Team training plan: The doc assumes engineers will adopt these practices. But who trains them? What does the onboarding look like? What happens when someone's agent makes a mess?
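A budget proposal needs a driver-level model, even a crude one. A sketch of the shape such a model could take — every number below is a placeholder to be replaced with current Anthropic list pricing and measured usage, not a quoted rate:

```python
def monthly_agent_cost(
    engineers: int,
    sessions_per_day: float,
    tokens_in_per_session: int,
    tokens_out_per_session: int,
    price_in_per_mtok: float,   # placeholder: USD per million input tokens
    price_out_per_mtok: float,  # placeholder: USD per million output tokens
    workdays: int = 21,
) -> float:
    """Rough monthly token spend for interactive agent use.

    Deliberately excludes infrastructure hosting, prompt caching
    discounts, and batch pricing — all of which FINN would need to
    model before a real budget proposal.
    """
    per_session = (
        tokens_in_per_session / 1e6 * price_in_per_mtok
        + tokens_out_per_session / 1e6 * price_out_per_mtok
    )
    return engineers * sessions_per_day * workdays * per_session

# Illustrative run with made-up inputs (large input context dominates):
estimate = monthly_agent_cost(
    engineers=15, sessions_per_day=10,
    tokens_in_per_session=1_000_000, tokens_out_per_session=20_000,
    price_in_per_mtok=3.0, price_out_per_mtok=15.0,
)
```

The point of even a toy model is visibility into the drivers: input-context size per session dwarfs output tokens, which is why context-layer design (what goes into AGENTS.md, and what gets cached) is a cost decision, not just a quality one.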
Concreteness score: 2.0/5 — This phase is mostly aspirational. An engineer reading this would know the direction but not the steps.
| Action Item | Concreteness (1-5) | Assessment |
|---|---|---|
| Evaluate Lobu or Mission Control | 2 | References external projects but doesn't analyze fit for FINN. Lobu is open-source (good) but early-stage. Mission Control is pre-launch SaaS. Neither is evaluated against FINN's stack (AWS, EKS, Terraform). |
| Implement scenario-based testing | 2 | Borrows from StrongDM but StrongDM is a 3-person team building from scratch. FINN has 4+ years of existing code and existing test infrastructure. How do you retrofit scenario testing onto a legacy codebase? No guidance. |
| Build lightweight digital twins | 1 | The most hand-wavy item. "Create mock versions of payment processors, KYC providers, banking APIs." For a fintech, this is a massive undertaking. Which payment processors? What level of fidelity? How do you maintain them? This alone could be a quarter-long project. |
| Enable parallel agent workflows | 1 | "Multiple agent pairs working simultaneously... with orchestration handling merge conflicts." Zero implementation detail. Merge conflict resolution by AI is an unsolved problem at scale. |
| Shift engineer roles | 1 | This is an organizational change, not a technical task. You can't just "shift" roles — you need management buy-in, updated job descriptions, changed performance metrics, and potentially different hiring profiles. |
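To make the digital-twin row less abstract: even a low-fidelity twin is just a stateful fake that enforces the contract properties agent-written client code must not break, such as idempotency. A minimal sketch of a fake payment processor — the API shape is invented for illustration, not any real processor's SDK:

```python
class FakePaymentProcessor:
    """Low-fidelity twin of a payment API.

    In-memory, no network, but it enforces the one contract property
    that matters most when testing agent-written client code: charges
    replayed with the same idempotency key are applied exactly once.
    """

    def __init__(self):
        self._charges = {}  # idempotency_key -> charge record

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._charges:
            # Replay of a known key: return the original record,
            # never a second charge.
            return self._charges[idempotency_key]
        record = {"id": f"ch_{len(self._charges) + 1}", "amount": amount_cents}
        self._charges[idempotency_key] = record
        return record

    def total_charged(self) -> int:
        return sum(c["amount"] for c in self._charges.values())
```

Scoping the fidelity question this way ("which contract properties do we enforce?") is what turns "build digital twins" from a quarter-long vision into an incremental backlog.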
What's missing from Phase 3:
- Realistic scope: This phase tries to do too much. Digital twins + parallel agents + org restructuring + scenario testing in 6 weeks? For a team that's never done this? Unrealistic.
- Incremental milestones: No way to know if you're on track. What does "Week 12" look like vs "Week 15"?
- Rollback plan: What if multi-agent workflows produce worse code? How do you detect regression? How do you roll back?
- Regulatory implications: In fintech, changing how code is produced may have regulatory implications. Has compliance been consulted? Are there audit trail requirements for AI-generated code?
Concreteness score: 1.4/5 — This phase is essentially a wish list.
- No assessment of current state: The doc assumes FINN is at "Stage 2" with ad-hoc AI usage. In reality, FINN has:
- Centralized workflow management (EWA-Actions → 70+ repos)
- Claude code review in CI
- PR metadata validation with AI
- Agent skills repo with documented patterns
- Conventional commits + semantic PR enforcement
- Policy Bot with team-specific approval rules
- Established team structure (backend, frontend, infra, QA, data, credit risk)
The roadmap should start from where FINN actually is, not where the doc imagines.
- No mention of existing tooling: FINN uses:
- Linear for project management (ticket references enforced in PR titles)
- GrowthBook for feature flags (implementation checks in CI)
- Digger for Terraform automation
- Bulldozer for auto-merge
- Release Please for versioning
- Cloudflare for CDN/edge
- Pre-commit hooks (language-specific, centrally managed)
None of these are mentioned in the AI context/integration strategy.
- Multi-language complexity ignored: FINN's org has Python (19), TypeScript (17), HCL (11), PHP (9), Go (2). The AI strategy needs to account for:
- Different test frameworks per language
- Different linting/formatting tools per language
- Different deployment patterns per language
- Different AI effectiveness per language (AI is better at Python/TS than HCL/PHP)
- No concrete AGENTS.md structure proposed: The doc says "create AGENTS.md" repeatedly but never shows what one looks like. For a team that's never written one, this is a critical gap.
- Team dynamics missing: Who champions this? Who has time to build infrastructure? In a 10-20 person startup, everyone is shipping features. The doc doesn't address the "innovation tax" of standing up AI infrastructure while maintaining velocity.
- Bank of Thailand regulations: FINN operates in Thailand's fintech space. BoT has specific requirements for:
- Code change audit trails
- Data handling and processing
- System change management procedures
- Vendor/tool risk assessments (using AI in production code may require regulatory disclosure)
- PCI-DSS / Data security: If FINN handles payment data, AI agents must never:
- Have access to cardholder data environments
- Be able to query production databases
- Log or transmit sensitive financial data in prompts
The doc mentions this briefly but doesn't propose a concrete isolation architecture.
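A concrete starting point for that isolation architecture is deny-by-default environment filtering: agent processes receive only an explicit allowlist of variables, so production credentials present in the parent environment can never leak into a prompt or subprocess. A sketch — the allowlist contents are assumptions to be replaced by the output of a security review:

```python
import os
import subprocess

# Deny by default: only these variables are visible to agent processes.
# Illustrative names — production credentials never appear on the list.
AGENT_ENV_ALLOWLIST = {"PATH", "HOME", "LANG", "ANTHROPIC_API_KEY"}

def agent_environment(parent_env=None):
    """Return a filtered copy of the environment for agent processes."""
    parent_env = os.environ if parent_env is None else parent_env
    return {k: v for k, v in parent_env.items() if k in AGENT_ENV_ALLOWLIST}

def run_agent(cmd):
    """Spawn an agent CLI with the filtered environment.

    `cmd` is a placeholder for whatever agent binary FINN runs.
    """
    return subprocess.run(cmd, env=agent_environment(), check=True)
```

Environment filtering is one layer, not the whole architecture — network egress controls and read-only repo mounts belong in the same review — but it is cheap, auditable, and implementable this week.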
- Financial calculation correctness: The doc mentions TDD is "non-negotiable" for financial transactions but doesn't address:
- Property-based testing for financial calculations
- Decimal precision requirements (floating point vs fixed-point)
- Currency handling edge cases
- Timezone-dependent financial calculations
- Regulatory reporting accuracy requirements
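The decimal-precision point deserves one concrete illustration, because it is among the most common correctness bugs in AI-generated financial code: binary floats cannot represent most decimal amounts exactly, while Python's `decimal.Decimal` can, and rounding mode must be an explicit policy choice rather than a library default.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point silently mis-sums currency amounts:
float_total = 0.1 + 0.2   # 0.30000000000000004, not 0.3

# Fixed-point decimal arithmetic is exact for currency:
dec_total = Decimal("0.10") + Decimal("0.20")   # Decimal('0.30')

# Rounding must be explicit — half-up vs banker's rounding changes
# regulatory reporting output, so it is a policy decision:
fee = (Decimal("19.99") * Decimal("0.0175")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
```

Property-based tests can then assert invariants like "splitting a charge into installments never changes the total by more than the smallest currency unit".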
- Third-party API integration risk: FINN likely integrates with Thai banks (KTB, KBank based on repo names), payment processors, and KYC providers. AI-generated integration code needs extra scrutiny because:
- Financial API errors can move real money
- Rate limiting and retry logic must be correct
- Idempotency requirements are strict
- Error handling needs to account for partial failures
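The retry/idempotency points are exactly where AI-generated integration code tends to go wrong, so review checklists can anchor on a known-good shape: retry only transient errors, with backoff, reusing one idempotency key so a retry after an ambiguous failure cannot double-charge. A sketch — the exception type and `client.charge` call are placeholders, not a real processor SDK:

```python
import time

class RetriableError(Exception):
    """Transient failure (timeout, 429, 5xx) — safe to retry."""

def charge_with_retry(client, idempotency_key, amount_cents,
                      attempts=3, base_delay=0.5):
    """Retry a charge on transient errors only.

    The same idempotency key is reused across attempts, so even if an
    earlier attempt actually succeeded server-side before the error
    surfaced, the retry cannot move money twice. `client.charge` is a
    placeholder for the real SDK call.
    """
    for attempt in range(attempts):
        try:
            return client.charge(idempotency_key, amount_cents)
        except RetriableError:
            if attempt == attempts - 1:
                raise  # exhausted: surface the failure for manual handling
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

A reviewer (human or Claude via `code-review.yaml`) can then reject agent-generated integration code that retries non-transient errors or mints a fresh key per attempt.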
- Incident response: What happens when AI-generated code causes a production incident? The doc doesn't address:
- How to distinguish AI-generated vs human-written code in incident analysis
- Whether AI-generated code needs special tagging in git
- How to update AI context to prevent recurrence
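Distinguishing AI-generated commits doesn't require new infrastructure: a commit-message trailer works with plain git. Claude Code already appends a `Co-Authored-By: Claude` trailer by default (verify the exact string in FINN's setup); an explicit `AI-Generated:` trailer is a hypothetical convention FINN could add. A detection sketch for incident analysis:

```python
AI_TRAILER_MARKERS = (
    "Co-Authored-By: Claude",   # Claude Code's default trailer; verify locally
    "AI-Generated: true",       # hypothetical explicit trailer FINN could adopt
)

def is_ai_assisted(commit_message: str) -> bool:
    """Heuristic: does the commit message carry an AI-attribution trailer?"""
    return any(
        line.startswith(marker)
        for line in commit_message.strip().splitlines()
        for marker in AI_TRAILER_MARKERS
    )
```

Running this over `git log` output during a postmortem answers the attribution question, and the same trailer gives the AI-context-update loop a query handle ("show me all AI-assisted commits touching this module").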
Hidden Prerequisites
Phase 1 prerequisites (not mentioned in doc):
├── Management/leadership buy-in for time investment
├── Budget approval for Claude Max/Team accounts
├── Audit of existing AI context files (AGENTS.md, CLAUDE.md in EWA-Actions)
├── Agreement on which repos to pilot
├── Baseline metrics tooling (need to BUILD this before you can measure)
└── Engineer training/workshops on AI-first workflows
Phase 2 prerequisites (not mentioned in doc):
├── Phase 1 AGENTS.md actually being used and refined (at least 4-6 weeks)
├── Cloud infrastructure budget for OpenClaw instances
├── Security review of agent access patterns
├── DevOps capacity to build and maintain agent infrastructure
├── At least one successful "agent builds a feature" demo to justify investment
└── Integration plan for Linear ↔ agent workflow
Phase 3 prerequisites (not mentioned in doc):
├── Phase 2 running smoothly for at least 2-3 months (not 6 weeks)
├── Proven agent reliability metrics (what's the success rate?)
├── Digital twin architecture design (this is a multi-month project)
├── Compliance/legal review of AI-generated code policies
├── Team comfort level with reduced human code writing
└── Customer/stakeholder communication about AI-generated software
The actual critical path is:
1. Inventory current state → 2. Extend existing AGENTS.md → 3. Pilot on 2-3 repos → 4. Measure results → 5. Expand to more repos → 6. Add orchestration → 7. Scale
The doc's phases try to parallelize too much. The real dependency chain is strictly serial for the early steps.
If "Phase 1" means:
- ✅ Audit existing AI context files (1 week)
- ✅ Extend and standardize AGENTS.md across org (1-2 weeks)
- ✅ Document PLAN.md workflow and try it on 2-3 repos (1 week)
- ✅ Set up baseline metrics (1 week, can overlap)
4 weeks is achievable BUT only if someone is dedicated to it. In a startup where everyone is shipping features, expect 6-8 weeks elapsed time with part-time attention.
Setting up OpenClaw multi-instance + role-specific agents + MCP integrations + context layer in 6 weeks is not realistic for a team that:
- Has never deployed OpenClaw
- Needs to solve secrets management for fintech
- Needs security review
- Needs to integrate with existing CI/CD
Realistic timeline: 10-14 weeks, with a pilot on one repo expanding to more.
Digital twins + scenario testing + parallel agents + role shifts in 6 weeks? This is 6-12 months of work, minimum. Each of these items is a major project:
- Digital twins of banking APIs: 2-3 months alone
- Scenario-based testing framework: 1-2 months
- Parallel agent orchestration: 1-2 months
- Organizational role changes: ongoing
Realistic timeline: 6-12 months after Phase 2 is stable.
| Phase | Doc Timeline | Realistic Timeline | Notes |
|---|---|---|---|
| 0: Assessment | Not included | 1-2 weeks | Inventory, audit, stakeholder alignment |
| 1: Standardize | 4 weeks | 4-8 weeks | AGENTS.md, PLAN.md, baseline metrics |
| 2: Pilot | 6 weeks | 10-14 weeks | OpenClaw on 1-2 repos, prove value |
| 3: Scale | 6 weeks | 8-12 weeks | Expand to more repos, add agents |
| 4: Autonomy | Not scoped | 6-12 months | Digital twins, scenario testing, parallel agents |
Total: The doc claims 16 weeks. Realistic is 9-12 months to reach what the doc calls Phase 3.
- Excellent reference material: The doc synthesizes 19 high-quality sources into a coherent narrative. The StrongDM, Anthropic, and Intent Systems frameworks are genuinely useful.
- Right strategic direction: The progression from individual AI tools → shared context → orchestrated agents is correct.
- Good cost framing: The tiered cost table gives realistic ranges.
- Maturity model grounding: Anchoring to Shapiro's 5 Levels and Intent Systems' 7 stages gives a shared vocabulary.
- Fintech awareness: The doc correctly identifies that financial software needs stricter guardrails than typical SaaS.
- Start with a "Phase 0" assessment: Inventory what FINN already has. The team is more advanced than the doc assumes. Build on existing infrastructure (EWA-Actions, agent-resources) rather than starting fresh.
- Make the first deliverable concrete: Instead of "create AGENTS.md," the first deliverable should be: "Generate a comprehensive AGENTS.md by analyzing all 83 repos' CI configs, linter settings, commit conventions, and test frameworks, then distribute it via the existing EWA-Actions sync mechanism."
- Pick a pilot repo: Choose one well-understood, medium-complexity repo (not FINN-Web-App — too big; not a tiny utility — too small). Good candidates: `Statement-Service`, `Account-Creation-Flow`, or `banking-integrations`.
- Add a security architecture section: For fintech, this can't be an afterthought. Define: what can agents access? How are secrets managed? What's the audit trail? How do you prevent data exfiltration through prompts?
- Revise timelines with buffer: Double the doc's timelines and add explicit go/no-go checkpoints between phases.
- Address the "who does this work" question: In a 10-20 person startup, someone needs to own AI infrastructure. Is it DevOps? A dedicated "AI platform" person? A rotating responsibility? This matters because infrastructure that nobody owns doesn't get maintained.
As a research doc: 8/10 — Well-written, well-sourced, convincing narrative.
As an implementation plan: 3/10 — Too vague, wrong baseline assumptions, unrealistic timelines.
Recommendation: Use this doc as the "why" document. Write a separate "how" document that starts from FINN's actual current state and proposes specific, time-boxed actions with owners and success criteria.