@MuhammadYossry
Last active May 2, 2026 09:40
AccountantAgentOS — Custom AgentOS Runtime Blueprint, Personal Tax Intelligence & Year-Round Accounting Assistant

AccountantAgentOS — Custom AgentOS Runtime Blueprint

Personal Tax Intelligence & Year-Round Accounting Assistant

Design philosophy: Useful over impressive. Inspectable over magical. Narrow scope. Tool-backed evidence. Human in the loop for every consequential output.


Agent Configuration

AGENT_NAME:         AccountantOS
PRIMARY_PURPOSE:    Personal tax optimization, deduction discovery, IRS guidance,
                    and year-round accounting strategy — explained in plain language
DOMAIN:             Personal finance, US tax law, self-employment accounting,
                    small business deductions
TARGET_USERS:       Individuals — salaried employees, freelancers, small business
                    owners, gig workers, rental income earners
RISK_LEVEL:         high
AUTONOMY_LEVEL:     assistant
                    # Never semi-autonomous or autonomous for financial output.
                    # Always surfaces reasoning; never files, pays, or contacts IRS directly.
TOOLS_ALLOWED:      tax_bracket_lookup, ira_contribution_limit_lookup,
                    web_search (IRS publications only), calculator, date_math,
                    form_reference_lookup, filing_deadline_lookup
TOOLS_FORBIDDEN:    direct_bank_access, payment_processing, e-filing,
                    open_web_search (general), code_execution (user data)
DATA_SOURCES:       IRS.gov publications, user-provided income/expense data,
                    state tax authority pages, verified tax law references
SUCCESS_METRICS:    deductions_surfaced_per_session, user_clarity_score (self-rated),
                    plan_completeness_score, hallucination_incidents (target: 0),
                    human_review_pass_rate, tasks_requiring_no_followup
DEPLOYMENT_ENV:     local laptop or server (Docker sandbox, no cloud required)
BUDGET_PRIORITY:    balanced
                    # Cheap model for routine lookups; stronger model for
                    # IRS notice analysis, strategy synthesis, edge cases.
                    # Model choice left entirely to the operator — see Section 7.
PRIVACY_REQUIREMENTS: critical
                    # All user financial data stays local. No external logging.
                    # PII never leaves the container. No third-party telemetry.
HUMAN_REVIEW_POINTS: IRS notice response drafts, year-end strategy recommendations,
                    any output the user intends to act on financially
MULTI_AGENT_REQUIRED: yes
                    # Specialist sub-agents: TaxResearcher, DeductionScanner,
                    # StrategyAdvisor, NoticeResponder. See Section 2.
LONG_TERM_MEMORY:   yes
                    # Retains user financial profile, past session summaries,
                    # and confirmed deduction history across sessions.
LEARNING_ALLOWED:   constrained
                    # Skill learning only from verified, tool-backed outcomes.
                    # World-model updates require IRS publication citation.
                    # No learning from user-provided claims alone.
OUTPUT_STYLE:       plain language explanations + structured action lists +
                    document templates where relevant. No jargon without definition.

1. Executive Summary

AccountantOS is a personal tax intelligence runtime that helps individuals understand their full tax picture, surface overlooked deductions, respond to IRS notices with confidence, and build a year-round strategy — all in plain language.

Why this architecture fits:

  • Tax guidance is high-stakes and high-hallucination-risk. Every output must be traceable to a specific IRS publication or calculation. The Unix filesystem model enforces this by making verified_facts/ and hypotheses/ structurally separate — an unverified claim never migrates to confirmed policy without tool-backed evidence.

  • Users arrive with wildly different situations (W-2, 1099, S-corp, rental income, RSUs). A skill-learning agent that earns and prunes capabilities based on actual task outcomes becomes more precise over time rather than drifting toward generic advice.

  • Financial data is maximally sensitive. Docker isolation with no egress except explicitly whitelisted IRS domains keeps PII local and auditable.

  • The multi-agent design (Section 2) separates research from strategy from user communication, preventing the single-agent failure mode where one context window tries to be a researcher, advisor, and writer simultaneously.


2. Runtime Shape

Choice: Orchestrator + Specialist Workers

[User Task]
    │
    ▼
┌─────────────────────┐
│   Orchestrator      │  routes tasks, holds session context,
│   (AccountantOS)    │  enforces review gates, merges outputs
└──────┬──────────────┘
       │
  ┌────┴─────────────────────────────────┐
  │          │            │              │
  ▼          ▼            ▼              ▼
[TaxResearcher] [DeductionScanner] [StrategyAdvisor] [NoticeResponder]
  IRS pub       income/expense     year-round plan    IRS letter
  lookups       pattern matching   synthesis          analysis

Why not a single container? Each specialist has a tightly scoped system prompt, a dedicated skill file, and its own reward log. This prevents context pollution (deduction lists bleeding into IRS notice tone), enables independent skill pruning, and makes observability tractable — you can inspect exactly which agent produced a given output.

Why not full microservices? This runs on a local laptop or small server. Agents communicate via handoff files (Section 3), not HTTP. No Kubernetes. No service mesh. Just Docker Compose + shared volume mounts.
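A minimal sketch of what a file-based handoff could look like, assuming JSON payloads. The helper names `write_handoff` and `read_handoff` and the `<sender>__<recipient>.json` naming scheme are hypothetical, not part of the blueprint. The sketch writes to a temporary file and renames it, so a reader never sees a half-written result.

```python
import json
import os
import tempfile
from pathlib import Path


def write_handoff(handoff_dir: Path, sender: str, recipient: str, payload: dict) -> Path:
    """Atomically write one handoff file (hypothetical naming: <sender>__<recipient>.json)."""
    handoff_dir.mkdir(parents=True, exist_ok=True)
    final_path = handoff_dir / f"{sender}__{recipient}.json"
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_name = tempfile.mkstemp(dir=handoff_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(payload, tmp)
    os.replace(tmp_name, final_path)  # atomic when both paths are on the same filesystem
    return final_path


def read_handoff(path: Path) -> dict:
    """Load a handoff payload written by write_handoff."""
    return json.loads(path.read_text(encoding="utf-8"))
```

With the shared volume from Section 3, `handoff_dir` would presumably be `/data/shared/handoffs`.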


3. Purpose-Built Filesystem Layout

accountantos/
├── Dockerfile
├── docker-compose.yml
├── agents.yaml                    # agent roster, model hints, tool grants
├── env.example                    # MODEL_FAST/STANDARD/STRONG, LOG_LEVEL, REVIEW_MODE
├── main.py                        # task router and session loop
├── requirements.txt
│
└── /data/                         # Docker volume mount — never leaves host
    │
    ├── .git/                      # longitudinal memory (Section 4)
    │
    ├── shared/                    # inter-agent communication
    │   ├── inbox/                 # orchestrator inbound queue
    │   ├── handoffs/              # specialist-to-specialist results
    │   ├── locks/                 # write semaphores
    │   └── session_segment.md    # live shared working context
    │
    ├── agents/
    │   │
    │   ├── orchestrator/
    │   │   ├── persona.md         # read-only: routing + escalation identity
    │   │   ├── constraints.md     # read-only: budget, risk floor, forbidden actions
    │   │   ├── skills.md          # earned routing heuristics (max 15 entries)
    │   │   ├── goals.md           # recurring routing failures to fix
    │   │   └── rewards.md         # last 30 task outcomes
    │   │
    │   ├── tax_researcher/
    │   │   ├── persona.md         # read-only: IRS publication specialist
    │   │   ├── constraints.md     # read-only: citations required, no speculation
    │   │   ├── skills.md          # earned lookup patterns (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   ├── deduction_scanner/
    │   │   ├── persona.md         # read-only: pattern-match income/expense → deductions
    │   │   ├── constraints.md     # read-only: must cite IRS pub for every deduction
    │   │   ├── skills.md          # earned deduction patterns by filer type (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   ├── strategy_advisor/
    │   │   ├── persona.md         # read-only: year-round tax strategy synthesizer
    │   │   ├── constraints.md     # read-only: strategy = options + tradeoffs, not mandates
    │   │   ├── skills.md          # earned strategy templates (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   └── notice_responder/
    │       ├── persona.md         # read-only: IRS notice interpreter + response drafter
    │       ├── constraints.md     # read-only: always recommend professional review
    │       ├── skills.md          # earned notice-type handling patterns (max 15)
    │       ├── goals.md
    │       ├── rewards.md
    │       └── reflections.md
    │
    ├── user_profile/              # persistent user financial context
    │   ├── situation.md           # filing status, income sources, state
    │   ├── expense_categories.md  # documented expense patterns
    │   ├── deduction_history.md   # confirmed deductions from past sessions
    │   └── preferences.md         # communication style, detail level
    │
    ├── tax_knowledge/             # the agent's epistemic layer
    │   ├── index.md
    │   ├── verified_facts/        # IRS-publication-backed, tool-verified
    │   │   ├── brackets_2024.md
    │   │   ├── contribution_limits.md
    │   │   ├── se_tax_rates.md
    │   │   ├── standard_deduction.md
    │   │   └── common_deductions/
    │   │       ├── home_office.md
    │   │       ├── vehicle.md
    │   │       ├── qbi_deduction.md
    │   │       └── ...
    │   ├── hypotheses/            # patterns with <3 confirmed task wins
    │   │   └── README.md          # "graduate to verified_facts/ only after
    │   │                          #  3 tool-verified +1 outcomes"
    │   └── jurisdiction_notes/    # state-specific overlays (CA, NY, TX, etc.)
    │
    └── outputs/                   # session deliverables for user review
        ├── deduction_reports/
        ├── strategy_plans/
        ├── notice_responses/
        └── year_end_checklists/

Design decisions:

  • tax_knowledge/ is split into verified_facts/ and hypotheses/ — this is the single most important structural choice. A fact in verified_facts/ must have a frontmatter citation: field pointing to an IRS publication and a tool-verified outcome count ≥ 3. No agent may treat a hypothesis as a fact.

  • user_profile/ persists across sessions. The user answers onboarding questions once; subsequent sessions load context from these files rather than re-asking.

  • outputs/ is the only directory the user interacts with directly. Everything else is agent-internal.


4. Custom Agent Memory Model

The following memory types are active in AccountantOS, each justified by domain need:

| Memory Type | File(s) | Justification |
| --- | --- | --- |
| Skills | agents/*/skills.md | Earned routing and pattern-match heuristics, e.g. "freelance + home office → always check QBI first" |
| Procedures | tax_knowledge/verified_facts/ | Step-by-step calculation procedures (SE tax, QBI, depreciation) that must be consistent across sessions |
| Verified Facts | tax_knowledge/verified_facts/ | IRS-publication-backed tax law: the ground truth layer |
| Hypotheses | tax_knowledge/hypotheses/ | Observed patterns not yet verified 3×, quarantined from user output until confirmed |
| User Preferences | user_profile/preferences.md | Communication style, preferred detail level, past questions |
| User Situation | user_profile/situation.md | Filing status, income type, state; eliminates re-onboarding |
| Deduction History | user_profile/deduction_history.md | Which deductions the user has already claimed or explored; prevents redundant advice |
| Policies | agents/*/constraints.md | Hard rules: always cite sources, never file directly, always recommend CPA review for >$1k decisions |

Explicitly excluded:

  • Cases: individual case memory would balloon with PII. Session summaries (non-PII, outcome-focused) are stored in Git commits instead (PII policy in Section 8, commit format in Section 10).
  • Templates: IRS forms are referenced by number, not stored locally. Output templates live in outputs/ but are generated, not memorized.

5. Verification Architecture

Tax advice without verification is liability. AccountantOS applies layered checks:

Layer 1 — Tool-Backed Fact Checks (automated, every response)

Every deduction claim must pass through tax_bracket_lookup or form_reference_lookup before being surfaced to the user. The agent is instructed: "If you cannot verify this with a tool call, classify it as a hypothesis, not a fact."

Layer 2 — Citation Requirements (structural)

Every item in verified_facts/ carries frontmatter:

---
type: verified_fact
citation: "IRS Publication 587, Business Use of Your Home"
pub_url: "https://www.irs.gov/pub/irs-pdf/p587.pdf"
verified_date: 2024-11-01
tool_verifications: 4
last_reviewed: 2025-01-15
---

An agent that references a fact without a citation field is blocked by the orchestrator's constraint check before output is returned.
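The constraint check described above could be sketched as follows. This is an illustration under assumptions, not the orchestrator's actual code: the function names are hypothetical, and the parser only handles the flat `key: value` frontmatter shown in the example rather than full YAML.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter between the first two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def passes_citation_check(text: str, min_verifications: int = 3) -> bool:
    """True only if a citation is present and the verification count meets the minimum."""
    fields = parse_frontmatter(text)
    if not fields.get("citation"):
        return False
    try:
        return int(fields.get("tool_verifications", "0")) >= min_verifications
    except ValueError:
        return False
```

The default of 3 mirrors the verification count required in Section 3; a fact file with no frontmatter at all fails the check.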

Layer 3 — Hypothesis Quarantine

Any pattern observed fewer than 3 times lives in hypotheses/. These are never surfaced directly to the user — they may inform research direction but not advice.

Layer 4 — Human Review Gate (mandatory for high-stakes outputs)

The following output types are always flagged for human review before delivery:

  • IRS notice response drafts
  • Year-end strategy plans involving amounts > $500
  • Any recommendation involving an entity structure change (LLC, S-corp)
  • Amended return guidance

The gate surfaces a [REQUIRES REVIEW] header and a plain-language explanation of what the user should verify with a CPA before acting.
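One way the Layer 4 gate might be expressed in code. The `Output` type and the output-kind labels are hypothetical names chosen for this sketch; only the $500 threshold, the gated categories, and the [REQUIRES REVIEW] header come from the text above.

```python
from dataclasses import dataclass

REVIEW_HEADER = "[REQUIRES REVIEW]"
AMOUNT_THRESHOLD = 500  # dollars, from the Layer 4 list
# Hypothetical labels for the output types that are always gated.
ALWAYS_GATED = {"notice_response", "entity_structure", "amended_return"}


@dataclass
class Output:
    kind: str      # e.g. "notice_response", "strategy_plan", "deduction_report"
    amount: float  # largest dollar amount the output involves
    body: str


def needs_review(out: Output) -> bool:
    """Apply the Layer 4 rules: some kinds are always gated, strategy plans by amount."""
    if out.kind in ALWAYS_GATED:
        return True
    return out.kind == "strategy_plan" and out.amount > AMOUNT_THRESHOLD


def apply_gate(out: Output) -> str:
    """Prefix gated outputs with the review header; pass others through unchanged."""
    if needs_review(out):
        return f"{REVIEW_HEADER}\n{out.body}"
    return out.body
```

The plain-language explanation of what to verify with a CPA is left out of the sketch; it would be appended alongside the header.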

Layer 5 — Dual-Pass for Notice Responses

IRS notice responses go through two agents: notice_responder drafts, then tax_researcher cross-checks the cited regulation. Discrepancies block the output.

Layer 6 — Annual Knowledge Refresh

A refresh_task is scheduled annually (configurable) to re-verify every entry in verified_facts/ against current IRS publications. Stale facts are demoted to hypotheses/ until re-verified.
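A rough sketch of the staleness test behind the refresh, assuming "stale" means the `last_reviewed` date is more than 365 days old. The function names and the 365-day default are assumptions made for illustration; the blueprint only says the interval is configurable.

```python
from datetime import date, timedelta


def is_stale(last_reviewed: date, today: date, max_age_days: int = 365) -> bool:
    """A fact is stale once its last review is older than max_age_days."""
    return today - last_reviewed > timedelta(days=max_age_days)


def split_stale(facts: dict[str, date], today: date) -> tuple[list[str], list[str]]:
    """Return (fresh, stale) fact names; stale ones would be demoted to hypotheses/."""
    fresh = sorted(name for name, reviewed in facts.items() if not is_stale(reviewed, today))
    stale = sorted(name for name, reviewed in facts.items() if is_stale(reviewed, today))
    return fresh, stale
```

The actual demotion (moving the file from verified_facts/ to hypotheses/) is omitted here so the example has no filesystem side effects.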


6. Learning Policy — Constrained

What is learned:

  • Routing heuristics (orchestrator skills): which specialist handles which task pattern
  • Deduction pattern skills: "freelancer + vehicle + mileage log present → deduction likely valid"
  • Notice-type classification skills: CP2000 vs CP501 handling patterns

What is never learned from user data alone:

  • Tax law facts (these require IRS publication verification)
  • Calculation procedures (these are locked in verified_facts/ by a human or tool)
  • Any claim the user makes about their situation (recorded in user_profile/ but never promoted to the knowledge base)

Reward schema:

reward_decomposition:
  accuracy:     +1 / 0 / -1   # was the output factually correct?
  completeness: +1 / 0 / -1   # did it cover the user's full situation?
  clarity:      +1 / 0 / -1   # user self-rates: did they understand it?
  actionability: +1 / 0 / -1  # did it produce something the user could act on?
context_tags: [freelance, home_office, irs_notice, retirement, se_tax, ...]
citation_present: true/false   # hard requirement
hallucination_flag: true/false # logged permanently; triggers skill review

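Skill budget: set per agent by skill_budget in agents.yaml (20 for tax_researcher, deduction_scanner, and strategy_advisor; 15 for orchestrator and notice_responder). When an agent's budget is full, the skill with the lowest cumulative accuracy + completeness score is pruned. A skill with any hallucination_flag: true is pruned immediately regardless of other scores.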
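The pruning rule can be sketched as below. The `Skill` record and the `prune` function are hypothetical; the sketch assumes each skill carries running totals of its accuracy and completeness rewards.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    accuracy: int = 0      # cumulative accuracy reward
    completeness: int = 0  # cumulative completeness reward
    hallucination_flag: bool = False


def prune(skills: list[Skill], budget: int) -> list[Skill]:
    """Drop flagged skills first, then the lowest scorers until within budget."""
    kept = [s for s in skills if not s.hallucination_flag]
    # Highest combined score first; the sort is stable, so ties keep their original order.
    kept.sort(key=lambda s: s.accuracy + s.completeness, reverse=True)
    return kept[:budget]
```

Note that a flagged skill is removed even when the agent is under budget, which matches the "pruned immediately" rule.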

Hypothesis graduation: A pattern in hypotheses/ graduates to verified_facts/ only when:

  1. It has earned +1 on accuracy in at least 3 independent sessions, AND
  2. A tool-backed citation has been attached.

7. Model Routing Policy

No model is hardcoded. The operator sets MODEL_FAST, MODEL_STANDARD, and MODEL_STRONG in .env (copied from env.example, which ships with empty values). AccountantOS routes based on task class.

┌───────────────────────────────┬────────────────┬─────────────────────────┐
│ Task Class                    │ Routing Tier   │ Rationale               │
├───────────────────────────────┼────────────────┼─────────────────────────┤
│ IRS publication lookup        │ MODEL_FAST     │ Deterministic retrieval │
│ Tax bracket / limit lookup    │ MODEL_FAST     │ Pure calculation        │
│ Deduction list for known type │ MODEL_FAST     │ Pattern match, known    │
│ Explaining known deductions   │ MODEL_STANDARD │ Synthesis + plain lang  │
│ Quarterly payment calculation │ MODEL_STANDARD │ Multi-step math         │
│ Year-round strategy plan      │ MODEL_STANDARD │ Multi-factor synthesis  │
│ IRS notice analysis           │ MODEL_STRONG   │ High stakes, nuanced    │
│ Novel situation (no prior +1) │ MODEL_STRONG   │ Low confidence, flag    │
│ Dual-pass notice verification │ MODEL_STRONG   │ Accuracy critical       │
│ Entity structure advice       │ MODEL_STRONG   │ Irreversible decisions  │
└───────────────────────────────┴────────────────┴─────────────────────────┘

Configuration:

# env.example — operator sets these; no defaults hardcoded
MODEL_FAST=             # e.g. a small/fast model of your choice
MODEL_STANDARD=         # e.g. a mid-tier model of your choice
MODEL_STRONG=           # e.g. a frontier model of your choice
MODEL_PROVIDER=         # anthropic | openai | ollama | other

Routing logic in main.py:

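import os

# Illustrative task-class labels derived from the routing table; the blueprint does not fix these names.
HIGH_STAKES_CLASSES = {"irs_notice_analysis", "notice_verification", "entity_structure_advice"}
STANDARD_CLASSES = {"deduction_explanation", "quarterly_payment", "strategy_plan"}

def route_model(task_class: str, novelty_score: float) -> str | None:
    if novelty_score > 0.7 or task_class in HIGH_STAKES_CLASSES:
        return os.getenv("MODEL_STRONG")  # novel or high-stakes work gets the strongest tier
    if task_class in STANDARD_CLASSES:
        return os.getenv("MODEL_STANDARD")
    return os.getenv("MODEL_FAST")  # each getenv returns None if the operator left it unset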

8. Security Policy

Risk level: HIGH | Privacy: CRITICAL

Network Access Rules

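# docker-compose.yml network config
networks:
  accountantos_net:
    driver: bridge
    internal: false   # permits outbound traffic; does not filter it by itself

# Compose has no built-in egress allowlist. The list below is an extension
# field (x- prefix) that documents the intended policy; enforce it with a
# filtering proxy or host firewall rules.
x-egress-whitelist:
  - "*.irs.gov"
  - "*.treasury.gov"
  - "*.ssa.gov"       # for SE tax rate verification
  # All other egress: BLOCKED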
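As a complement to network-level enforcement, the web_search tool could also refuse non-whitelisted URLs before making a request. The sketch below is an application-level check only, not a substitute for blocking egress at the network layer. The function name is hypothetical, and requiring https is an added assumption.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

EGRESS_WHITELIST = ["*.irs.gov", "*.treasury.gov", "*.ssa.gov"]


def is_allowed(url: str, patterns: list[str] = EGRESS_WHITELIST) -> bool:
    """True if the URL uses https and its host matches a whitelist pattern."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption for this sketch: plain http is rejected outright.
    if parsed.scheme != "https" or not host:
        return False
    return any(fnmatch(host, pattern) for pattern in patterns)
```

One consequence of the `*.irs.gov` pattern style: `www.irs.gov` matches, but the bare host `irs.gov` does not, so it would need its own entry if required.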

Data Containment

  • All user financial data lives exclusively under /data/ (Docker volume, host-local)
  • No cloud sync. No S3. No external logging services.
  • .gitignore excludes user_profile/situation.md from Git by default (user may opt in to include anonymized summaries)

Secret Handling

# .env is never committed. env.example has no real values.
# API keys loaded via Docker secrets or env vars only:
docker run --env-file .env accountantos

PII Policy

  • User income, SSN fragments, account numbers: never logged, never committed to Git
  • Session summaries in Git commits contain only outcome metadata: task: deduction scan | filer_type: freelance | deductions_found: 7 | reward: +1
  • outputs/ files are stored locally; user explicitly exports them
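The "outcome metadata only" rule for commit summaries can be enforced with an explicit allowlist of fields, so that anything else in a session record is dropped by construction. A minimal sketch; the function name and the record layout are assumptions.

```python
# Only these outcome fields may appear in a commit summary.
ALLOWED_FIELDS = ("task", "filer_type", "deductions_found", "reward")


def session_summary(record: dict) -> str:
    """Build the one-line commit summary from allowed outcome fields only."""
    parts = [f"{key}: {record[key]}" for key in ALLOWED_FIELDS if key in record]
    return " | ".join(parts)
```

Passing a record that also contains income or account details still yields only the four allowed fields, in the order shown in the example above.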

Sandboxing

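  • Each specialist agent runs in its own container (see docker-compose.yml)
  • No agent may write to another agent's private directory. The docker-compose.yml in Section 9 mounts the whole /data volume read-write into every container, so this rule has to be enforced by per-agent mounts or file permissions rather than by the volume definition shown.
  • Lock files in shared/locks/ prevent concurrent writes to shared state
  • constraints.md for every agent is read-only to the agents; with the single shared volume in Section 9, this likewise needs a separate read-only mount or file permissions.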
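A lock file in shared/locks/ can be taken with an exclusive create, which fails if the file already exists. This is a sketch with a hypothetical helper name; it does not retry, and it does not clean up stale locks left by a crashed container.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_dir: Path, name: str):
    """Hold <lock_dir>/<name>.lock for the duration of the with-block."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{name}.lock"
    # O_EXCL makes creation fail with FileExistsError if another holder exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield lock_path
    finally:
        os.close(fd)
        lock_path.unlink()
```

A writer would wrap its update in `with file_lock(Path("/data/shared/locks"), "session_segment"):` and treat FileExistsError as "another agent is writing, try again later".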

Approval Gates

  • REVIEW_MODE=strict (default): all outputs > threshold go to outputs/review/ with a [REQUIRES HUMAN REVIEW] prefix before being shown to user
  • REVIEW_MODE=relaxed: informational outputs shown immediately; strategy and notice outputs still gated

9. Docker Project Files

Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Data volume — all user state lives here, never in the image
VOLUME ["/data"]

# Git identity for longitudinal-memory commits (the repository itself lives in /data/.git)
RUN git config --global user.email "accountantos@local" && \
    git config --global user.name "AccountantOS"

CMD ["python", "main.py"]

docker-compose.yml

version: "3.9"

services:

  orchestrator:
    build: .
    container_name: accountantos_orchestrator
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=orchestrator
      - REVIEW_MODE=${REVIEW_MODE:-strict}
    networks:
      - accountantos_net
    stdin_open: true
    tty: true

  tax_researcher:
    build: .
    container_name: accountantos_researcher
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=tax_researcher
    networks:
      - accountantos_net

  deduction_scanner:
    build: .
    container_name: accountantos_scanner
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=deduction_scanner
    networks:
      - accountantos_net

  strategy_advisor:
    build: .
    container_name: accountantos_strategy
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=strategy_advisor
    networks:
      - accountantos_net

  notice_responder:
    build: .
    container_name: accountantos_notices
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=notice_responder
    networks:
      - accountantos_net

volumes:
  accountantos_data:
    driver: local

networks:
  accountantos_net:
    driver: bridge

agents.yaml

agents:

  orchestrator:
    description: "Routes tasks, enforces review gates, merges specialist outputs"
    persona_file: agents/orchestrator/persona.md
    constraints_file: agents/orchestrator/constraints.md
    skill_budget: 15
    model_tier: standard
    tools: [date_math, filing_deadline_lookup]

  tax_researcher:
    description: "Looks up IRS publications, verifies facts, cites sources"
    persona_file: agents/tax_researcher/persona.md
    constraints_file: agents/tax_researcher/constraints.md
    skill_budget: 20
    model_tier: standard       # strong for novel lookups via routing policy
    tools: [web_search_irs, form_reference_lookup, tax_bracket_lookup]

  deduction_scanner:
    description: "Matches user income/expense profile to verified deductions"
    persona_file: agents/deduction_scanner/persona.md
    constraints_file: agents/deduction_scanner/constraints.md
    skill_budget: 20
    model_tier: standard
    tools: [calculator, ira_contribution_limit_lookup]

  strategy_advisor:
    description: "Synthesizes year-round tax strategy from user situation"
    persona_file: agents/strategy_advisor/persona.md
    constraints_file: agents/strategy_advisor/constraints.md
    skill_budget: 20
    model_tier: strong
    tools: [calculator, date_math, ira_contribution_limit_lookup]

  notice_responder:
    description: "Interprets IRS notices, drafts responses, flags urgency"
    persona_file: agents/notice_responder/persona.md
    constraints_file: agents/notice_responder/constraints.md
    skill_budget: 15
    model_tier: strong
    tools: [form_reference_lookup, web_search_irs, date_math]

env.example

# ── Model Configuration (operator sets these — no defaults) ──────────────────
MODEL_FAST=
MODEL_STANDARD=
MODEL_STRONG=
MODEL_PROVIDER=          # anthropic | openai | ollama | other
API_KEY=                 # your provider API key

# ── Runtime Settings ─────────────────────────────────────────────────────────
REVIEW_MODE=strict       # strict | relaxed
LOG_LEVEL=info           # debug | info | warn | error
SKILL_BUDGET=20          # max skills per agent
REWARD_LOG_SIZE=30       # rolling reward window
HYPOTHESIS_THRESHOLD=3   # confirmations required to graduate a hypothesis

# ── Privacy ──────────────────────────────────────────────────────────────────
GIT_COMMIT_PII=false     # never commit user_profile/situation.md to Git
TELEMETRY=false          # no external telemetry, ever

# ── Tax Year ─────────────────────────────────────────────────────────────────
TAX_YEAR=2024
USER_STATE=              # e.g. CA, NY, TX — for state overlay lookups

Startup Flow

docker compose up
    │
    ▼
orchestrator starts
    │
    ├── loads user_profile/situation.md (if exists)
    ├── loads session_segment.md
    ├── reads inbox/ for queued tasks
    │
    ▼
user submits task (one of the 7 task types)
    │
    ▼
orchestrator classifies task → routes to specialist
    │
    ▼
specialist loads its skills.md + relevant verified_facts/
    │
    ├── calls tools as needed (citation required)
    ├── writes result to shared/handoffs/
    │
    ▼
orchestrator merges result
    │
    ├── runs citation check
    ├── applies review gate if applicable
    ├── writes to outputs/ for user
    │
    ▼
reward logged → Git commit on +1

10. Observability

Mission-specific metrics — not generic agent stats:

| Metric | How Tracked | Target |
| --- | --- | --- |
| deductions_surfaced_per_session | Count in reward log | Trend up as skills mature |
| deductions_with_citation_rate | Citation present flag | 100% (hard requirement) |
| hallucination_incidents | hallucination_flag in rewards | 0 |
| user_clarity_score | Self-rated 1–5 after each session | ≥ 4.0 avg |
| hypothesis_graduation_rate | Hypotheses promoted / total | Rising = knowledge maturing |
| review_gate_rejection_rate | Human review flags / total outputs | Falling = quality improving |
| notice_response_approval_rate | User-approved drafts / total | ≥ 90% |
| model_tier_distribution | Fast / Standard / Strong % per task class | Cost indicator |
| skill_churn_rate | Skills added vs pruned per 30 tasks | Falling = agent maturing |
| stale_fact_count | Verified facts past annual review date | 0 target |

Log format (append to rewards.md):

---
date: 2024-11-15
task_type: deduction_scan
filer_type: freelance
agent: deduction_scanner
citations_present: true
hallucination_flag: false
reward_decomposition:
  accuracy: +1
  completeness: +1
  clarity: +1
  actionability: +1
context_tags: [freelance, home_office, vehicle, qbi]
model_tier_used: standard
user_clarity_score: 5
---
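A few of the metrics above can be computed directly from parsed reward entries. The sketch assumes entries are already loaded as dicts with the field names from the log format, and it uses a bounded deque to model the rolling window (REWARD_LOG_SIZE in env.example). The `summarize` function is hypothetical.

```python
from collections import deque

REWARD_LOG_SIZE = 30  # mirrors REWARD_LOG_SIZE in env.example


def summarize(log: deque) -> dict:
    """Compute citation rate, hallucination count, and average clarity over the window."""
    total = len(log)
    if total == 0:
        return {"citation_rate": None, "hallucination_incidents": 0, "avg_clarity": None}
    return {
        "citation_rate": sum(e["citations_present"] for e in log) / total,
        "hallucination_incidents": sum(e["hallucination_flag"] for e in log),
        "avg_clarity": sum(e["user_clarity_score"] for e in log) / total,
    }


# Oldest entries fall off automatically once the window is full.
log = deque(maxlen=REWARD_LOG_SIZE)
```

Because a rolling window eventually drops old entries, hallucination flags, which Section 6 says are logged permanently, would need to be recorded somewhere outside this window as well.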

Git commit format:

scanner(learn): earn home-office-freelance pattern from 3rd +1 session
task: deduction scan | filer: freelance | deductions: 9
reward: +1 (accuracy+1, completeness+1, clarity+1, actionability+1)
skills-added: home-office-exclusive-use-check
skills-pruned: none
hypothesis-graduated: home_office_shared_space_disqualifier

11. The 7 Task Types + 3 Bonus Tasks

Tasks 1–7 map directly to the seven use cases in the design goals; tasks 8–10 are bonus tasks:

  1. Deduction Discovery (freelancer) "Act as a certified CPA. My situation: freelance designer, $85,000 annual income, expenses include home office, laptop, software subscriptions, professional development. Identify every deduction I likely qualify for that most people in my situation overlook." → Routes to: deduction_scanner + tax_researcher

  2. Full Tax Situation Explained "Explain my tax situation in plain language. I earn $72,000 W-2 and $18,000 freelance in California. Break down what I owe, why, and the most important decisions before April 15." → Routes to: strategy_advisor + tax_researcher

  3. Business Deduction Audit "I run a sole proprietorship consultancy earning $140,000 with expenses: travel, home office, subcontractors, software. Identify every legitimate deduction, what documentation I need, and the most commonly missed ones." → Routes to: deduction_scanner (primary) + tax_researcher

  4. Year-End Tax Reduction Sprint "It is October. My estimated tax liability is $22,000. What are the most impactful legal moves I can make before December 31 to reduce what I owe? I'm self-employed, contribute to a SEP-IRA, and have $30k in unrealized stock losses." → Routes to: strategy_advisor + deduction_scanner

  5. Self-Employment Tax Explainer "I'm a freelance developer earning $95,000/year in Texas. Explain exactly how SE tax works, what quarterly payments I should be making, how to calculate them, and the top strategies to legally reduce my SE tax burden." → Routes to: tax_researcher + strategy_advisor

  6. IRS Notice Response "I received a CP2000 notice proposing I owe an additional $4,200 due to unreported income from a 1099 I thought my employer handled. Explain what this means, whether I should be concerned, my response options, and a step-by-step resolution plan." → Routes to: notice_responder + tax_researcher (dual-pass)

  7. 12-Month Tax Strategy "I'm single, earning $110,000 W-2 plus $25,000 from rental income in New York. Goals: grow retirement savings, possibly start an LLC next year. Build me a 12-month tax strategy that minimizes what I owe and maximizes what I keep." → Routes to: strategy_advisor (primary) + all specialists

  8. Quarterly Estimated Tax Calculator (bonus) "Walk me through calculating my Q3 estimated payment. I've earned $67,000 so far this year self-employed, paid $8,500 in estimated taxes, and expect another $25,000 in Q3." → Routes to: tax_researcher + calculator tool

  9. Deduction Documentation Checklist (bonus) "I plan to claim home office, vehicle (actual expense method), and professional development deductions. What exact documentation do I need to maintain for each to survive an audit?" → Routes to: deduction_scanner + tax_researcher

  10. Entity Structure Decision Support (bonus — MODEL_STRONG, always gated) "I'm currently a sole proprietor earning $180,000. Should I consider an S-corp election? Walk me through the tax math, the tradeoffs, and what I'd need to do." → Routes to: strategy_advisor | REVIEW_MODE=strict enforced | CPA recommendation mandatory


12. Risks & Failure Modes

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Hallucinated deduction (agent invents nonexistent IRS rule) | Critical | Citation required for every claim; hallucination_flag triggers skill purge |
| Stale tax law (brackets/limits change annually) | High | Annual refresh task; verified_facts carry last_reviewed date; staleness alert |
| User acts on advice without CPA review | High | Every output includes disclosure; strategy/notice outputs always gated |
| PII leaking to Git history | High | GIT_COMMIT_PII=false default; user_profile/situation.md in .gitignore |
| Over-confident notice response | High | Dual-pass architecture; always recommends professional response |
| Model produces confident wrong math | Medium | Calculator tool used for all arithmetic; never trust LLM arithmetic alone |
| Hypothesis promoted prematurely | Medium | 3-confirmation threshold + citation required; orchestrator enforces gate |
| User's state tax rules ignored | Medium | USER_STATE env var loads jurisdiction overlay; flagged if missing |
| Skill churn / thrashing | Low | Skill budget + pruning policy; churn rate logged in observability |
| Context window overflow for complex situations | Low | Specialist agents isolate concerns; session_segment.md scoped per task |

Limitations to disclose to users explicitly:

  • AccountantOS is not a licensed CPA. All outputs are educational guidance.
  • Tax law changes. Always verify against current IRS publications before acting.
  • State tax rules vary significantly. State guidance requires verification.
  • For amounts over $1,000 in tax impact, professional review is strongly recommended.

13. v1 Scope — Ship in 2 Weeks

Goal: A working single-user local runtime that handles the 4 highest-frequency tasks.

Include in v1:

  • Docker Compose setup with orchestrator + 2 specialists (deduction_scanner, tax_researcher)
  • User onboarding flow: situation.md populated from 5 questions
  • Task types 1, 2, 3, 5 (deductions, situation explanation, SE tax)
  • Verified facts for current year: brackets, standard deduction, SE tax rate, IRA limits, QBI
  • Citation requirement enforced (hard block if missing)
  • Plain-language output formatter
  • Git commit on every +1 reward
  • REVIEW_MODE=strict active by default
  • [REQUIRES HUMAN REVIEW] disclosure on all strategy outputs

Defer from v1:

  • IRS notice responder (complex, needs dual-pass — v2)
  • Year-round 12-month strategy (needs full specialist roster — v2)
  • Hypothesis graduation automation (manual review in v1 is safer)
  • State jurisdiction overlays beyond 3–4 common states
  • Entity structure advice (S-corp election complexity — v2)

Success signal for v1: A real user with a real freelance tax situation runs tasks 1 and 5, rates clarity ≥ 4/5, and identifies at least one deduction they weren't aware of — with a valid IRS citation.


14. v2 Expansion Plan

Activate after: 20+ real sessions, reward logs reviewed, skill churn stable.

| Feature | Rationale |
| --- | --- |
| NoticeResponder agent | High user anxiety, high value; add after v1 proves routing stability |
| 12-month strategy synthesis | Requires all 4 specialists working reliably; enable after individual specialist quality is confirmed |
| State jurisdiction overlays | Expand from 3 states to full 50 based on USER_STATE usage logs |
| Hypothesis graduation automation | Safe once 3-confirmation threshold is battle-tested in v1 |
| Annual facts refresh scheduler | Cron job that re-verifies verified_facts/ against IRS publications each January |
| Multi-user support | Separate user_profile/ per user ID; shared tax_knowledge/ read-only to all |
| Deduction documentation generator | Output a formatted checklist PDF per claimed deduction |
| Quarterly tax calendar | Proactive reminders tied to USER_STATE and filing status |
| S-corp / entity advice module | Highest-stakes task class; needs its own specialist, strong model mandatory, extra review gate |

AccountantOS — built on Git, Docker, and the honest self-assessment that most tax advice agents need fewer opinions and more citations.
