AccountantAgentOS — Custom AgentOS Runtime Blueprint

Personal Tax Intelligence & Year-Round Accounting Assistant

Design philosophy: Useful over impressive. Inspectable over magical. Narrow scope. Tool-backed evidence. Human in the loop for every consequential output.

Agent Configuration

AGENT_NAME:         AccountantOS
PRIMARY_PURPOSE:    Personal tax optimization, deduction discovery, IRS guidance,
                    and year-round accounting strategy — explained in plain language
DOMAIN:             Personal finance, US tax law, self-employment accounting,
                    small business deductions
TARGET_USERS:       Individuals — salaried employees, freelancers, small business
                    owners, gig workers, rental income earners
RISK_LEVEL:         high
AUTONOMY_LEVEL:     assistant
                    # Never semi-autonomous or autonomous for financial output.
                    # Always surfaces reasoning; never files, pays, or contacts IRS directly.
TOOLS_ALLOWED:      tax_bracket_lookup, ira_contribution_limit_lookup,
                    web_search (IRS publications only), calculator, date_math,
                    form_reference_lookup, filing_deadline_lookup
TOOLS_FORBIDDEN:    direct_bank_access, payment_processing, e-filing,
                    open_web_search (general), code_execution (user data)
DATA_SOURCES:       IRS.gov publications, user-provided income/expense data,
                    state tax authority pages, verified tax law references
SUCCESS_METRICS:    deductions_surfaced_per_session, user_clarity_score (self-rated),
                    plan_completeness_score, hallucination_incidents (target: 0),
                    human_review_pass_rate, tasks_requiring_no_followup
DEPLOYMENT_ENV:     local laptop or server (Docker sandbox, no cloud required)
BUDGET_PRIORITY:    balanced
                    # Cheap model for routine lookups; stronger model for
                    # IRS notice analysis, strategy synthesis, edge cases.
                    # Model choice left entirely to the operator — see Section 7.
PRIVACY_REQUIREMENTS: critical
                    # All user financial data stays local. No external logging.
                    # PII never leaves the container. No third-party telemetry.
HUMAN_REVIEW_POINTS: IRS notice response drafts, year-end strategy recommendations,
                    any output the user intends to act on financially
MULTI_AGENT_REQUIRED: yes
                    # Specialist sub-agents: TaxResearcher, DeductionScanner,
                    # StrategyAdvisor, NoticeResponder. See Sections 2 and 3.
LONG_TERM_MEMORY:   yes
                    # Retains user financial profile, past session summaries,
                    # and confirmed deduction history across sessions.
LEARNING_ALLOWED:   constrained
                    # Skill learning only from verified, tool-backed outcomes.
                    # World-model updates require IRS publication citation.
                    # No learning from user-provided claims alone.
OUTPUT_STYLE:       plain language explanations + structured action lists +
                    document templates where relevant. No jargon without definition.

1. Executive Summary

AccountantOS is a personal tax intelligence runtime that helps individuals understand their full tax picture, surface overlooked deductions, respond to IRS notices with confidence, and build a year-round strategy — all in plain language.

Why this architecture fits:

Tax guidance is high-stakes and high-hallucination-risk. Every output must be traceable to a specific IRS publication or calculation. The Unix filesystem model enforces this by making verified_facts/ and hypotheses/ structurally separate — an unverified claim never migrates to confirmed policy without tool-backed evidence.
Users arrive with wildly different situations (W-2, 1099, S-corp, rental income, RSUs). A skill-learning agent that earns and prunes capabilities based on actual task outcomes becomes more precise over time rather than drifting toward generic advice.
Financial data is maximally sensitive. Docker isolation with no egress except explicitly whitelisted IRS domains keeps PII local and auditable.
The multi-agent design (Section 2) separates research from strategy from user communication, preventing the single-agent failure mode where one context window tries to be a researcher, advisor, and writer simultaneously.

2. Runtime Shape

Choice: Orchestrator + Specialist Workers

[User Task]
    │
    ▼
┌─────────────────────┐
│   Orchestrator      │  routes tasks, holds session context,
│   (AccountantOS)    │  enforces review gates, merges outputs
└──────┬──────────────┘
       │
  ┌────┴─────────────────────────────────┐
  │          │            │              │
  ▼          ▼            ▼              ▼
[TaxResearcher] [DeductionScanner] [StrategyAdvisor] [NoticeResponder]
  IRS pub       income/expense     year-round plan    IRS letter
  lookups       pattern matching   synthesis          analysis

Why not a single container? Each specialist has a tightly scoped system prompt, a dedicated skill file, and its own reward log. This prevents context pollution (deduction lists bleeding into IRS notice tone), enables independent skill pruning, and makes observability tractable — you can inspect exactly which agent produced a given output.

Why not full microservices? This runs on a local laptop or small server. Agents communicate via handoff files (Section 3), not HTTP. No Kubernetes. No service mesh. Just Docker Compose + shared volume mounts.
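The file-based handoff described above could look roughly like the sketch below. The JSON record shape, the function names, and the write-to-temp-then-rename step are assumptions made for illustration; the blueprint itself only says that specialist results land in shared/handoffs/.

```python
import json
import os
import tempfile
from pathlib import Path


def write_handoff(handoff_dir: Path, sender: str, recipient: str, payload: dict) -> Path:
    """Write one handoff file atomically so a reader never sees a partial result."""
    handoff_dir.mkdir(parents=True, exist_ok=True)
    record = {"from": sender, "to": recipient, "payload": payload}
    # Write to a temp file in the same directory, then rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=handoff_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(record, tmp)
    final_path = handoff_dir / f"{sender}_to_{recipient}.json"  # hypothetical naming scheme
    os.replace(tmp_name, final_path)  # atomic when source and target share a filesystem
    return final_path


def read_handoff(path: Path) -> dict:
    """Load a record previously written by write_handoff."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Renaming into place keeps the orchestrator from merging a half-written file, which matters when several containers share one volume.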

3. Purpose-Built Filesystem Layout

accountantos/
├── Dockerfile
├── docker-compose.yml
├── agents.yaml                    # agent roster, model hints, tool grants
├── env.example                    # MODEL_NAME, LOG_LEVEL, REVIEW_MODE
├── main.py                        # task router and session loop
├── requirements.txt
│
└── /data/                         # Docker volume mount — never leaves host
    │
    ├── .git/                      # longitudinal memory (Sections 8 and 10)
    │
    ├── shared/                    # inter-agent communication
    │   ├── inbox/                 # orchestrator inbound queue
    │   ├── handoffs/              # specialist-to-specialist results
    │   ├── locks/                 # write semaphores
    │   └── session_segment.md    # live shared working context
    │
    ├── agents/
    │   │
    │   ├── orchestrator/
    │   │   ├── persona.md         # read-only: routing + escalation identity
    │   │   ├── constraints.md     # read-only: budget, risk floor, forbidden actions
    │   │   ├── skills.md          # earned routing heuristics (max 15 entries)
    │   │   ├── goals.md           # recurring routing failures to fix
    │   │   └── rewards.md         # last 30 task outcomes
    │   │
    │   ├── tax_researcher/
    │   │   ├── persona.md         # read-only: IRS publication specialist
    │   │   ├── constraints.md     # read-only: citations required, no speculation
    │   │   ├── skills.md          # earned lookup patterns (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   ├── deduction_scanner/
    │   │   ├── persona.md         # read-only: pattern-match income/expense → deductions
    │   │   ├── constraints.md     # read-only: must cite IRS pub for every deduction
    │   │   ├── skills.md          # earned deduction patterns by filer type (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   ├── strategy_advisor/
    │   │   ├── persona.md         # read-only: year-round tax strategy synthesizer
    │   │   ├── constraints.md     # read-only: strategy = options + tradeoffs, not mandates
    │   │   ├── skills.md          # earned strategy templates (max 20)
    │   │   ├── goals.md
    │   │   ├── rewards.md
    │   │   └── reflections.md
    │   │
    │   └── notice_responder/
    │       ├── persona.md         # read-only: IRS notice interpreter + response drafter
    │       ├── constraints.md     # read-only: always recommend professional review
    │       ├── skills.md          # earned notice-type handling patterns (max 15)
    │       ├── goals.md
    │       ├── rewards.md
    │       └── reflections.md
    │
    ├── user_profile/              # persistent user financial context
    │   ├── situation.md           # filing status, income sources, state
    │   ├── expense_categories.md  # documented expense patterns
    │   ├── deduction_history.md   # confirmed deductions from past sessions
    │   └── preferences.md         # communication style, detail level
    │
    ├── tax_knowledge/             # the agent's epistemic layer
    │   ├── index.md
    │   ├── verified_facts/        # IRS-publication-backed, tool-verified
    │   │   ├── brackets_2024.md
    │   │   ├── contribution_limits.md
    │   │   ├── se_tax_rates.md
    │   │   ├── standard_deduction.md
    │   │   └── common_deductions/
    │   │       ├── home_office.md
    │   │       ├── vehicle.md
    │   │       ├── qbi_deduction.md
    │   │       └── ...
    │   ├── hypotheses/            # patterns with <3 confirmed task wins
    │   │   └── README.md          # "graduate to verified_facts/ only after
    │   │                          #  3 tool-verified +1 outcomes"
    │   └── jurisdiction_notes/    # state-specific overlays (CA, NY, TX, etc.)
    │
    └── outputs/                   # session deliverables for user review
        ├── deduction_reports/
        ├── strategy_plans/
        ├── notice_responses/
        └── year_end_checklists/

Design decisions:

tax_knowledge/ is split into verified_facts/ and hypotheses/ — this is the single most important structural choice. A fact in verified_facts/ must have a frontmatter citation: field pointing to an IRS publication and a tool-verified outcome count ≥ 3. No agent may treat a hypothesis as a fact.
user_profile/ persists across sessions. The user answers onboarding questions once; subsequent sessions load context from these files rather than re-asking.
outputs/ is the only directory the user interacts with directly. Everything else is agent-internal.

4. Custom Agent Memory Model

The following memory types are active in AccountantOS, each justified by domain need:

| Memory Type | File(s) | Justification |
|---|---|---|
| Skills | `agents/*/skills.md` | Earned routing and pattern-match heuristics, e.g., "freelance + home office → always check QBI first" |
| Procedures | `tax_knowledge/verified_facts/` | Step-by-step calculation procedures (SE tax, QBI, depreciation) that must be consistent across sessions |
| Verified Facts | `tax_knowledge/verified_facts/` | IRS-publication-backed tax law: the ground truth layer |
| Hypotheses | `tax_knowledge/hypotheses/` | Observed patterns not yet verified 3×; quarantined from user output until confirmed |
| User Preferences | `user_profile/preferences.md` | Communication style, preferred detail level, past questions |
| User Situation | `user_profile/situation.md` | Filing status, income type, state; eliminates re-onboarding |
| Deduction History | `user_profile/deduction_history.md` | Which deductions the user has already claimed or explored; prevents redundant advice |
| Policies | `agents/*/constraints.md` | Hard rules: always cite sources, never file directly, always recommend CPA review for >$1k decisions |

Explicitly excluded:

Cases: individual case memory would balloon with PII. Session summaries (non-PII, outcome-focused) are stored in Git commits instead (Sections 8 and 10).
Templates — IRS forms are referenced by number, not stored locally. Output templates live in outputs/ but are generated, not memorized.

5. Verification Architecture

Tax advice without verification is liability. AccountantOS applies layered checks:

Layer 1 — Tool-Backed Fact Checks (automated, every response)

Every deduction claim must pass through tax_bracket_lookup or form_reference_lookup before being surfaced to the user. The agent is instructed: "If you cannot verify this with a tool call, classify it as a hypothesis, not a fact."

Layer 2 — Citation Requirements (structural)

Every item in verified_facts/ carries frontmatter:

---
type: verified_fact
citation: "IRS Publication 587, Business Use of Your Home"
pub_url: "https://www.irs.gov/pub/irs-pdf/p587.pdf"
verified_date: 2024-11-01
tool_verifications: 4
last_reviewed: 2025-01-15
---

An agent that references a fact without a citation field is blocked by the orchestrator's constraint check before output is returned.
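A minimal sketch of the citation check follows, assuming the frontmatter is limited to flat `key: value` lines like the example above. The function names are hypothetical, and the hand-rolled parser is a stand-in for whatever YAML handling main.py actually uses.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter found between two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def is_verified_fact(text: str, min_verifications: int = 3) -> bool:
    """True only when a citation is present and enough tool verifications exist."""
    meta = parse_frontmatter(text)
    try:
        count = int(meta.get("tool_verifications", "0"))
    except ValueError:
        count = 0
    return bool(meta.get("citation")) and count >= min_verifications
```

The orchestrator could call `is_verified_fact` on every file a specialist cites and refuse to return output if any call comes back false.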

Layer 3 — Hypothesis Quarantine

Any pattern observed fewer than 3 times lives in hypotheses/. These are never surfaced directly to the user — they may inform research direction but not advice.

Layer 4 — Human Review Gate (mandatory for high-stakes outputs)

The following output types are always flagged for human review before delivery:

IRS notice response drafts
Year-end strategy plans involving amounts > $500
Any recommendation involving an entity structure change (LLC, S-corp)
Amended return guidance

The gate surfaces a [REQUIRES REVIEW] header and a plain-language explanation of what the user should verify with a CPA before acting.
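The gate could be expressed as a small predicate plus a header step, as sketched below. The output-type labels and function names are hypothetical; only the four gated categories, the $500 threshold, and the [REQUIRES REVIEW] header come from the list above.

```python
# Hypothetical labels for the always-gated output types listed above.
ALWAYS_GATED = {"notice_response", "entity_structure_change", "amended_return"}
STRATEGY_AMOUNT_THRESHOLD = 500  # dollars


def requires_review(output_type: str, amount: float = 0.0) -> bool:
    """Decide whether an output must pass the human review gate."""
    if output_type in ALWAYS_GATED:
        return True
    # Year-end strategy plans are gated only above the dollar threshold.
    return output_type == "year_end_strategy" and amount > STRATEGY_AMOUNT_THRESHOLD


def apply_gate(output_type: str, body: str, amount: float = 0.0) -> str:
    """Prefix gated outputs with the review header; pass others through unchanged."""
    if requires_review(output_type, amount):
        return "[REQUIRES REVIEW]\n" + body
    return body
```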

Layer 5 — Dual-Pass for Notice Responses

IRS notice responses go through two agents: notice_responder drafts, then tax_researcher cross-checks the cited regulation. Discrepancies block the output.

Layer 6 — Annual Knowledge Refresh

A refresh_task is scheduled annually (configurable) to re-verify every entry in verified_facts/ against current IRS publications. Stale facts are demoted to hypotheses/ until re-verified.
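One way to find the entries that the refresh should demote is to compare each fact's `last_reviewed` date with the refresh interval. The sketch below assumes the dates have already been read from frontmatter; the function names and the 365-day default are illustrative.

```python
from datetime import date, timedelta


def is_stale(last_reviewed: date, today: date, max_age_days: int = 365) -> bool:
    """A fact is stale once its last review is older than the refresh interval."""
    return today - last_reviewed > timedelta(days=max_age_days)


def facts_to_demote(last_reviewed_by_fact: dict, today: date) -> list:
    """Return the fact names that should move to hypotheses/ until re-verified."""
    return sorted(
        name
        for name, reviewed in last_reviewed_by_fact.items()
        if is_stale(reviewed, today)
    )
```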

6. Learning Policy — Constrained

What is learned:

Routing heuristics (orchestrator skills): which specialist handles which task pattern
Deduction pattern skills: "freelancer + vehicle + mileage log present → deduction likely valid"
Notice-type classification skills: CP2000 vs CP501 handling patterns

What is never learned from user data alone:

Tax law facts (these require IRS publication verification)
Calculation procedures (these are locked in verified_facts/ by a human or tool)
Any claim the user makes about their situation (recorded in user_profile/ but never promoted to the knowledge base)

Reward schema:

reward_decomposition:
  accuracy:     +1 / 0 / -1   # was the output factually correct?
  completeness: +1 / 0 / -1   # did it cover the user's full situation?
  clarity:      +1 / 0 / -1   # user self-rates: did they understand it?
  actionability: +1 / 0 / -1  # did it produce something the user could act on?
context_tags: [freelance, home_office, irs_notice, retirement, se_tax, ...]
citation_present: true/false   # hard requirement
hallucination_flag: true/false # logged permanently; triggers skill review

Skill budget: 20 per specialist agent, except notice_responder and the orchestrator, which are capped at 15 (see agents.yaml). When an agent's budget is full, the skill with the lowest cumulative accuracy + completeness score is pruned. A skill with any hallucination_flag: true is pruned immediately regardless of other scores.
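The pruning rule can be sketched as follows. The `Skill` record and `prune_skills` function are hypothetical; the ranking key (cumulative accuracy plus completeness) and the immediate removal of flagged skills follow the policy stated above.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    accuracy: int = 0        # cumulative accuracy reward
    completeness: int = 0    # cumulative completeness reward
    hallucination_flag: bool = False


def prune_skills(skills: list, budget: int = 20) -> list:
    """Drop flagged skills first, then keep the highest-scoring ones within budget."""
    kept = [s for s in skills if not s.hallucination_flag]
    # Stable sort: on a tied score, the skill listed earlier survives.
    kept.sort(key=lambda s: s.accuracy + s.completeness, reverse=True)
    return kept[:budget]
```

Note that this sketch returns skills ordered by score, so the caller would need to preserve original order separately if skills.md relies on it.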

Skill graduation from hypothesis: A pattern in hypotheses/ graduates to verified_facts/ only when:

It has earned +1 on accuracy in at least 3 independent sessions, AND
A tool-backed citation has been attached.
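Both graduation conditions fit in a single predicate. In this sketch the accuracy rewards are keyed by session ID so that repeated wins inside one session count once; that keying and the function name are assumptions, not part of the blueprint.

```python
from typing import Optional


def can_graduate(accuracy_by_session: dict, citation: Optional[str], threshold: int = 3) -> bool:
    """A hypothesis graduates only with enough +1 sessions AND an attached citation."""
    wins = sum(1 for score in accuracy_by_session.values() if score == 1)
    return wins >= threshold and bool(citation)
```

The default threshold mirrors HYPOTHESIS_THRESHOLD=3 in env.example.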

7. Model Routing Policy

No model is hardcoded. The operator sets MODEL_FAST, MODEL_STANDARD, and MODEL_STRONG in .env (copied from env.example). AccountantOS routes based on task class.

┌──────────────────────────────────────────────────────────────────────────┐
│  Task Class                    │ Routing Tier   │ Rationale               │
├──────────────────────────────────────────────────────────────────────────┤
│ IRS publication lookup         │ MODEL_FAST     │ Deterministic retrieval  │
│ Tax bracket / limit lookup     │ MODEL_FAST     │ Pure calculation         │
│ Deduction list for known type  │ MODEL_FAST     │ Pattern match, known     │
│ Explaining known deductions    │ MODEL_STANDARD │ Synthesis + plain lang   │
│ Quarterly payment calculation  │ MODEL_STANDARD │ Multi-step math          │
│ Year-round strategy plan       │ MODEL_STANDARD │ Multi-factor synthesis   │
│ IRS notice analysis            │ MODEL_STRONG   │ High stakes, nuanced     │
│ Novel situation (no prior +1)  │ MODEL_STRONG   │ Low confidence, flag     │
│ Dual-pass notice verification  │ MODEL_STRONG   │ Accuracy critical        │
│ Entity structure advice        │ MODEL_STRONG   │ Irreversible decisions   │
└──────────────────────────────────────────────────────────────────────────┘

Configuration:

# env.example — operator sets these; no defaults hardcoded
MODEL_FAST=             # e.g. a small/fast model of your choice
MODEL_STANDARD=         # e.g. a mid-tier model of your choice
MODEL_STRONG=           # e.g. a frontier model of your choice
MODEL_PROVIDER=         # anthropic | openai | local | other

Routing logic in main.py:

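import os

# Hypothetical class labels; align them with the routing table above.
HIGH_STAKES_CLASSES = {"irs_notice_analysis", "dual_pass_verification", "entity_structure"}
STANDARD_CLASSES = {"deduction_explanation", "quarterly_payment", "strategy_plan"}

def route_model(task_class: str, novelty_score: float) -> str:
    if novelty_score > 0.7 or task_class in HIGH_STAKES_CLASSES:
        return os.environ["MODEL_STRONG"]  # KeyError if unset: no default is hardcoded
    if task_class in STANDARD_CLASSES:
        return os.environ["MODEL_STANDARD"]
    return os.environ["MODEL_FAST"]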

8. Security Policy

Risk level: HIGH | Privacy: CRITICAL

Network Access Rules

# docker-compose.yml network config
networks:
  accountantos_net:
    driver: bridge
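    internal: false   # outbound traffic is allowed; Compose does not filter by domain

# Policy list, not a native Compose key: enforce it with an egress proxy
# or host firewall rules in front of accountantos_net.
egress_whitelist: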
  - "*.irs.gov"
  - "*.treasury.gov"
  - "*.ssa.gov"       # for SE tax rate verification
  # All other egress: BLOCKED
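At the application level, a tool wrapper could refuse any URL whose host falls outside the whitelist. The sketch below is an assumption about how such a check might look, and it complements network-level enforcement rather than replacing it. The `is_allowed` name is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

EGRESS_WHITELIST = ["*.irs.gov", "*.treasury.gov", "*.ssa.gov"]


def is_allowed(url: str, whitelist=EGRESS_WHITELIST) -> bool:
    """Return True only if the URL's hostname matches a whitelisted pattern."""
    host = (urlsplit(url).hostname or "").lower()
    # fnmatchcase avoids platform-specific case normalization.
    return any(fnmatchcase(host, pattern) for pattern in whitelist)
```

A pattern such as "*.irs.gov" matches www.irs.gov but not the bare irs.gov host, so the bare domains would need their own entries if tools fetch from them.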

Data Containment

All user financial data lives exclusively under /data/ (Docker volume, host-local)
No cloud sync. No S3. No external logging services.
.gitignore excludes user_profile/situation.md from Git by default (user may opt in to include anonymized summaries)

Secret Handling

# .env is never committed. env.example has no real values.
# API keys loaded via Docker secrets or env vars only:
docker run --env-file .env accountantos

PII Policy

User income, SSN fragments, account numbers: never logged, never committed to Git
Session summaries in Git commits contain only outcome metadata: task: deduction scan | filer_type: freelance | deductions_found: 7 | reward: +1
outputs/ files are stored locally; user explicitly exports them
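One way to keep PII out of commit messages is to build them from an explicit allow-list of keys, so anything else in the session record is dropped by construction. The key names follow the example summary above; the function name and the shape of the session dict are assumptions.

```python
# Only these outcome-metadata keys may appear in a commit message.
ALLOWED_KEYS = ("task", "filer_type", "deductions_found", "reward")


def commit_summary(session: dict) -> str:
    """Build a PII-free summary line from allow-listed keys only."""
    parts = [f"{key}: {session[key]}" for key in ALLOWED_KEYS if key in session]
    return " | ".join(parts)
```

An allow-list fails safe: a new field added to the session record stays out of Git history until someone adds it to `ALLOWED_KEYS` on purpose.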

Sandboxing

Each specialist agent runs in its own container (see docker-compose.yml)
No agent can write to another agent's private directory
Lock files in shared/locks/ prevent concurrent writes to shared state
constraints.md for every agent is mounted read-only
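The lock files in shared/locks/ could work by exclusive file creation, as in the sketch below. The `file_lock` helper, the `.lock` suffix, and the fail-fast behaviour (raising instead of waiting) are assumptions for illustration.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_dir: Path, name: str):
    """Hold an exclusive lock file; raise FileExistsError if it is already held."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{name}.lock"
    # O_EXCL makes creation fail when the lock file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield lock_path
    finally:
        os.close(fd)
        lock_path.unlink()
```

A caller would wrap each write to session_segment.md in `with file_lock(...)` and retry or back off on FileExistsError. A crashed holder leaves a stale lock behind, so a real implementation would also need an expiry or cleanup rule.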

Approval Gates

REVIEW_MODE=strict (default): all outputs > threshold go to outputs/review/ with a [REQUIRES HUMAN REVIEW] prefix before being shown to user
REVIEW_MODE=relaxed: informational outputs shown immediately; strategy and notice outputs still gated

9. Docker Project Files

Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Data volume — all user state lives here, never in the image
VOLUME ["/data"]

# Git identity for longitudinal-memory commits (the repository itself lives in /data)
RUN git config --global user.email "accountantos@local" && \
    git config --global user.name "AccountantOS"

CMD ["python", "main.py"]

docker-compose.yml

version: "3.9"

services:

  orchestrator:
    build: .
    container_name: accountantos_orchestrator
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=orchestrator
      - REVIEW_MODE=${REVIEW_MODE:-strict}
    networks:
      - accountantos_net
    stdin_open: true
    tty: true

  tax_researcher:
    build: .
    container_name: accountantos_researcher
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=tax_researcher
    networks:
      - accountantos_net

  deduction_scanner:
    build: .
    container_name: accountantos_scanner
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=deduction_scanner
    networks:
      - accountantos_net

  strategy_advisor:
    build: .
    container_name: accountantos_strategy
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=strategy_advisor
    networks:
      - accountantos_net

  notice_responder:
    build: .
    container_name: accountantos_notices
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=notice_responder
    networks:
      - accountantos_net

volumes:
  accountantos_data:
    driver: local

networks:
  accountantos_net:
    driver: bridge

agents.yaml

agents:

  orchestrator:
    description: "Routes tasks, enforces review gates, merges specialist outputs"
    persona_file: agents/orchestrator/persona.md
    constraints_file: agents/orchestrator/constraints.md
    skill_budget: 15
    model_tier: standard
    tools: [date_math, filing_deadline_lookup]

  tax_researcher:
    description: "Looks up IRS publications, verifies facts, cites sources"
    persona_file: agents/tax_researcher/persona.md
    constraints_file: agents/tax_researcher/constraints.md
    skill_budget: 20
    model_tier: standard       # strong for novel lookups via routing policy
    tools: [web_search_irs, form_reference_lookup, tax_bracket_lookup]

  deduction_scanner:
    description: "Matches user income/expense profile to verified deductions"
    persona_file: agents/deduction_scanner/persona.md
    constraints_file: agents/deduction_scanner/constraints.md
    skill_budget: 20
    model_tier: standard
    tools: [calculator, ira_contribution_limit_lookup]

  strategy_advisor:
    description: "Synthesizes year-round tax strategy from user situation"
    persona_file: agents/strategy_advisor/persona.md
    constraints_file: agents/strategy_advisor/constraints.md
    skill_budget: 20
    model_tier: strong
    tools: [calculator, date_math, ira_contribution_limit_lookup]

  notice_responder:
    description: "Interprets IRS notices, drafts responses, flags urgency"
    persona_file: agents/notice_responder/persona.md
    constraints_file: agents/notice_responder/constraints.md
    skill_budget: 15
    model_tier: strong
    tools: [form_reference_lookup, web_search_irs, date_math]

env.example

# ── Model Configuration (operator sets these — no defaults) ──────────────────
MODEL_FAST=
MODEL_STANDARD=
MODEL_STRONG=
MODEL_PROVIDER=          # anthropic | openai | ollama | other
API_KEY=                 # your provider API key

# ── Runtime Settings ─────────────────────────────────────────────────────────
REVIEW_MODE=strict       # strict | relaxed
LOG_LEVEL=info           # debug | info | warn | error
SKILL_BUDGET=20          # max skills per agent
REWARD_LOG_SIZE=30       # rolling reward window
HYPOTHESIS_THRESHOLD=3   # confirmations required to graduate a hypothesis

# ── Privacy ──────────────────────────────────────────────────────────────────
GIT_COMMIT_PII=false     # never commit user_profile/situation.md to Git
TELEMETRY=false          # no external telemetry, ever

# ── Tax Year ─────────────────────────────────────────────────────────────────
TAX_YEAR=2024
USER_STATE=              # e.g. CA, NY, TX — for state overlay lookups

Startup Flow

docker compose up
    │
    ▼
orchestrator starts
    │
    ├── loads user_profile/situation.md (if exists)
    ├── loads session_segment.md
    ├── reads inbox/ for queued tasks
    │
    ▼
user submits task (one of the 7 task types)
    │
    ▼
orchestrator classifies task → routes to specialist
    │
    ▼
specialist loads its skills.md + relevant verified_facts/
    │
    ├── calls tools as needed (citation required)
    ├── writes result to shared/handoffs/
    │
    ▼
orchestrator merges result
    │
    ├── runs citation check
    ├── applies review gate if applicable
    ├── writes to outputs/ for user
    │
    ▼
reward logged → Git commit on +1

10. Observability

Mission-specific metrics — not generic agent stats:

| Metric | How Tracked | Target |
|---|---|---|
| `deductions_surfaced_per_session` | Count in reward log | Trend up as skills mature |
| `deductions_with_citation_rate` | Citation present flag | 100% (hard requirement) |
| `hallucination_incidents` | `hallucination_flag` in rewards | 0 |
| `user_clarity_score` | Self-rated 1–5 after each session | ≥ 4.0 avg |
| `hypothesis_graduation_rate` | Hypotheses promoted / total | Rising = knowledge maturing |
| `review_gate_rejection_rate` | Human review flags / total outputs | Falling = quality improving |
| `notice_response_approval_rate` | User-approved drafts / total | ≥ 90% |
| `model_tier_distribution` | Fast / Standard / Strong % per task class | Cost indicator |
| `skill_churn_rate` | Skills added vs pruned per 30 tasks | Falling = agent maturing |
| `stale_fact_count` | Verified facts past annual review date | 0 target |

Log format (append to rewards.md):

---
date: 2024-11-15
task_type: deduction_scan
filer_type: freelance
agent: deduction_scanner
citation_present: true
hallucination_flag: false
reward_decomposition:
  accuracy: +1
  completeness: +1
  clarity: +1
  actionability: +1
context_tags: [freelance, home_office, vehicle, qbi]
model_tier_used: standard
user_clarity_score: 5
---
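Several of the metrics above can be derived directly from the reward log. The sketch below assumes the entries have already been parsed into dicts and that the citation flag uses the `citation_present` key from the reward schema in Section 6; the function name is hypothetical.

```python
from collections import Counter


def summarize_rewards(entries: list) -> dict:
    """Compute a few Section 10 metrics from parsed reward-log entries."""
    total = len(entries)
    cited = sum(1 for e in entries if e.get("citation_present"))
    clarity = [e["user_clarity_score"] for e in entries if "user_clarity_score" in e]
    tiers = Counter(e.get("model_tier_used", "unknown") for e in entries)
    return {
        "deductions_with_citation_rate": cited / total if total else 0.0,
        "hallucination_incidents": sum(1 for e in entries if e.get("hallucination_flag")),
        "user_clarity_score": sum(clarity) / len(clarity) if clarity else None,
        "model_tier_distribution": dict(tiers),
    }
```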

Git commit format:

scanner(learn): earn home-office-freelance pattern from 3rd +1 session
task: deduction scan | filer: freelance | deductions: 9
reward: +1 (accuracy+1, completeness+1, clarity+1, actionability+1)
skills-added: home-office-exclusive-use-check
skills-pruned: none
hypothesis-graduated: home_office_shared_space_disqualifier

11. The 7 Task Types + 3 Bonus Tasks

The first seven map directly to the seven use cases in the design goals; the last three are bonus tasks:

1. Deduction Discovery (freelancer)
   Prompt: "Act as a certified CPA. My situation: freelance designer, $85,000 annual income, expenses include home office, laptop, software subscriptions, professional development. Identify every deduction I likely qualify for that most people in my situation overlook."
   Routes to: deduction_scanner + tax_researcher

2. Full Tax Situation Explained
   Prompt: "Explain my tax situation in plain language. I earn $72,000 W-2 and $18,000 freelance in California. Break down what I owe, why, and the most important decisions before April 15."
   Routes to: strategy_advisor + tax_researcher

3. Business Deduction Audit
   Prompt: "I run a sole proprietorship consultancy earning $140,000 with expenses: travel, home office, subcontractors, software. Identify every legitimate deduction, what documentation I need, and the most commonly missed ones."
   Routes to: deduction_scanner (primary) + tax_researcher

4. Year-End Tax Reduction Sprint
   Prompt: "It is October. My estimated tax liability is $22,000. What are the most impactful legal moves I can make before December 31 to reduce what I owe? I'm self-employed, contribute to a SEP-IRA, and have $30k in unrealized stock losses."
   Routes to: strategy_advisor + deduction_scanner

5. Self-Employment Tax Explainer
   Prompt: "I'm a freelance developer earning $95,000/year in Texas. Explain exactly how SE tax works, what quarterly payments I should be making, how to calculate them, and the top strategies to legally reduce my SE tax burden."
   Routes to: tax_researcher + strategy_advisor

6. IRS Notice Response
   Prompt: "I received a CP2000 notice proposing I owe an additional $4,200 due to unreported income from a 1099 I thought my employer handled. Explain what this means, whether I should be concerned, my response options, and a step-by-step resolution plan."
   Routes to: notice_responder + tax_researcher (dual-pass)

7. 12-Month Tax Strategy
   Prompt: "I'm single, earning $110,000 W-2 plus $25,000 from rental income in New York. Goals: grow retirement savings, possibly start an LLC next year. Build me a 12-month tax strategy that minimizes what I owe and maximizes what I keep."
   Routes to: strategy_advisor (primary) + all specialists

8. Quarterly Estimated Tax Calculator (bonus)
   Prompt: "Walk me through calculating my Q3 estimated payment. I've earned $67,000 so far this year self-employed, paid $8,500 in estimated taxes, and expect another $25,000 in Q3."
   Routes to: tax_researcher + calculator tool

9. Deduction Documentation Checklist (bonus)
   Prompt: "I plan to claim home office, vehicle (actual expense method), and professional development deductions. What exact documentation do I need to maintain for each to survive an audit?"
   Routes to: deduction_scanner + tax_researcher

10. Entity Structure Decision Support (bonus; MODEL_STRONG, always gated)
    Prompt: "I'm currently a sole proprietor earning $180,000. Should I consider an S-corp election? Walk me through the tax math, the tradeoffs, and what I'd need to do."
    Routes to: strategy_advisor | REVIEW_MODE=strict enforced | CPA recommendation mandatory
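As a worked example of the arithmetic behind task 5, the sketch below estimates self-employment tax the way the calculator tool might. The constants are assumptions for tax year 2024 (15.3% combined rate, the 92.35% net-earnings factor, and a $168,600 Social Security wage base) and would have to be confirmed by tax_researcher against Schedule SE before any user-facing use. The function ignores W-2 wages and the Additional Medicare Tax.

```python
# Assumed 2024 figures; verify against the current Schedule SE instructions.
NET_EARNINGS_FACTOR = 0.9235   # share of net profit subject to SE tax
SOCIAL_SECURITY_RATE = 0.124   # applies up to the wage base
MEDICARE_RATE = 0.029          # applies to all net earnings
SS_WAGE_BASE_2024 = 168_600    # dollars


def se_tax(net_profit: float) -> float:
    """Estimate self-employment tax for a filer with no W-2 wages."""
    net_earnings = net_profit * NET_EARNINGS_FACTOR
    social_security = min(net_earnings, SS_WAGE_BASE_2024) * SOCIAL_SECURITY_RATE
    medicare = net_earnings * MEDICARE_RATE
    return round(social_security + medicare, 2)
```

Under these assumptions, `se_tax(95_000)` comes to roughly $13,423: net earnings of $87,732.50 multiplied by the combined 15.3% rate. Running this through a deterministic function rather than model arithmetic matches the mitigation listed in Section 12.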

12. Risks & Failure Modes

| Risk | Severity | Mitigation |
|---|---|---|
| Hallucinated deduction (agent invents nonexistent IRS rule) | Critical | Citation required for every claim; `hallucination_flag` triggers skill purge |
| Stale tax law (brackets/limits change annually) | High | Annual refresh task; verified_facts carry `last_reviewed` date; staleness alert |
| User acts on advice without CPA review | High | Every output includes disclosure; strategy/notice outputs always gated |
| PII leaking to Git history | High | `GIT_COMMIT_PII=false` default; `user_profile/situation.md` in `.gitignore` |
| Over-confident notice response | High | Dual-pass architecture; always recommends professional response |
| Model produces confident wrong math | Medium | Calculator tool used for all arithmetic; never trust LLM arithmetic alone |
| Hypothesis promoted prematurely | Medium | 3-confirmation threshold + citation required; orchestrator enforces gate |
| User's state tax rules ignored | Medium | `USER_STATE` env var loads jurisdiction overlay; flagged if missing |
| Skill churn / thrashing | Low | Skill budget + pruning policy; churn rate logged in observability |
| Context window overflow for complex situations | Low | Specialist agents isolate concerns; session_segment.md scoped per task |

Limitations to disclose to users explicitly:

AccountantOS is not a licensed CPA. All outputs are educational guidance.
Tax law changes. Always verify against current IRS publications before acting.
State tax rules vary significantly. State guidance requires verification.
For amounts over $1,000 in tax impact, professional review is strongly recommended.

13. v1 Scope — Ship in 2 Weeks

Goal: A working single-user local runtime that handles the 4 highest-frequency tasks.

Include in v1:

Docker Compose setup with orchestrator + 2 specialists (deduction_scanner, tax_researcher)
User onboarding flow: situation.md populated from 5 questions
Task types 1, 2, 3, 5 (deductions, situation explanation, SE tax)
Verified facts for current year: brackets, standard deduction, SE tax rate, IRA limits, QBI
Citation requirement enforced (hard block if missing)
Plain-language output formatter
Git commit on every +1 reward
REVIEW_MODE=strict active by default
[REQUIRES HUMAN REVIEW] disclosure on all strategy outputs

Defer from v1:

IRS notice responder (complex, needs dual-pass — v2)
Year-round 12-month strategy (needs full specialist roster — v2)
Hypothesis graduation automation (manual review in v1 is safer)
State jurisdiction overlays beyond 3–4 common states
Entity structure advice (S-corp election complexity — v2)

Success signal for v1: A real user with a real freelance tax situation runs tasks 1 and 5, rates clarity ≥ 4/5, and identifies at least one deduction they weren't aware of — with a valid IRS citation.

14. v2 Expansion Plan

Activate after: 20+ real sessions, reward logs reviewed, skill churn stable.

| Feature | Rationale |
|---|---|
| NoticeResponder agent | High user anxiety, high value; add after v1 proves routing stability |
| 12-month strategy synthesis | Requires all 4 specialists working reliably; enable after individual specialist quality is confirmed |
| State jurisdiction overlays | Expand from 3 states to full 50 based on `USER_STATE` usage logs |
| Hypothesis graduation automation | Safe once 3-confirmation threshold is battle-tested in v1 |
| Annual facts refresh scheduler | Cron job that re-verifies `verified_facts/` against IRS publications each January |
| Multi-user support | Separate `user_profile/` per user ID; shared `tax_knowledge/` read-only to all |
| Deduction documentation generator | Output a formatted checklist PDF per claimed deduction |
| Quarterly tax calendar | Proactive reminders tied to `USER_STATE` and filing status |
| S-corp / entity advice module | Highest-stakes task class; needs its own specialist, strong model mandatory, extra review gate |

AccountantOS — built on Git, Docker, and the honest self-assessment that most tax advice agents need fewer opinions and more citations.

MuhammadYossry/AccountantAgentOS_ex.md
