Design philosophy: Useful over impressive. Inspectable over magical. Narrow scope. Tool-backed evidence. Human in the loop for every consequential output.
AGENT_NAME: AccountantOS
PRIMARY_PURPOSE: Personal tax optimization, deduction discovery, IRS guidance,
and year-round accounting strategy — explained in plain language
DOMAIN: Personal finance, US tax law, self-employment accounting,
small business deductions
TARGET_USERS: Individuals — salaried employees, freelancers, small business
owners, gig workers, rental income earners
RISK_LEVEL: high
AUTONOMY_LEVEL: assistant
# Never semi-autonomous or autonomous for financial output.
# Always surfaces reasoning; never files, pays, or contacts IRS directly.
TOOLS_ALLOWED: tax_bracket_lookup, ira_contribution_limit_lookup,
web_search (IRS publications only), calculator, date_math,
form_reference_lookup, filing_deadline_lookup
TOOLS_FORBIDDEN: direct_bank_access, payment_processing, e-filing,
open_web_search (general), code_execution (user data)
DATA_SOURCES: IRS.gov publications, user-provided income/expense data,
state tax authority pages, verified tax law references
SUCCESS_METRICS: deductions_surfaced_per_session, user_clarity_score (self-rated),
plan_completeness_score, hallucination_incidents (target: 0),
human_review_pass_rate, tasks_requiring_no_followup
DEPLOYMENT_ENV: local laptop or server (Docker sandbox, no cloud required)
BUDGET_PRIORITY: balanced
# Cheap model for routine lookups; stronger model for
# IRS notice analysis, strategy synthesis, edge cases.
# Model choice left entirely to the operator — see Section 7.
PRIVACY_REQUIREMENTS: critical
# All user financial data stays local. No external logging.
# PII never leaves the container. No third-party telemetry.
HUMAN_REVIEW_POINTS: IRS notice response drafts, year-end strategy recommendations,
any output the user intends to act on financially
MULTI_AGENT_REQUIRED: yes
# Specialist sub-agents: TaxResearcher, DeductionScanner,
# StrategyAdvisor, NoticeResponder. See Section 3.
LONG_TERM_MEMORY: yes
# Retains user financial profile, past session summaries,
# and confirmed deduction history across sessions.
LEARNING_ALLOWED: constrained
# Skill learning only from verified, tool-backed outcomes.
# World-model updates require IRS publication citation.
# No learning from user-provided claims alone.
OUTPUT_STYLE: plain language explanations + structured action lists +
document templates where relevant. No jargon without definition.

AccountantOS is a personal tax intelligence runtime that helps individuals understand their full tax picture, surface overlooked deductions, respond to IRS notices with confidence, and build a year-round strategy, all in plain language.
Why this architecture fits:
- Tax guidance is high-stakes and high-hallucination-risk. Every output must be traceable to a specific IRS publication or calculation. The Unix filesystem model enforces this by making `verified_facts/` and `hypotheses/` structurally separate: an unverified claim never migrates to confirmed policy without tool-backed evidence.
- Users arrive with wildly different situations (W-2, 1099, S-corp, rental income, RSUs). A skill-learning agent that earns and prunes capabilities based on actual task outcomes becomes more precise over time rather than drifting toward generic advice.
- Financial data is maximally sensitive. Docker isolation with no egress except explicitly whitelisted government domains (IRS, Treasury, SSA) keeps PII local and auditable.
- The multi-agent design (Section 3) separates research from strategy from user communication, preventing the single-agent failure mode where one context window tries to be a researcher, advisor, and writer simultaneously.
Choice: Orchestrator + Specialist Workers
                                  [User Task]
                                       │
                                       ▼
                            ┌─────────────────────┐
                            │    Orchestrator     │  routes tasks, holds session context,
                            │   (AccountantOS)    │  enforces review gates, merges outputs
                            └──────────┬──────────┘
                                       │
           ┌──────────────────┬────────┴─────────┬──────────────────┐
           │                  │                  │                  │
           ▼                  ▼                  ▼                  ▼
    [TaxResearcher]  [DeductionScanner]  [StrategyAdvisor]  [NoticeResponder]
     IRS pub          income/expense      year-round plan    IRS letter
     lookups          pattern matching    synthesis          analysis
Why not a single container? Each specialist has a tightly scoped system prompt, a dedicated skill file, and its own reward log. This prevents context pollution (deduction lists bleeding into IRS notice tone), enables independent skill pruning, and makes observability tractable — you can inspect exactly which agent produced a given output.
Why not full microservices? This runs on a local laptop or small server. Agents communicate via handoff files (Section 3), not HTTP. No Kubernetes. No service mesh. Just Docker Compose + shared volume mounts.
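To make the handoff-file idea concrete, here is a minimal sketch of how a specialist might publish a result under `shared/handoffs/`. The JSON layout, the file-naming scheme, and the names `write_handoff` and `read_handoff` are assumptions for illustration; the design does not specify a handoff format. The sketch writes to a temporary file and renames it so that a reader never sees a half-written handoff.

```python
import json
import os
import tempfile
from pathlib import Path


def write_handoff(handoff_dir: Path, task_id: str, sender: str, payload: dict) -> Path:
    """Atomically write one handoff file and return its path.

    Hypothetical layout: <handoff_dir>/<task_id>.<sender>.json
    """
    handoff_dir.mkdir(parents=True, exist_ok=True)
    target = handoff_dir / f"{task_id}.{sender}.json"
    record = {"task_id": task_id, "sender": sender, "payload": payload}

    # Write to a temp file in the same directory, then rename over the target.
    # os.replace is atomic when source and destination share a filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(handoff_dir), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(record, tmp, indent=2)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_handoff(path: Path) -> dict:
    """Load a handoff file written by write_handoff."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A plain file rename keeps the mechanism inspectable with ordinary shell tools, which fits the "no HTTP, no service mesh" constraint.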
accountantos/
├── Dockerfile
├── docker-compose.yml
├── agents.yaml # agent roster, model hints, tool grants
├── env.example # MODEL_NAME, LOG_LEVEL, REVIEW_MODE
├── main.py # task router and session loop
├── requirements.txt
│
└── /data/ # Docker volume mount — never leaves host
│
├── .git/ # longitudinal memory (Section 4)
│
├── shared/ # inter-agent communication
│ ├── inbox/ # orchestrator inbound queue
│ ├── handoffs/ # specialist-to-specialist results
│ ├── locks/ # write semaphores
│ └── session_segment.md # live shared working context
│
├── agents/
│ │
│ ├── orchestrator/
│ │ ├── persona.md # read-only: routing + escalation identity
│ │ ├── constraints.md # read-only: budget, risk floor, forbidden actions
│ │ ├── skills.md # earned routing heuristics (max 15 entries)
│ │ ├── goals.md # recurring routing failures to fix
│ │ └── rewards.md # last 30 task outcomes
│ │
│ ├── tax_researcher/
│ │ ├── persona.md # read-only: IRS publication specialist
│ │ ├── constraints.md # read-only: citations required, no speculation
│ │ ├── skills.md # earned lookup patterns (max 20)
│ │ ├── goals.md
│ │ ├── rewards.md
│ │ └── reflections.md
│ │
│ ├── deduction_scanner/
│ │ ├── persona.md # read-only: pattern-match income/expense → deductions
│ │ ├── constraints.md # read-only: must cite IRS pub for every deduction
│ │ ├── skills.md # earned deduction patterns by filer type (max 20)
│ │ ├── goals.md
│ │ ├── rewards.md
│ │ └── reflections.md
│ │
│ ├── strategy_advisor/
│ │ ├── persona.md # read-only: year-round tax strategy synthesizer
│ │ ├── constraints.md # read-only: strategy = options + tradeoffs, not mandates
│ │ ├── skills.md # earned strategy templates (max 20)
│ │ ├── goals.md
│ │ ├── rewards.md
│ │ └── reflections.md
│ │
│ └── notice_responder/
│ ├── persona.md # read-only: IRS notice interpreter + response drafter
│ ├── constraints.md # read-only: always recommend professional review
│ ├── skills.md # earned notice-type handling patterns (max 15)
│ ├── goals.md
│ ├── rewards.md
│ └── reflections.md
│
├── user_profile/ # persistent user financial context
│ ├── situation.md # filing status, income sources, state
│ ├── expense_categories.md # documented expense patterns
│ ├── deduction_history.md # confirmed deductions from past sessions
│ └── preferences.md # communication style, detail level
│
├── tax_knowledge/ # the agent's epistemic layer
│ ├── index.md
│ ├── verified_facts/ # IRS-publication-backed, tool-verified
│ │ ├── brackets_2024.md
│ │ ├── contribution_limits.md
│ │ ├── se_tax_rates.md
│ │ ├── standard_deduction.md
│ │ └── common_deductions/
│ │ ├── home_office.md
│ │ ├── vehicle.md
│ │ ├── qbi_deduction.md
│ │ └── ...
│ ├── hypotheses/ # patterns with <3 confirmed task wins
│ │ └── README.md # "graduate to verified_facts/ only after
│ │ # 3 tool-verified +1 outcomes"
│ └── jurisdiction_notes/ # state-specific overlays (CA, NY, TX, etc.)
│
└── outputs/ # session deliverables for user review
├── deduction_reports/
├── strategy_plans/
├── notice_responses/
└── year_end_checklists/
Design decisions:
- `tax_knowledge/` is split into `verified_facts/` and `hypotheses/`. This is the single most important structural choice. A fact in `verified_facts/` must have a frontmatter `citation:` field pointing to an IRS publication and a tool-verified outcome count ≥ 3. No agent may treat a hypothesis as a fact.
- `user_profile/` persists across sessions. The user answers onboarding questions once; subsequent sessions load context from these files rather than re-asking.
- `outputs/` is the only directory the user interacts with directly. Everything else is agent-internal.
The following memory types are active in AccountantOS, each justified by domain need:
| Memory Type | File(s) | Justification |
|---|---|---|
| Skills | `agents/*/skills.md` | Earned routing and pattern-match heuristics, e.g. "freelance + home office → always check QBI first" |
| Procedures | `tax_knowledge/verified_facts/` | Step-by-step calculation procedures (SE tax, QBI, depreciation) that must be consistent across sessions |
| Verified Facts | `tax_knowledge/verified_facts/` | IRS-publication-backed tax law: the ground truth layer |
| Hypotheses | `tax_knowledge/hypotheses/` | Observed patterns not yet verified 3×; quarantined from user output until confirmed |
| User Preferences | `user_profile/preferences.md` | Communication style, preferred detail level, past questions |
| User Situation | `user_profile/situation.md` | Filing status, income type, state; eliminates re-onboarding |
| Deduction History | `user_profile/deduction_history.md` | Which deductions the user has already claimed or explored; prevents redundant advice |
| Policies | `agents/*/constraints.md` | Hard rules: always cite sources, never file directly, always recommend CPA review for >$1k decisions |
Explicitly excluded:
- Cases: individual case memory would balloon with PII. Session summaries (non-PII, outcome-focused) are stored in Git commits instead (Section 5).
- Templates: IRS forms are referenced by number, not stored locally. Output templates live in `outputs/` but are generated, not memorized.
Tax advice without verification is liability. AccountantOS applies layered checks:
Every deduction claim must pass through tax_bracket_lookup or form_reference_lookup
before being surfaced to the user. The agent is instructed: "If you cannot verify
this with a tool call, classify it as a hypothesis, not a fact."
Every item in verified_facts/ carries frontmatter:
---
type: verified_fact
citation: "IRS Publication 587, Business Use of Your Home"
pub_url: "https://www.irs.gov/pub/irs-pdf/p587.pdf"
verified_date: 2024-11-01
tool_verifications: 4
last_reviewed: 2025-01-15
---

An agent that references a fact whose frontmatter lacks a `citation` field is blocked by the orchestrator's constraint check before output is returned.
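As an illustration of what such a constraint check could look like, the sketch below parses the flat `key: value` frontmatter shown above and reports why a fact would be blocked. The function names and the hand-rolled parser are assumptions (a real deployment might use a YAML library); the threshold of 3 comes from the design decisions stated earlier.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a flat `key: value` frontmatter block delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return {}  # no closing delimiter: treat as missing frontmatter


def citation_check(text: str, min_verifications: int = 3) -> list:
    """Return a list of problems; an empty list means the fact may be used."""
    meta = parse_frontmatter(text)
    problems = []
    if meta.get("type") != "verified_fact":
        problems.append("type is not verified_fact")
    if not meta.get("citation"):
        problems.append("missing citation")
    try:
        count = int(meta.get("tool_verifications", "0"))
    except ValueError:
        count = 0
    if count < min_verifications:
        problems.append(f"tool_verifications {count} < {min_verifications}")
    return problems
```

Returning the list of problems, rather than a bare boolean, keeps the block inspectable: the orchestrator can log exactly which requirement failed.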
Any pattern observed fewer than 3 times lives in hypotheses/. These are never
surfaced directly to the user — they may inform research direction but not advice.
The following output types are always flagged for human review before delivery:
- IRS notice response drafts
- Year-end strategy plans involving amounts > $500
- Any recommendation involving an entity structure change (LLC, S-corp)
- Amended return guidance
The gate surfaces a [REQUIRES REVIEW] header and a plain-language explanation of
what the user should verify with a CPA before acting.
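The gate rules above can be sketched as a small predicate plus a formatter. The output-type labels (`notice_response`, `entity_structure`, `amended_return`, `year_end_strategy`) and the `Output` class are hypothetical names chosen for this example; only the `[REQUIRES REVIEW]` header and the $500 threshold come from the text.

```python
from dataclasses import dataclass

REVIEW_HEADER = "[REQUIRES REVIEW]"
# Hypothetical labels for the always-flagged output types listed above.
ALWAYS_GATED = {"notice_response", "entity_structure", "amended_return"}
STRATEGY_AMOUNT_THRESHOLD = 500  # dollars


@dataclass
class Output:
    output_type: str
    body: str
    amount_usd: float = 0.0


def requires_review(out: Output) -> bool:
    """True when the output must be flagged for human review before delivery."""
    if out.output_type in ALWAYS_GATED:
        return True
    return (
        out.output_type == "year_end_strategy"
        and out.amount_usd > STRATEGY_AMOUNT_THRESHOLD
    )


def apply_review_gate(out: Output, what_to_verify: str) -> str:
    """Prefix gated outputs with the header and a plain-language note."""
    if not requires_review(out):
        return out.body
    note = f"Verify with a CPA before acting: {what_to_verify}"
    return f"{REVIEW_HEADER}\n{note}\n\n{out.body}"
```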
IRS notice responses go through two agents: notice_responder drafts, then
tax_researcher cross-checks the cited regulation. Discrepancies block the output.
A refresh_task is scheduled annually (configurable) to re-verify every entry in
verified_facts/ against current IRS publications. Stale facts are demoted to
hypotheses/ until re-verified.
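A minimal sketch of the staleness test behind the refresh task, assuming each fact's `last_reviewed` date has already been read from its frontmatter. The function names and the 365-day default are illustrative; the text only says the refresh is annual and configurable.

```python
from datetime import date, timedelta


def is_stale(last_reviewed: date, today: date, max_age_days: int = 365) -> bool:
    """True when the fact was last reviewed more than max_age_days ago."""
    return today - last_reviewed > timedelta(days=max_age_days)


def split_by_freshness(
    facts: dict[str, date], today: date
) -> tuple[list[str], list[str]]:
    """Return (keep, demote) file names, each sorted for stable output.

    `facts` maps a verified_facts/ file name to its last_reviewed date.
    """
    keep, demote = [], []
    for name, last_reviewed in facts.items():
        (demote if is_stale(last_reviewed, today) else keep).append(name)
    return sorted(keep), sorted(demote)
```

Files in the `demote` list would then be moved to `hypotheses/` until a tool-backed re-verification succeeds.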
What is learned:
- Routing heuristics (orchestrator skills): which specialist handles which task pattern
- Deduction pattern skills: "freelancer + vehicle + mileage log present → deduction likely valid"
- Notice-type classification skills: CP2000 vs CP501 handling patterns
What is never learned from user data alone:
- Tax law facts (these require IRS publication verification)
- Calculation procedures (these are locked in `verified_facts/` by a human or tool)
- Any claim the user makes about their situation (recorded in `user_profile/` but never promoted to the knowledge base)
Reward schema:
reward_decomposition:
  accuracy: +1 / 0 / -1        # was the output factually correct?
  completeness: +1 / 0 / -1    # did it cover the user's full situation?
  clarity: +1 / 0 / -1         # user self-rates: did they understand it?
  actionability: +1 / 0 / -1   # did it produce something the user could act on?
context_tags: [freelance, home_office, irs_notice, retirement, se_tax, ...]
citation_present: true/false   # hard requirement
hallucination_flag: true/false # logged permanently; triggers skill review

Skill budget: 20 per specialist agent (15 for the orchestrator and notice_responder, per `agents.yaml`). When the budget is full, the skill with the lowest cumulative accuracy + completeness score is pruned. A skill with any `hallucination_flag: true` is pruned immediately regardless of other scores.
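The pruning policy can be sketched as follows. The `Skill` class and `prune_skills` function are hypothetical names, and the sketch interprets "full" as "over budget after a new skill is added", which is an assumption about timing rather than something the text states.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    accuracy: int = 0            # cumulative accuracy reward
    completeness: int = 0        # cumulative completeness reward
    hallucination_flag: bool = False


def prune_skills(
    skills: list[Skill], budget: int = 20
) -> tuple[list[Skill], list[Skill]]:
    """Return (kept, pruned) after applying both pruning rules."""
    # Rule 1: any hallucination flag removes the skill regardless of score.
    pruned = [s for s in skills if s.hallucination_flag]
    candidates = [s for s in skills if not s.hallucination_flag]

    # Rule 2: when over budget, drop the lowest accuracy + completeness first.
    ranked = sorted(
        candidates, key=lambda s: s.accuracy + s.completeness, reverse=True
    )
    overflow = ranked[budget:]
    overflow_ids = {id(s) for s in overflow}
    pruned.extend(overflow)

    # Preserve the original ordering of the surviving skills.
    kept = [s for s in candidates if id(s) not in overflow_ids]
    return kept, pruned
```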
Skill graduation from hypothesis:
A pattern in hypotheses/ graduates to verified_facts/ only when:
- It has earned `+1` on `accuracy` in at least 3 independent sessions, AND
- A tool-backed citation has been attached.
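Both graduation conditions fit in a single predicate. In this sketch the function name and the session-keyed input shape are assumptions: keying accuracy rewards by session id is one simple way to make sure repeated wins inside the same session are not counted as independent.

```python
from typing import Optional


def can_graduate(
    session_accuracy: dict[str, int],
    citation: Optional[str],
    threshold: int = 3,
) -> bool:
    """Decide whether a hypothesis may move to verified_facts/.

    `session_accuracy` maps a session id to that session's accuracy
    reward (+1, 0, or -1), so each session counts at most once.
    """
    wins = sum(1 for reward in session_accuracy.values() if reward == 1)
    has_citation = bool(citation and citation.strip())
    return wins >= threshold and has_citation
```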
No model is hardcoded. The operator sets `MODEL_FAST`, `MODEL_STANDARD`, and `MODEL_STRONG` in `.env`, using `env.example` as the template. AccountantOS routes based on task class.
┌──────────────────────────────────────────────────────────────────────────┐
│ Task Class │ Routing Tier │ Rationale │
├──────────────────────────────────────────────────────────────────────────┤
│ IRS publication lookup │ MODEL_FAST │ Deterministic retrieval │
│ Tax bracket / limit lookup │ MODEL_FAST │ Pure calculation │
│ Deduction list for known type │ MODEL_FAST │ Pattern match, known │
│ Explaining known deductions │ MODEL_STANDARD │ Synthesis + plain lang │
│ Quarterly payment calculation │ MODEL_STANDARD │ Multi-step math │
│ Year-round strategy plan │ MODEL_STANDARD │ Multi-factor synthesis │
│ IRS notice analysis │ MODEL_STRONG │ High stakes, nuanced │
│ Novel situation (no prior +1) │ MODEL_STRONG │ Low confidence, flag │
│ Dual-pass notice verification │ MODEL_STRONG │ Accuracy critical │
│ Entity structure advice │ MODEL_STRONG │ Irreversible decisions │
└──────────────────────────────────────────────────────────────────────────┘
Configuration:
# env.example — operator sets these; no defaults hardcoded
MODEL_FAST= # e.g. a small/fast model of your choice
MODEL_STANDARD= # e.g. a mid-tier model of your choice
MODEL_STRONG= # e.g. a frontier model of your choice
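MODEL_PROVIDER=   # anthropic | openai | local | other

Routing logic in main.py:

import os

# Task-class names below are illustrative; align them with the routing table.
HIGH_STAKES_CLASSES = {"irs_notice_analysis", "dual_pass_notice_verification", "entity_structure_advice"}
STANDARD_CLASSES = {"explain_known_deductions", "quarterly_payment_calculation", "year_round_strategy_plan"}

def route_model(task_class: str, novelty_score: float) -> str:
    """Pick a model tier; novel or high-stakes tasks go to the strong tier."""
    # os.environ[...] raises KeyError if the operator left a tier unset,
    # instead of silently returning None as os.getenv would.
    if novelty_score > 0.7 or task_class in HIGH_STAKES_CLASSES:
        return os.environ["MODEL_STRONG"]
    if task_class in STANDARD_CLASSES:
        return os.environ["MODEL_STANDARD"]
    return os.environ["MODEL_FAST"]

Risk level: HIGH | Privacy: CRITICAL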
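# docker-compose.yml network config
networks:
  accountantos_net:
    driver: bridge
    internal: false   # egress is allowed at the Docker level; the whitelist must restrict it
# Compose has no native egress-whitelist key, so the intended whitelist is recorded
# as an x- extension field (ignored by Compose) and must be enforced separately,
# e.g. by an egress proxy or host firewall rules.
x-egress-whitelist:
  - "*.irs.gov"
  - "*.treasury.gov"
  - "*.ssa.gov"       # for SE tax rate verification
# All other egress: BLOCKED

- All user financial data lives exclusively under `/data/` (Docker volume, host-local)
- No cloud sync. No S3. No external logging services.
- `.gitignore` excludes `user_profile/situation.md` from Git by default (user may opt in to include anonymized summaries)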
# .env is never committed. env.example has no real values.
# API keys loaded via Docker secrets or env vars only:
docker run --env-file .env accountantos

- User income, SSN fragments, account numbers: never logged, never committed to Git
- Session summaries in Git commits contain only outcome metadata:
  task: deduction scan | filer_type: freelance | deductions_found: 7 | reward: +1
- `outputs/` files are stored locally; user explicitly exports them
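One way to guarantee that session summaries carry only outcome metadata is an allow-list: anything not explicitly named is dropped before the summary line is built. The field names follow the example summary above; the function name and the allow-list constant are assumptions for this sketch.

```python
# Only these outcome fields may appear in a Git commit summary.
ALLOWED_FIELDS = ("task", "filer_type", "deductions_found", "reward")


def session_summary(record: dict) -> str:
    """Build the commit summary line from allow-listed fields only.

    Any other key in `record` (income, SSN fragments, account numbers)
    is ignored, so it cannot reach Git history through this path.
    """
    parts = [f"{key}: {record[key]}" for key in ALLOWED_FIELDS if key in record]
    return " | ".join(parts)
```

An allow-list fails closed: a new sensitive field added to the session record later is excluded by default, whereas a deny-list would leak it until someone remembered to update the list.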
- Each specialist agent runs in its own container (see docker-compose.yml)
- No agent can write to another agent's private directory
- Lock files in `shared/locks/` prevent concurrent writes to shared state
- `constraints.md` for every agent is mounted read-only
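A minimal sketch of a lock file under `shared/locks/`, assuming the shared volume is a local filesystem where exclusive file creation is atomic. The `file_lock` helper and the `.lock` naming are hypothetical; a production version would likely add timeouts and stale-lock recovery.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_dir: Path, name: str):
    """Hold an exclusive lock file for the duration of the with-block.

    Raises FileExistsError if another writer already holds the lock.
    """
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{name}.lock"
    # O_CREAT | O_EXCL makes creation fail if the file already exists,
    # so only one process can acquire the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield lock_path
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

A caller would wrap each write to shared state, for example `with file_lock(Path("/data/shared/locks"), "session_segment"): ...`, and retry or back off on `FileExistsError`.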
- `REVIEW_MODE=strict` (default): all outputs above the review threshold go to `outputs/review/` with a `[REQUIRES HUMAN REVIEW]` prefix before being shown to the user
- `REVIEW_MODE=relaxed`: informational outputs shown immediately; strategy and notice outputs still gated
FROM python:3.12-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Data volume — all user state lives here, never in the image
VOLUME ["/data"]
# Git init for longitudinal memory
RUN git config --global user.email "accountantos@local" && \
git config --global user.name "AccountantOS"
CMD ["python", "main.py"]

version: "3.9"
services:
  orchestrator:
    build: .
    container_name: accountantos_orchestrator
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=orchestrator
      - REVIEW_MODE=${REVIEW_MODE:-strict}
    networks:
      - accountantos_net
    stdin_open: true
    tty: true
  tax_researcher:
    build: .
    container_name: accountantos_researcher
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=tax_researcher
    networks:
      - accountantos_net
  deduction_scanner:
    build: .
    container_name: accountantos_scanner
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=deduction_scanner
    networks:
      - accountantos_net
  strategy_advisor:
    build: .
    container_name: accountantos_strategy
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=strategy_advisor
    networks:
      - accountantos_net
  notice_responder:
    build: .
    container_name: accountantos_notices
    volumes:
      - accountantos_data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=notice_responder
    networks:
      - accountantos_net
volumes:
  accountantos_data:
    driver: local
networks:
  accountantos_net:
    driver: bridge

agents:
  orchestrator:
    description: "Routes tasks, enforces review gates, merges specialist outputs"
    persona_file: agents/orchestrator/persona.md
    constraints_file: agents/orchestrator/constraints.md
    skill_budget: 15
    model_tier: standard
    tools: [date_math, filing_deadline_lookup]
  tax_researcher:
    description: "Looks up IRS publications, verifies facts, cites sources"
    persona_file: agents/tax_researcher/persona.md
    constraints_file: agents/tax_researcher/constraints.md
    skill_budget: 20
    model_tier: standard   # strong for novel lookups via routing policy
    tools: [web_search_irs, form_reference_lookup, tax_bracket_lookup]
  deduction_scanner:
    description: "Matches user income/expense profile to verified deductions"
    persona_file: agents/deduction_scanner/persona.md
    constraints_file: agents/deduction_scanner/constraints.md
    skill_budget: 20
    model_tier: standard
    tools: [calculator, ira_contribution_limit_lookup]
  strategy_advisor:
    description: "Synthesizes year-round tax strategy from user situation"
    persona_file: agents/strategy_advisor/persona.md
    constraints_file: agents/strategy_advisor/constraints.md
    skill_budget: 20
    model_tier: strong
    tools: [calculator, date_math, ira_contribution_limit_lookup]
  notice_responder:
    description: "Interprets IRS notices, drafts responses, flags urgency"
    persona_file: agents/notice_responder/persona.md
    constraints_file: agents/notice_responder/constraints.md
    skill_budget: 15
    model_tier: strong
    tools: [form_reference_lookup, web_search_irs, date_math]

# ── Model Configuration (operator sets these; no defaults) ──────────────────
MODEL_FAST=
MODEL_STANDARD=
MODEL_STRONG=
MODEL_PROVIDER= # anthropic | openai | ollama | other
API_KEY= # your provider API key
# ── Runtime Settings ─────────────────────────────────────────────────────────
REVIEW_MODE=strict # strict | relaxed
LOG_LEVEL=info # debug | info | warn | error
SKILL_BUDGET=20 # max skills per agent
REWARD_LOG_SIZE=30 # rolling reward window
HYPOTHESIS_THRESHOLD=3 # confirmations required to graduate a hypothesis
# ── Privacy ──────────────────────────────────────────────────────────────────
GIT_COMMIT_PII=false # never commit user_profile/situation.md to Git
TELEMETRY=false # no external telemetry, ever
# ── Tax Year ─────────────────────────────────────────────────────────────────
TAX_YEAR=2024
USER_STATE=                 # e.g. CA, NY, TX (for state overlay lookups)

docker compose up
│
▼
orchestrator starts
│
├── loads user_profile/situation.md (if exists)
├── loads session_segment.md
├── reads inbox/ for queued tasks
│
▼
user submits task (one of the 7 task types)
│
▼
orchestrator classifies task → routes to specialist
│
▼
specialist loads its skills.md + relevant verified_facts/
│
├── calls tools as needed (citation required)
├── writes result to shared/handoffs/
│
▼
orchestrator merges result
│
├── runs citation check
├── applies review gate if applicable
├── writes to outputs/ for user
│
▼
reward logged → Git commit on +1
Mission-specific metrics — not generic agent stats:
| Metric | How Tracked | Target |
|---|---|---|
| `deductions_surfaced_per_session` | Count in reward log | Trend up as skills mature |
| `deductions_with_citation_rate` | Citation present flag | 100% (hard requirement) |
| `hallucination_incidents` | `hallucination_flag` in rewards | 0 |
| `user_clarity_score` | Self-rated 1–5 after each session | ≥ 4.0 avg |
| `hypothesis_graduation_rate` | Hypotheses promoted / total | Rising = knowledge maturing |
| `review_gate_rejection_rate` | Human review flags / total outputs | Falling = quality improving |
| `notice_response_approval_rate` | User-approved drafts / total | ≥ 90% |
| `model_tier_distribution` | Fast / Standard / Strong % per task class | Cost indicator |
| `skill_churn_rate` | Skills added vs pruned per 30 tasks | Falling = agent maturing |
| `stale_fact_count` | Verified facts past annual review date | 0 target |
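A few of these metrics can be derived directly from reward-log entries. The sketch below assumes each entry has already been loaded into a dict with the keys `citation_present`, `hallucination_flag`, `user_clarity_score`, and `model_tier_used`; the `summarize` function and its output keys are hypothetical. It computes the citation rate per logged task rather than per deduction, which is a simplification of `deductions_with_citation_rate`.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Aggregate a list of reward-log entries into session-level metrics."""
    total = len(records)
    if total == 0:
        return {
            "citation_rate": None,
            "hallucination_incidents": 0,
            "avg_clarity": None,
            "model_tier_distribution": {},
        }
    tiers = Counter(r["model_tier_used"] for r in records)
    return {
        # Share of logged tasks whose output carried a citation.
        "citation_rate": sum(bool(r.get("citation_present")) for r in records) / total,
        # Count of entries flagged as hallucinations (target: 0).
        "hallucination_incidents": sum(bool(r.get("hallucination_flag")) for r in records),
        # Mean of the user's self-rated 1-5 clarity scores.
        "avg_clarity": sum(r["user_clarity_score"] for r in records) / total,
        # Fraction of tasks served by each model tier.
        "model_tier_distribution": {tier: n / total for tier, n in tiers.items()},
    }
```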
Log format (append to rewards.md):
---
date: 2024-11-15
task_type: deduction_scan
filer_type: freelance
agent: deduction_scanner
citation_present: true
hallucination_flag: false
reward_decomposition:
  accuracy: +1
  completeness: +1
  clarity: +1
  actionability: +1
context_tags: [freelance, home_office, vehicle, qbi]
model_tier_used: standard
user_clarity_score: 5
---

Git commit format:
scanner(learn): earn home-office-freelance pattern from 3rd +1 session
task: deduction scan | filer: freelance | deductions: 9
reward: +1 (accuracy+1, completeness+1, clarity+1, actionability+1)
skills-added: home-office-exclusive-use-check
skills-pruned: none
hypothesis-graduated: home_office_shared_space_disqualifier
These map directly to the seven use cases in your design goals:
1. Deduction Discovery (freelancer)
   "Act as a certified CPA. My situation: freelance designer, $85,000 annual income, expenses include home office, laptop, software subscriptions, professional development. Identify every deduction I likely qualify for that most people in my situation overlook."
   → Routes to: deduction_scanner + tax_researcher
2. Full Tax Situation Explained
   "Explain my tax situation in plain language. I earn $72,000 W-2 and $18,000 freelance in California. Break down what I owe, why, and the most important decisions before April 15."
   → Routes to: strategy_advisor + tax_researcher
3. Business Deduction Audit
   "I run a sole proprietorship consultancy earning $140,000 with expenses: travel, home office, subcontractors, software. Identify every legitimate deduction, what documentation I need, and the most commonly missed ones."
   → Routes to: deduction_scanner (primary) + tax_researcher
4. Year-End Tax Reduction Sprint
   "It is October. My estimated tax liability is $22,000. What are the most impactful legal moves I can make before December 31 to reduce what I owe? I'm self-employed, contribute to a SEP-IRA, and have $30k in unrealized stock losses."
   → Routes to: strategy_advisor + deduction_scanner
5. Self-Employment Tax Explainer
   "I'm a freelance developer earning $95,000/year in Texas. Explain exactly how SE tax works, what quarterly payments I should be making, how to calculate them, and the top strategies to legally reduce my SE tax burden."
   → Routes to: tax_researcher + strategy_advisor
6. IRS Notice Response
   "I received a CP2000 notice proposing I owe an additional $4,200 due to unreported income from a 1099 I thought my employer handled. Explain what this means, whether I should be concerned, my response options, and a step-by-step resolution plan."
   → Routes to: notice_responder + tax_researcher (dual-pass)
7. 12-Month Tax Strategy
   "I'm single, earning $110,000 W-2 plus $25,000 from rental income in New York. Goals: grow retirement savings, possibly start an LLC next year. Build me a 12-month tax strategy that minimizes what I owe and maximizes what I keep."
   → Routes to: strategy_advisor (primary) + all specialists
8. Quarterly Estimated Tax Calculator (bonus)
   "Walk me through calculating my Q3 estimated payment. I've earned $67,000 so far this year self-employed, paid $8,500 in estimated taxes, and expect another $25,000 in Q3."
   → Routes to: tax_researcher + calculator tool
9. Deduction Documentation Checklist (bonus)
   "I plan to claim home office, vehicle (actual expense method), and professional development deductions. What exact documentation do I need to maintain for each to survive an audit?"
   → Routes to: deduction_scanner + tax_researcher
10. Entity Structure Decision Support (bonus; MODEL_STRONG, always gated)
    "I'm currently a sole proprietor earning $180,000. Should I consider an S-corp election? Walk me through the tax math, the tradeoffs, and what I'd need to do."
    → Routes to: strategy_advisor | REVIEW_MODE=strict enforced | CPA recommendation mandatory
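The routing targets in this list can be captured as a static lookup table that the orchestrator consults after classifying a task. The task-type keys, the `route` function, and the choice to list the primary specialist first are assumptions made for this sketch; the specialist assignments themselves are taken from the "Routes to" lines above.

```python
# Hypothetical task-type keys; specialists are listed primary first.
ROUTES = {
    "deduction_discovery": ("deduction_scanner", "tax_researcher"),
    "situation_explained": ("strategy_advisor", "tax_researcher"),
    "business_deduction_audit": ("deduction_scanner", "tax_researcher"),
    "year_end_sprint": ("strategy_advisor", "deduction_scanner"),
    "se_tax_explainer": ("tax_researcher", "strategy_advisor"),
    "irs_notice_response": ("notice_responder", "tax_researcher"),
    "twelve_month_strategy": (
        "strategy_advisor", "tax_researcher", "deduction_scanner", "notice_responder",
    ),
    "quarterly_estimate": ("tax_researcher",),  # plus the calculator tool
    "documentation_checklist": ("deduction_scanner", "tax_researcher"),
    "entity_structure": ("strategy_advisor",),
}

# Task types whose outputs are always held for human review.
ALWAYS_GATED = {"irs_notice_response", "entity_structure"}


def route(task_type: str) -> dict:
    """Return the routing plan for a classified task.

    Raises KeyError for an unknown task type so that misclassification
    surfaces immediately instead of falling through to a default agent.
    """
    specialists = ROUTES[task_type]
    return {
        "primary": specialists[0],
        "support": list(specialists[1:]),
        "gated": task_type in ALWAYS_GATED,
    }
```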
| Risk | Severity | Mitigation |
|---|---|---|
| Hallucinated deduction (agent invents nonexistent IRS rule) | Critical | Citation required for every claim; hallucination_flag triggers skill purge |
| Stale tax law (brackets/limits change annually) | High | Annual refresh task; verified_facts carry last_reviewed date; staleness alert |
| User acts on advice without CPA review | High | Every output includes disclosure; strategy/notice outputs always gated |
| PII leaking to Git history | High | GIT_COMMIT_PII=false default; user_profile/situation.md in .gitignore |
| Over-confident notice response | High | Dual-pass architecture; always recommends professional response |
| Model produces confident wrong math | Medium | Calculator tool used for all arithmetic; never trust LLM arithmetic alone |
| Hypothesis promoted prematurely | Medium | 3-confirmation threshold + citation required; orchestrator enforces gate |
| User's state tax rules ignored | Medium | USER_STATE env var loads jurisdiction overlay; flagged if missing |
| Skill churn / thrashing | Low | Skill budget + pruning policy; churn rate logged in observability |
| Context window overflow for complex situations | Low | Specialist agents isolate concerns; session_segment.md scoped per task |
Limitations to disclose to users explicitly:
- AccountantOS is not a licensed CPA. All outputs are educational guidance.
- Tax law changes. Always verify against current IRS publications before acting.
- State tax rules vary significantly. State guidance requires verification.
- For amounts over $1,000 in tax impact, professional review is strongly recommended.
Goal: A working single-user local runtime that handles the 5 highest-frequency tasks.
Include in v1:
- Docker Compose setup with orchestrator + 2 specialists (deduction_scanner, tax_researcher)
- User onboarding flow: situation.md populated from 5 questions
- Task types 1, 2, 3, 5 (deductions, situation explanation, SE tax)
- Verified facts for current year: brackets, standard deduction, SE tax rate, IRA limits, QBI
- Citation requirement enforced (hard block if missing)
- Plain-language output formatter
- Git commit on every +1 reward
- `REVIEW_MODE=strict` active by default
- `[REQUIRES HUMAN REVIEW]` disclosure on all strategy outputs
Defer from v1:
- IRS notice responder (complex, needs dual-pass — v2)
- Year-round 12-month strategy (needs full specialist roster — v2)
- Hypothesis graduation automation (manual review in v1 is safer)
- State jurisdiction overlays beyond 3–4 common states
- Entity structure advice (S-corp election complexity — v2)
Success signal for v1: A real user with a real freelance tax situation runs tasks 1 and 5, rates clarity ≥ 4/5, and identifies at least one deduction they weren't aware of — with a valid IRS citation.
Activate after: 20+ real sessions, reward logs reviewed, skill churn stable.
| Feature | Rationale |
|---|---|
| NoticeResponder agent | High user anxiety, high value — add after v1 proves routing stability |
| 12-month strategy synthesis | Requires all 4 specialists working reliably — enable after individual specialist quality is confirmed |
| State jurisdiction overlays | Expand from 3 states to full 50 based on USER_STATE usage logs |
| Hypothesis graduation automation | Safe once 3-confirmation threshold is battle-tested in v1 |
| Annual facts refresh scheduler | Cron job that re-verifies verified_facts/ against IRS publications each January |
| Multi-user support | Separate user_profile/ per user ID; shared tax_knowledge/ read-only to all |
| Deduction documentation generator | Output a formatted checklist PDF per claimed deduction |
| Quarterly tax calendar | Proactive reminders tied to USER_STATE and filing status |
| S-corp / entity advice module | Highest-stakes task class — needs its own specialist, strong model mandatory, extra review gate |
AccountantOS — built on Git, Docker, and the honest self-assessment that most tax advice agents need fewer opinions and more citations.