Claude Code mail triage — Gmail + enterprise Outlook, driven from your phone

A practical reference for wiring up end-to-end mail triage in Claude Code. What you get at the end:

One natural-language command ("process my mail" / "24 小時 actionable") triages Gmail personal + Gmail work + enterprise Outlook in a single pass.
Enterprise Outlook works even if your tenant has disabled OAuth consent (AADSTS65002 / need admin approval) — scraping OWA via a stealth Chromium session instead of IMAP/Graph.
Trigger it from your phone via a Telegram bridge. Claude runs headless in tmux on your Mac; you text from bed, a prioritized digest comes back.
Read-only Outlook tenant is still actionable — instead of moving items, Claude creates macOS Reminders and keeps an append-only action log as the only source of truth.

Who this is for. You use Claude Code as your daily driver, you're comfortable with shell + Python, and at least one of your accounts is on a Microsoft tenant where IT has disabled user consent to third-party apps. Thunderbird, mbsync, oauth2ms, the Graph CLI all die — you're stuck with the OWA web UI unless you scrape it.

If your tenant is well-behaved, ignore the Outlook half and just keep the Gmail + Claude Code skill layout + Telegram bridge. Still a big win.

Stack

Account type	Access	Tool
Gmail personal / work	IMAP + app password	`himalaya` CLI
Enterprise Outlook 365 (OAuth-blocked)	OWA scraping → local Maildir	Python + patchright
Triage brain	Haiku classifier	Claude Code's `Agent` tool
Action sink (read-only accounts)	macOS Reminders	`osascript`
Orchestration	Claude Code skills	`.claude/skills/<name>/SKILL.md`

Everything runs locally. Only outbound: mail servers + Claude API for classification.

Quick start

# 1. himalaya (Gmail)
brew install himalaya        # or cargo install himalaya

# 2. uv + patchright (Outlook scraping)
brew install uv
# in your outlook-sync project dir:
uv add patchright httpx
uv run patchright install chromium

# 3. Gmail app passwords — enable 2FA, then generate at:
#    https://myaccount.google.com/apppasswords

# 4. Claude Code
#    https://claude.com/claude-code

Minimal himalaya config (~/Library/Application Support/himalaya/config.toml or ~/.config/himalaya/config.toml):

[accounts.Personal]
default = true
email = "you@gmail.com"
# Gmail uses Chinese-localized folder paths if your UI is in Chinese;
# use "[Gmail]/Sent Mail" if English, "[Gmail]/寄件備份" if 中文, etc.
folder.aliases.inbox = "INBOX"
folder.aliases.sent = "[Gmail]/Sent Mail"
folder.aliases.drafts = "[Gmail]/Drafts"
folder.aliases.trash = "[Gmail]/Trash"
backend.type = "imap"
backend.host = "imap.gmail.com"
backend.port = 993
backend.encryption.type = "tls"
backend.auth.type = "password"
backend.auth.raw = "<16-char app password>"
message.send.backend.type = "smtp"
message.send.backend.host = "smtp.gmail.com"
message.send.backend.port = 465

Add a [accounts.Work] block for a second Gmail.

Claude Code integration

The entire triage workflow runs as natural-language commands to Claude Code ("process my mail", "triage inbox", "3 信箱 24 小時 actionable"). Claude loads skills, reads the shared rule file, dispatches Haiku subagents in parallel, and reports back.

Directory layout

~/Mail/                                     # your workspace root
├── CLAUDE.md                               # session primer — loaded on every turn
├── .claude/
│   ├── rules/
│   │   └── filing-rules.md                 # SHARED policy — read by both skills
│   └── skills/
│       ├── himalaya/
│       │   ├── SKILL.md                    # Gmail mechanics
│       │   └── README.md                   # user-facing docs
│       └── outlook/
│           └── SKILL.md                    # Outlook/OWA mechanics
├── logs/
│   └── mail/
│       ├── actions.jsonl                   # append-only action log (state)
│       └── digest-YYYY-MM-DD-HHMM.md       # per-session human-readable summary
├── outlook-sync/                           # the patchright pipeline
│   ├── login.py
│   ├── sync.py
│   ├── owa_sync/
│   │   ├── config.py
│   │   ├── session.py
│   │   └── ...
│   ├── launchd/                            # com.you.outlook-sync.plist
│   └── state/                              # session.json, sync_state.json (gitignored)
└── maildir/
    └── outlook/INBOX/{cur,new,tmp}/        # your Outlook mirror

CLAUDE.md primer

This file is loaded automatically by Claude Code at the start of each session. Keep it small — it's not the rule file, it's the index:

# Mail workspace

Three accounts, two skills, one playbook.

| Label | Address | Skill |
|---|---|---|
| Personal | you@gmail.com | `himalaya` |
| Work | you@work.com | `himalaya` |
| Corp (OAuth-blocked) | you@corp.com | `outlook` (read-only) |

**Canonical docs — load on demand:**
- Shared filing + cross-account triage flow: `.claude/rules/filing-rules.md`
- Gmail mechanics: `.claude/skills/himalaya/SKILL.md`
- Outlook mechanics: `.claude/skills/outlook/SKILL.md`

**Triggers**: "process mail" / "triage" / "清信箱" → shared flow
(all three accounts, Haiku classifies, unified digest, apply on Gmail, Reminders on Outlook).

**Logs**: `~/Mail/logs/mail/` (git-ignored). Always dedup against `actions.jsonl` before re-classifying.

The "skill" is just a markdown file Claude reads

A SKILL.md is a reference doc Claude loads when the topic matches. It's not code. Example shape for .claude/skills/himalaya/SKILL.md:

# himalaya — Gmail operations

## Common commands
- List: `himalaya envelope list --account Personal --page-size 200 after 2026-01-01`
- Read: `himalaya message read <id> --account Personal`
- Move: `himalaya message move "@Reference" <id> --account Personal`

## Gotchas
- `-a` flag does not exist → always `--account <name>`.
- Filter args are positional: `"after" "2026-01-01"`, NOT `"after 2026-01-01"`.
- HTML-only bodies render blank via `message read`. Fall back to Python `imaplib`.
- Batch IDs need `eval` to re-split whitespace.

## Triage workflow
Read `../../rules/filing-rules.md` for the shared playbook. Gmail-specific
phase-3 mechanics: `himalaya message move "@Reference"` for not_actionable,
keep INBOX for actionable.

Same shape for outlook/SKILL.md but with Maildir parsing patterns.

Parallel Haiku classification via the Agent tool

For Phase-2 (soft classification) Claude batches messages ~30 per subagent and dispatches in parallel. In Claude Code this is the Agent tool with subagent_type=general-purpose and model=haiku:

Dispatch in one message with multiple Agent tool calls:

Agent(model=haiku, prompt="For each row in /tmp/batches/p_00 (TSV: ID\tFROM\tSUBJECT):
  1. Run `himalaya message read <ID> --account Personal 2>&1 | head -60`
  2. Classify: actionable | not_actionable | uncertain
  3. Output to /tmp/batches/p_00.out: <ID>\t<verdict>\t<reason>")
Agent(model=haiku, prompt="same for /tmp/batches/p_01 ...")
Agent(model=haiku, prompt="same for /tmp/batches/p_02 ...")

Then aggregate the .out files, batch-move not_actionable IDs on Gmail, Reminders for Outlook actionable, write digest + append to actions.jsonl.

Remote control: Telegram + tmux

The nicest part of this whole setup: you can trigger triage from your phone. Claude Code runs headless in a tmux session on your Mac, a small Telegram bot forwards your messages into the conversation, Claude replies back through the bot.

iPhone → Telegram → your bot → MCP server → Claude Code (in tmux) → back out

Why tmux

Claude Code stays alive across SSH disconnects, closed terminals, laptop lid close.
One long-running session keeps context cache warm — each new turn is cheap.
You can tmux attach from any shell to see what's happening, Ctrl-b d to detach.

tmux new -s mail
# inside tmux:
cd ~/Mail && claude
# detach: Ctrl-b d
# reattach later: tmux attach -t mail
# check from outside: tmux capture-pane -p -t mail

The Telegram bridge

A minimal MCP server exposes a send tool to Claude and forwards incoming Telegram messages as <channel> tags in the conversation:

// ~/.mcp.json or project .mcp.json
{
  "mcpServers": {
    "telegram": {
      "command": "bun",
      "args": ["run", "/path/to/telegram-channel/src/server.ts"]
    }
  }
}

The server (bun/TypeScript or whatever you prefer):

Polls Telegram getUpdates with long-poll + persistent last_update_id.txt.
On new message, emits a conversation turn wrapped in a tag like <channel source="telegram" chat_id="..." from="...">message text</channel>.
Exposes an MCP send(chat_id, text) tool so Claude can reply.

In CLAUDE.md teach Claude how to reply:

## Telegram channel

Messages arriving as `<channel source="telegram" chat_id="..." from="...">`
come from me over Telegram. Reply with the `send` MCP tool, passing through
the `chat_id` attribute. Plain text only — Telegram renders as plain text.

Now you can text 三信箱 24 小時內 actionable 的信 from bed and get back a prioritized digest while the Mac runs headless.

Auth + safety

Create a bot with @BotFather, get the token.
Check from_id in incoming messages against an allowlist — only accept your own Telegram user id. Otherwise anyone who discovers your bot can issue commands.
Keep the bot token in .env, gitignored.
For permission prompts, Claude Code's harness can be configured to auto-approve read-only tool calls so you don't have to confirm from the phone; see .claude/settings.local.json.

Why patchright for Outlook

OWA's web UI gets Bearer JWTs handed to it by the tenant's auth flow, and those tokens ARE valid for outlook.office.com/owa/service.svc — the same endpoint OWA JavaScript uses internally. Capture the Bearer and cookies a real browser obtains, replay with your own HTTP client.

patchright is a stealth Playwright fork, less likely to be flagged by Microsoft's Conditional Access device-fingerprint heuristics than vanilla Playwright.

The login flow:

patchright.chromium.launch_persistent_context(user_data_dir=...) — persistent profile keeps ESTSAUTH cookies across runs (~30 days).
Navigate to https://outlook.office.com/mail/, user completes MFA once.
Hook page.on("request") and capture the first 5 authenticated POSTs to service.svc (they'll include Authorization: Bearer <JWT> and all required headers).
Save cookies + template headers + template bodies to state/session.json (chmod 600).

After that you can POST to service.svc?action=<Action> from Python directly.

Working actions: FindConversation, SyncFolderItems, GetItem (with IncludeMimeContent: true → full RFC822 base64), GetFolder, GetTimeZone.

Broken with obvious bodies: FindItem (→ MemberAccessException), SyncFolderHierarchy (404 — try FindFolder instead).

Quirk: OWA sends the JSON body URL-encoded inside an x-owa-urlpostdata header, POST body is empty. SyncFolderItems.Changes is a flat list of {ChangeType: "Create"|"Update"|"Delete"|..., Item|ItemId: ...}, not separate arrays.

Session lifecycle — drive freshness from JWT exp

The token has an exp claim — parse it, don't guess. The default "refresh every N minutes" heuristic either refreshes too often (if tokens live longer than you think) or 401s mid-sync (if they live shorter).

import base64, json, time

def parse_jwt_exp(token: str) -> float | None:
    if token.lower().startswith("bearer "):
        token = token[7:]
    parts = token.split(".")
    if len(parts) < 2:
        return None
    payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    except (ValueError, json.JSONDecodeError):
        return None
    exp = payload.get("exp")
    return float(exp) if isinstance(exp, (int, float)) else None

def is_session_fresh(bearer: str, safety_sec: int = 300) -> bool:
    exp = parse_jwt_exp(bearer)
    if exp is None:
        return False
    return (exp - time.time()) > safety_sec

Refresh ladder:

JWT fresh → reuse, no browser.
JWT near expiry → headless Chromium with persisted profile → OWA JS silently refreshes Bearer. ~5s, no MFA.
Silent refresh fails (ESTSAUTH expired, CAF challenge) → headed browser, user completes MFA. Rare.
Any 401 mid-sync → force relogin once, retry.

Real-world reuse window: ~23 hours per token in our tenant. YMMV by tenant policy.

Storage: Maildir, not live IMAP

Outlook items land as .eml files in a Maildir tree. This means:

Any Maildir tool works: mu, notmuch, neomutt, or just email.parser in Python.
mu index --maildir=~/Mail/maildir/outlook gives you full-text search.
Offline-readable. Offline-searchable.
Read-only from our side — we don't have the OWA HTTP surface for moves/flags mapped yet, so "filing" for Outlook happens via the log + Reminders, not via server-side state changes.

launchd agent runs sync.py every 15 minutes. sync.py calls SyncFolderItems with the opaque SyncState pointer, downloads new items via GetItem, writes them atomically to Maildir.

Drift detection: compare GetFolder.TotalCount vs local index size. If they diverge by > 3, reset SyncState to "" and re-walk. An idempotent item index makes re-walking cheap (already-downloaded items skip the GetItem).

Triage playbook

One shared rule file (.claude/rules/filing-rules.md) drives both skills. Three phases:

Phase 1 — hard rules (auto-file, no confirm)

Kept deliberately narrow. Only "100% impossible to contain actionable" categories.

Good hard-rule candidates:

GitHub bot notifications matching token expiring / codespaces deleted
E-invoices (sender pattern or subject keyword)
Bank / billing statements from known senders

Not hard rules (soft flow instead):

Society / academic broadcasts (may contain CFPs or deadlines)
Hospital / company internal systems
Security alerts (occasionally real)
Real-human GitHub PR / issue comments

Adding a hard rule requires "no actionable message from this pattern in the past 3 months." One false positive → demote.

Phase 2 — Haiku classification

Remaining mail → batches of ~30 → parallel Haiku agents. Per-message output:

<id> | <from> | <subject>
→ summary: one line
→ verdict: actionable | not_actionable | uncertain
→ suggested: @Action | @Waiting | @Reference | @Someday | keep
→ reason: short

Domain defaults (example — customize for your life):

Source	Default
Research peers / real collaborators	actionable
Speaking / meeting invitations	actionable
Account-state notifications	not_actionable, unless unknown-device login
Employer internal broadcasts	keep INBOX, never file
First-time unknown real human	keep INBOX, surface at top of digest

Phase 3 — apply

Gmail (writable): himalaya message move "@Reference" <id> for not_actionable; keep INBOX for actionable.

Outlook (read-only):

Verdict	Action
not_actionable	append `digest_only` to log, collapse in digest
actionable	create Reminder via `osascript`, append `reminder_created`
uncertain	digest-only, flagged as `待判`
hands-off	explicit `noop_hands_off` log entry

Log-as-state (critical for read-only accounts)

Outlook items can't be moved out of INBOX, so without dedup every triage run re-classifies every message forever.

Solution: append-only actions.jsonl, one line per action, keyed by Message-Id:

{"ts":"2026-04-22T02:00:00","account":"outlook","id":"<Message-Id>","from":"...","subject":"...","action":"digest_only","verdict":"not_actionable","reason":"broadcast","session":"..."}
{"ts":"2026-04-22T02:00:00","account":"outlook","id":"<Message-Id>","from":"...","subject":"...","action":"reminder_created","reminder_id":"..."}

On every triage run: read log → set of seen Message-Ids → skip. This makes the log the source of truth for Outlook triage state, the audit trail for "what did Claude do?", and the input to an "undo last session" operation.

Staleness guard (trust launchd, verify before trusting)

launchd runs the Outlook sync every 15 min. Normally Maildir is current. But launchd can fall behind — laptop slept, agent crashed. In the triage flow, check logs/sync.log mtime before trusting the mirror:

import subprocess, time
from pathlib import Path

SYNC_LOG = Path("~/Mail/outlook-sync/logs/sync.log").expanduser()
STALE_SEC = 30 * 60  # 2× launchd cadence

if not SYNC_LOG.exists() or (time.time() - SYNC_LOG.stat().st_mtime) > STALE_SEC:
    subprocess.run(["uv", "run", "sync.py"],
                   cwd="~/Mail/outlook-sync", check=True)

Don't call sync.py unconditionally — with a warm JWT it's 0.3s, but you still pay the process spin-up.

What to steal, what to adapt

Steal:

OWA scraping architecture (if your tenant is OAuth-blocked).
JWT exp as freshness source.
Maildir mirror + log-as-state for any read-only mail source.
Hard rules narrow, soft rules permissive — a soft rule costs fractions of a cent per message via Haiku, an accidentally auto-filed deadline email costs real.
Parallel subagent dispatch for classification (batches of ~30).
The CLAUDE.md + .claude/skills/<name>/SKILL.md + .claude/rules/<shared>.md layout.

Adapt:

Your Haiku defaults table — encodes who matters to you.
Hard rule patterns — function of your billers, vendors, bots.
Whether GTD taxonomy (@Action/@Waiting/@Reference/@Someday) is even right for you.
Your launchd / cron cadence (15 min works for us; pick yours).

Don't copy: literal institution / people names, or anyone else's "who matters to me" list.

References

himalaya — Rust CLI for IMAP / SMTP, handles the Gmail side.
patchright — stealth Playwright fork for the OWA scrape.
Claude Code — CLI + skills system.
Claude Code skills docs — how SKILL.md / CLAUDE.md / subagents work.
Gmail app passwords — requires 2FA.
OWA service.svc has no public documentation. Capture real requests from DevTools → Network tab, inspect, replay. That's your spec.

Legal / ethical note

This technique is only appropriate for mailboxes you personally own on a tenant where you're authenticated. It's not for accessing other people's mail or bypassing legitimate security controls. If your employer's acceptable-use policy forbids programmatic access to corporate mail, don't do this — get them to approve an official client.

htlin222/mail-triage-gist.md

Select an option

No results found