@oneryalcin
Created April 15, 2026 08:32
Hermes HRR memory
OK, I'd like to understand the memory thoroughly in Hermes agent. Start with the docs and understand the Hermes agent architecture, and then dive into memory, first in the docs and then in the code.
I'd like to deploy memory in my application and I think Hermes agent has a really interesting approach to memory.
⏺ Let me explore the codebase systematically - starting with docs, then architecture, then memory implementation.
Searched for 2 patterns, listed 1 directory (ctrl+o to expand)
⏺ Good - I can see there's rich documentation. Let me read the architecture doc and memory-related docs in parallel.
Read 4 files (ctrl+o to expand)
⏺ Excellent. Now let me dive into the actual code implementation.
Searched for 1 pattern, read 2 files (ctrl+o to expand)
⏺ Now let me look at how memory is wired into the agent loop and the built-in memory tool implementation.
Searched for 2 patterns, read 1 file (ctrl+o to expand)
⏺ Now let me look at the built-in memory tool and the store/retrieval for Holographic.
Read 3 files (ctrl+o to expand)
⏺ Now let me check the HRR (Holographic Reduced Representations) module — the most interesting piece.
Read 1 file (ctrl+o to expand)
⏺ Now I have a thorough picture. Let me also quickly check how memory integrates into prompt assembly and look at one more provider for comparison.
Read 2 files (ctrl+o to expand)
⏺ Now I have a comprehensive understanding. Here's the full breakdown:
---
Hermes Agent Memory Architecture
Three-Layer Design
Hermes has a layered memory system — simple to powerful:
Layer 1: Built-in Memory (MEMORY.md + USER.md)
Location: tools/memory_tool.py → MemoryStore
The simplest, always-on layer. Two flat files with bounded char limits:
┌───────────┬─────────────────────────────────────────────────┬─────────────┐
│ File │ Purpose │ Limit │
├───────────┼─────────────────────────────────────────────────┼─────────────┤
│ MEMORY.md │ Agent's notes (env facts, conventions, lessons) │ 2,200 chars │
├───────────┼─────────────────────────────────────────────────┼─────────────┤
│ USER.md │ User profile (preferences, role, style) │ 1,375 chars │
└───────────┴─────────────────────────────────────────────────┴─────────────┘
Key design decisions worth stealing:
- Frozen snapshot pattern — memory is captured once at session start and injected into the system prompt. Mid-session writes go to disk immediately but DON'T update the prompt. This
preserves LLM prefix cache across all turns.
- § delimiter — entries separated by \n§\n, supports multiline entries
- Substring matching for replace/remove — no IDs, just pass a unique substring of the entry you want to modify
- Atomic writes — temp file + os.replace() for crash safety
- File locking — fcntl.flock() for cross-process concurrent access
- Security scanning — entries are checked for prompt injection patterns before being stored (since they get injected into system prompts)
- Capacity management — tool response includes usage: "67% — 1,474/2,200 chars" so the LLM knows when to consolidate
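A minimal sketch of the Layer 1 file handling described above: the § delimiter, substring-based replace, an atomic write, and a usage string in the tool response. `BoundedMemory` and its method names are hypothetical, not Hermes's `MemoryStore`; file locking and security scanning are left out to keep it short and portable.

```python
import os
import tempfile

DELIM = "\n§\n"  # entry separator, as described in the list above


class BoundedMemory:
    """Hypothetical stand-in for a MEMORY.md-style store (not Hermes's MemoryStore)."""

    def __init__(self, path, limit=2200):
        self.path = path
        self.limit = limit

    def entries(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            text = f.read()
        return [e for e in text.split(DELIM) if e.strip()]

    def _write(self, entries):
        text = DELIM.join(entries)
        if len(text) > self.limit:
            raise ValueError(f"over capacity: {len(text)}/{self.limit} chars")
        # Atomic write: temp file in the same directory, then os.replace().
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp, self.path)
        # Usage string so the caller knows when to consolidate (format is assumed).
        return f"{100 * len(text) // self.limit}% ({len(text):,}/{self.limit:,} chars)"

    def add(self, entry):
        return self._write(self.entries() + [entry])

    def replace(self, substring, new_entry):
        entries = self.entries()
        hits = [i for i, e in enumerate(entries) if substring in e]
        if len(hits) != 1:
            raise ValueError(f"substring matched {len(hits)} entries, need exactly 1")
        entries[hits[0]] = new_entry
        return self._write(entries)
```

Requiring exactly one substring match is a design choice of this sketch; it keeps an ambiguous substring from silently overwriting the wrong entry.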
Layer 2: MemoryManager (Orchestration)
Location: agent/memory_manager.py
The glue layer. Enforces one builtin + one external provider max. Key responsibilities:
- Tool routing — maps tool names to providers via _tool_to_provider dict
- Context fencing — wraps prefetched memory in <memory-context> tags with a system note so the LLM doesn't treat recalled context as user input
- Fault isolation — failures in one provider never block the other (every call wrapped in try/except)
- Lifecycle hooks — delegates prefetch, sync_turn, on_session_end, on_pre_compress, on_memory_write, on_delegation to all registered providers
Integration into the agent loop (run_agent.py):
session start → initialize_all() + build_system_prompt()
before API call → prefetch_all(user_message) → injected as <memory-context>
after each turn → sync_all(user_msg, assistant_msg) + queue_prefetch_all()
memory tool use → on_memory_write() mirrors to external provider
pre-compress → on_pre_compress() extracts insights before discarding
session end → on_session_end() + shutdown_all()
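A rough sketch of two of the MemoryManager ideas above, context fencing and fault isolation. The `prefetch(user_message)` method name follows the hook list, but its exact signature, the wording of the system note, and the helper names are assumptions for illustration.

```python
def fence(recalled: str) -> str:
    """Wrap recalled memory so the model does not read it as user input."""
    note = "[System note: recalled memory context, not user input.]"  # assumed wording
    return f"<memory-context>\n{note}\n{recalled}\n</memory-context>"


def prefetch_all(providers, user_message: str) -> str:
    """Collect recalled context from every provider, skipping any that fail."""
    chunks = []
    for provider in providers:
        try:
            text = provider.prefetch(user_message)
        except Exception:
            continue  # fault isolation: one broken provider never blocks the rest
        if text:
            chunks.append(text)
    return fence("\n".join(chunks)) if chunks else ""
```

Returning an empty string when nothing was recalled keeps the prompt free of empty fence tags.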
Layer 3: External Providers (Plugin System)
Location: agent/memory_provider.py (ABC) + plugins/memory/*/
The MemoryProvider ABC defines 15 methods (5 required, 10 optional hooks). Eight providers ship:
┌─────────────┬──────────────┬────────────────────────────────────────────────────────────────┐
│ Provider │ Storage │ Interesting Approach │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ Holographic │ Local SQLite │ HRR algebra — compositional vector queries │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ Honcho │ Cloud │ Dialectic reasoning — LLM-synthesized context │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ Hindsight │ Cloud/Local │ Knowledge graph + cross-memory reflection │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ Mem0 │ Cloud │ Server-side LLM fact extraction │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ OpenViking │ Self-hosted │ Tiered retrieval (L0→L1→L2 detail levels) │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ Supermemory │ Cloud │ Context fencing — strips recalled memories from captured turns │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ RetainDB │ Cloud │ Delta compression │
├─────────────┼──────────────┼────────────────────────────────────────────────────────────────┤
│ ByteRover │ Local/CLI │ Pre-compression extraction │
└─────────────┴──────────────┴────────────────────────────────────────────────────────────────┘
The Holographic Provider — Most Interesting for Your Use Case
This is the standout. It uses Holographic Reduced Representations (HRR) — a vector symbolic architecture from cognitive science (Plate 1995). The key insight: you can encode structured
relationships into fixed-width vectors and extract them algebraically.
How HRR works (plugins/memory/holographic/holographic.py):
1. Atoms — Each concept gets a deterministic phase vector via SHA-256 (reproducible across machines):
encode_atom("python", dim=1024) # → 1024 phases in [0, 2π)
2. Bind (association) — Phase addition (a + b) % 2π. Creates a composite that's dissimilar to both inputs:
bind(entity_vec, role_vec) # "python" bound to "entity role"
3. Unbind (retrieval) — Phase subtraction. Reverses binding:
unbind(bind(a, b), a) ≈ b # retrieve b given a
4. Bundle (superposition) — Circular mean of complex exponentials. Merges N vectors into one similar to all:
bundle(fact1, fact2, fact3) # one vector representing all three
5. Fact encoding — Each fact is: bundle(bind(content, ROLE_CONTENT), bind(entity1, ROLE_ENTITY), bind(entity2, ROLE_ENTITY), ...)
This enables operations no embedding DB can do:
- probe("python") — algebraically unbind "python" from the memory bank to extract everything structurally connected to it
- reason(["python", "backend"]) — AND-semantics: find facts where BOTH entities play structural roles (uses min() over entity similarity scores)
- contradict() — automated detection of conflicting facts: high entity overlap + low content similarity = contradiction
Retrieval pipeline (plugins/memory/holographic/retrieval.py):
query → FTS5 candidates (3x limit) → Jaccard rerank → HRR similarity → trust weighting → temporal decay → top-K
Three weighted signals: FTS5 (0.4) + Jaccard (0.3) + HRR (0.3). Falls back gracefully to FTS-only if numpy isn't installed.
Trust scoring — asymmetric feedback: helpful = +0.05, unhelpful = -0.10. Bad facts sink twice as fast as good facts rise.
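The asymmetric trust update can be sketched in a few lines. The +0.05 / -0.10 deltas come from the description above; clamping to [0, 1] and the function name are assumptions.

```python
def update_trust(trust: float, helpful: bool) -> float:
    """Apply asymmetric feedback: small reward, larger penalty."""
    delta = 0.05 if helpful else -0.10
    # Clamping to [0, 1] is an assumption of this sketch.
    return min(1.0, max(0.0, trust + delta))
```

Two helpful votes are needed to undo one unhelpful vote, which is what makes bad facts sink faster than good ones rise.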
What Makes This Deployable
For your application, the key patterns to adopt:
1. Bounded built-in + unbounded external — Always have a cheap, fast memory layer in the system prompt (~1,300 tokens). Expensive retrieval is additive.
2. Frozen snapshot + live mutations — Don't break your prefix cache by updating the system prompt mid-conversation. Writes persist to disk but the prompt snapshot is frozen.
3. on_pre_compress hook — Before context compression discards old messages, give the memory provider a chance to extract insights. ByteRover and Hindsight use this to prevent knowledge
loss.
4. Context fencing — Wrap recalled memories in tagged blocks with explicit system notes so the LLM doesn't confuse them with user input.
5. Single-provider constraint — Only one external provider active at a time prevents tool schema bloat and conflicting backends.
6. Profile isolation — All storage paths use hermes_home kwarg, not hardcoded paths. Each user/profile gets its own memory namespace.
Want me to dive deeper into any specific aspect — the HRR math, the wiring in run_agent.py, the provider plugin interface, or how to adapt any of this for your application?
✻ Churned for 1m 29s
❯ on a tangent what is the documentation system it's not mkdocs no?
⏺ It's Docusaurus — React-based, from Meta.
Read 1 file (ctrl+o to expand)
⏺ Docusaurus 3.9.2. The docs live in website/docs/ as markdown files with YAML frontmatter (sidebar_position, title, description). The _category_.json files control sidebar grouping.
Standard Docusaurus setup — npm run start for dev, npm run build for static output.
❯ it's free?
⏺ Yes, fully free and open source (MIT license). You just run npx create-docusaurus to scaffold a site, write markdown, and it generates a static site you can host anywhere (GitHub Pages,
Vercel, Netlify, etc.) for free too.
❯ I remember in earlier versions Honcho was the default and it was local. Did that change?
⏺ Let me check the git history for that.
⏺ Bash(git log --oneline --all --grep="honcho" --grep="default" --all-match | head -20)
⎿  d9f53dba feat(honcho): add opt-in initOnSessionStart for tools mode and respe...
c58e1675 docs: fix 40+ discrepancies between documentation and codebase (#5818)
583d9f95 fix(honcho): migration guard for observation mode default change
… +17 lines (ctrl+o to expand)
⏺ Bash(git log --oneline --all --grep="memory provider" | head -20)
⎿  e964cfc4 fix(gateway): trigger memory provider shutdown on /new and /reset
4f467700 fix(doctor): only check the active memory provider, not all provider...
84194db6 fix(doctor): only check the active memory provider, not all provider...
… +17 lines (ctrl+o to expand)
⏺ Bash(git log --oneline --all --grep="honcho" --diff-filter=A -- "tools/honcho*" | head -10 && echo "---" && git log --oneline 924bc67e..924bc67e~1 -1 && echo "---" &…)
⎿  ab4bbf2f feat: add Honcho AI-native memory integration
---
---
… +20 lines (ctrl+o to expand)
⏺ Bash(git log --oneline --all --grep="honcho" --reverse | head -10)
⎿  55107a19 refactor(memory): restructure plugins, add CLI, clean gateway, migra...
5d278aa3 feat(memory): standardize plugin config + add per-plugin documentation
7a34aebb docs: add memory providers user guide + developer guide
… +7 lines (ctrl+o to expand)
⏺ Bash(git log --oneline --reverse --all -- "*honcho*" | head -5 && echo "---" && git log --oneline ab4bbf2f -1 --format="%h %ci %s")
⎿  1d5a39e0 fix: thread safety for concurrent subagent delegation (#1672)
6405d389 test: align Hermes setup and full-suite expectations (#1710)
4ad00831 fix(honcho): read HONCHO_BASE_URL for local/self-hosted instances
… +4 lines (ctrl+o to expand)
⏺ Bash(git show ab4bbf2f --stat | head -20 && echo "---" && git log --oneline --all --grep="honcho" --grep="local\|self.host\|default" --all-match --reverse | head -10)
⎿  commit ab4bbf2fb2f3feea9e6fb772248ad09029ea04e1
Author: Erosika <eri@plasticlabs.ai>
Date: Wed Feb 25 19:34:25 2026 -0500
… +28 lines (ctrl+o to expand)
⏺ Here's what happened:
Honcho was never the default, and was never local. It was always opt-in cloud (or self-hosted) — the original commit (Feb 2026) says "Zero impact when disabled or unconfigured" and
required ~/.honcho/config.json with enabled=true.
What you might be remembering is the built-in memory (MEMORY.md/USER.md) which IS local and IS always-on by default. Before the plugin system was introduced (April 2026, PR #4623),
Honcho was the only external memory option and got a lot of attention in the docs/releases — so it may have felt like the default.
The current state:
- Default: Built-in only (MEMORY.md + USER.md, local files)
- External: Zero providers active unless you explicitly run hermes memory setup
- Honcho: Cloud API or self-hosted, always requires config + API key
The Holographic provider is the closest to "local default" — it needs zero dependencies (SQLite is built in, numpy is optional) and has no external API. But even that requires explicit
activation.
※ recap: You're studying Hermes Agent's memory system to deploy a similar approach in your application. We covered the full architecture from docs through code. No coding task is in
progress — pick which aspect you want to adapt next. (disable recaps in /config)
❯ And is Holographic a library or something Hermes created?
⏺ It's Hermes-created. The whole thing lives inline in the plugin directory — no external library.
⏺ Bash(git log --oneline --reverse -- "plugins/memory/holographic/*" | head -10)
⎿  924bc67e feat(memory): pluggable memory provider interface with profile isola...
8a384628 fix(memory): profile-scoped memory isolation and clone support (#4845)
bef895b3 fix(memory): preserve holographic prompt and trust score rendering
… +2 lines (ctrl+o to expand)
⏺ Bash(git log 924bc67e -1 --format="%H %ci %an" && echo "---" && git log --all --oneline --grep="dusterbloom\|holographic" --reverse | head -10)
⎿  924bc67eee35cc2fbb24d7cbc5649c820beb4406 2026-04-02 15:33:51 -0700 Teknium
---
b374f520 fix(session): clear compressor summary and turn counter on /clear an...
… +9 lines (ctrl+o to expand)
⏺ Community-contributed (by dusterbloom in PR #2351 per the docstring), then adapted to the MemoryProvider interface by the core team. But the HRR implementation itself — holographic.py,
store.py, retrieval.py — is all custom code written for Hermes. It's not wrapping any library; it implements Plate's 1995 HRR paper directly in ~200 lines of numpy.
The only external dependency is numpy (optional) — without it, the HRR features degrade gracefully to FTS5-only search.
✻ Baked for 34s
❯ I need to understand HRR in more detail. Can you explain it pedantically, using ASCII charts and pros and cons?
⏺ Holographic Reduced Representations (HRR) — From First Principles
The Problem HRR Solves
You want to store structured knowledge (not just text blobs) in fixed-width vectors. Traditional embeddings lose structure:
Embedding("Alice manages Bob") ≈ Embedding("Bob manages Alice")
cosine similarity ~0.95
THEY LOOK THE SAME!
Embeddings capture what words appear but not who does what to whom. HRR preserves this structure.
---
Core Idea: Phase Vectors
Instead of real-valued vectors like embeddings [0.23, -0.41, 0.87, ...], HRR uses phase angles in [0, 2π):
Traditional embedding: [-0.41, 0.23, 0.87, -0.12, 0.55]
real numbers, unbounded
HRR phase vector: [ 1.24, 4.71, 0.33, 5.88, 2.09]
angles on a circle, always [0, 2π)
Think of each dimension as a clock hand pointing somewhere:
[Diagram: four clock faces, one per dimension, with hands at phase=1.24 (dim 0), phase=4.71 (dim 1), phase=0.33 (dim 2), phase=5.88 (dim 3)]
Each concept gets a deterministic phase vector from SHA-256 hashing. Same word → same vector, anywhere, anytime:
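# From holographic.py line 43-67 (simplified; the block-counter loop is inferred from the ":0" suffix)
import hashlib, struct
import numpy as np
def encode_atom(word, dim=1024):
    values = []
    block = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{word}:{block}".encode()).digest()
        values.extend(struct.unpack("<16H", digest))  # 16 uint16 values per block
        block += 1
    return np.array(values[:dim]) * (2 * np.pi / 65536)  # scale to [0, 2π)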
encode_atom("alice", dim=8) → [2.41, 0.87, 5.12, 1.03, 3.77, 4.55, 0.29, 2.88]
encode_atom("bob", dim=8) → [4.92, 3.14, 1.67, 0.44, 5.01, 2.33, 3.89, 1.56]
encode_atom("alice", dim=8) → [2.41, 0.87, 5.12, 1.03, 3.77, 4.55, 0.29, 2.88]
↑ IDENTICAL every time — deterministic from SHA-256
---
The Three Operations
1. BIND — "Associate two things" (phase addition mod 2π)
# From holographic.py line 70-77
def bind(a, b):
    return (a + b) % (2 * np.pi)
Binding creates a pair/relationship. The result is dissimilar to both inputs:
alice = [2.41, 0.87, 5.12, 1.03]
role = [1.50, 3.00, 0.80, 4.20]
+
─────────────────────────────────
result = [3.91, 3.87, 5.92, 5.23] (mod 2π)
similarity(result, alice) ≈ 0.0 ← dissimilar!
similarity(result, role) ≈ 0.0 ← dissimilar!
Why dissimilar? Adding random angles to random angles gives... random angles. This is crucial — the bound pair doesn't "look like" either component. It's a new thing.
Analogy: Multiplying two prime numbers. 7 × 13 = 91. 91 doesn't resemble 7 or 13, but if you know one factor you can recover the other.
2. UNBIND — "Retrieve from a pair" (phase subtraction mod 2π)
# From holographic.py line 80-87
def unbind(memory, key):
    return (memory - key) % (2 * np.pi)
The inverse of bind. Given a bound pair and one component, recover the other:
bound = bind(alice, role) = [3.91, 3.87, 5.92, 5.23]
- role = [1.50, 3.00, 0.80, 4.20]
─────────────────────────
recovered = [2.41, 0.87, 5.12, 1.03]
similarity(recovered, alice) ≈ 1.0 ← GOT ALICE BACK!
This is exact when there's only one binding. With superposition (bundle), it's approximate — that's where noise enters.
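A small runnable check of the bind/unbind claims above. Random phase vectors stand in for the SHA-256 atoms, and `similarity` is taken to be the mean cosine of phase differences, the same measure the cleanup sketch later in this conversation uses; both are assumptions about how you would test this, not Hermes's code.

```python
import numpy as np

TWO_PI = 2 * np.pi


def similarity(a, b):
    """Mean cosine of phase differences: 1.0 for identical, near 0 for unrelated."""
    return float(np.mean(np.cos(a - b)))


def bind(a, b):
    return (a + b) % TWO_PI


def unbind(memory, key):
    return (memory - key) % TWO_PI


rng = np.random.default_rng(0)
alice = rng.uniform(0, TWO_PI, 1024)  # stand-in for encode_atom("alice")
role = rng.uniform(0, TWO_PI, 1024)   # stand-in for a role atom

bound = bind(alice, role)
recovered = unbind(bound, role)

print("bound vs alice:    ", similarity(bound, alice))      # close to zero
print("recovered vs alice:", similarity(recovered, alice))  # essentially one
```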
3. BUNDLE — "Merge multiple vectors into one" (circular mean)
# From holographic.py line 90-98
def bundle(*vectors):
    complex_sum = np.sum([np.exp(1j * v) for v in vectors], axis=0)
    return np.angle(complex_sum) % (2 * np.pi)
This is the trickiest one. You can't just average angles (the average of 1° and 359° is NOT 180°). Instead, convert each phase to a point on the unit circle, sum the complex numbers,
take the angle of the sum:
phase = 1.0 rad → as complex: cos(1.0) + i·sin(1.0) = 0.540 + 0.841i
phase = 5.5 rad → as complex: cos(5.5) + i·sin(5.5) = 0.709 - 0.706i
SUM = 1.249 + 0.136i
angle(SUM) = 0.108 rad ← the circular mean
The bundle is similar to all inputs (opposite of bind):
fact1 = bind(encode_text("alice manages backend"), ROLE_CONTENT)
fact2 = bind(encode_text("bob manages frontend"), ROLE_CONTENT)
fact3 = bind(encode_text("alice likes python"), ROLE_CONTENT)
memory_bank = bundle(fact1, fact2, fact3)
similarity(memory_bank, fact1) > 0 ← positive! it's "in there"
similarity(memory_bank, fact2) > 0 ← positive!
similarity(memory_bank, fact3) > 0 ← positive!
similarity(memory_bank, noise) ≈ 0 ← random stuff? ~zero
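The "similar to all inputs" property can be checked directly. This sketch bundles three random phase vectors (stand-ins for the role-bound facts above) and compares the bank against each member and against an unrelated vector; the helper names are illustrative.

```python
import numpy as np

TWO_PI = 2 * np.pi


def similarity(a, b):
    return float(np.mean(np.cos(a - b)))


def bundle(*vectors):
    # Circular mean: sum unit phasors, keep only the angle of the sum.
    complex_sum = np.sum([np.exp(1j * v) for v in vectors], axis=0)
    return np.angle(complex_sum) % TWO_PI


rng = np.random.default_rng(1)
fact1, fact2, fact3, noise = (rng.uniform(0, TWO_PI, 1024) for _ in range(4))

memory_bank = bundle(fact1, fact2, fact3)
member_sims = [similarity(memory_bank, f) for f in (fact1, fact2, fact3)]
noise_sim = similarity(memory_bank, noise)

print("members:", member_sims)  # all clearly positive
print("noise:  ", noise_sim)    # close to zero
```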
---
How Hermes Encodes a Fact
Each fact is a structured bundle of role-bound components:
# From holographic.py line 135-160
def encode_fact(content, entities, dim=1024):
    role_content = encode_atom("__hrr_role_content__", dim)
    role_entity = encode_atom("__hrr_role_entity__", dim)
    components = [bind(encode_text(content), role_content)]
    for entity in entities:
        components.append(bind(encode_atom(entity), role_entity))
    return bundle(*components)
Visually:
Fact: "Alice manages the backend team"
Entities extracted: ["Alice", "Backend Team"]
┌──────────────────────────────────────────────────────┐
│ FACT VECTOR │
│ │
│ bundle( │
│ bind(text("alice manages backend team"), ROLE_C), │ ← content slot
│ bind(atom("alice"), ROLE_E), │ ← entity slot 1
│ bind(atom("backend team"), ROLE_E) │ ← entity slot 2
│ ) │
│ │
│ = single vector of 1024 phases │
└──────────────────────────────────────────────────────┘
Now you can algebraically query this:
Q: "What do we know about Alice?"
probe_key = bind(atom("alice"), ROLE_ENTITY)
residual = unbind(fact_vector, probe_key)
residual ≈ content_vector ← "manages the backend team" signal emerges!
similarity(residual, bind(text("manages backend"), ROLE_CONTENT)) >> 0
---
Multi-Fact Memory Bank
Multiple facts get bundled into a single bank vector:
MEMORY BANK (category: "project")
┌────────────────────────────────────────────────────┐
│ │
│ bundle( │
│ encode_fact("Alice manages backend", [Alice]), │
│ encode_fact("Bob leads frontend", [Bob]), │
│ encode_fact("Alice likes Python", [Alice]), │
│ encode_fact("Deploy is on Fridays", []), │
│ ) │
│ │
│ = ONE vector, 1024 phases, 8 KB on disk │
│ │
└────────────────────────────────────────────────────┘
Probe: "What about Alice?"
unbind(bank, bind(atom("alice"), ROLE_ENTITY))
residual vector
├── high similarity to fact 1 content ("manages backend") ✓
├── high similarity to fact 3 content ("likes python") ✓
├── low similarity to fact 2 content ("leads frontend") ✗ (Bob, not Alice)
└── low similarity to fact 4 content ("deploy fridays") ✗ (no entity match)
Reason: "Facts about BOTH Alice AND backend?"
probe_alice = bind(atom("alice"), ROLE_ENTITY)
probe_backend = bind(atom("backend"), ROLE_ENTITY)
For each fact:
score_alice = similarity(unbind(fact, probe_alice), ROLE_CONTENT)
score_backend = similarity(unbind(fact, probe_backend), ROLE_CONTENT)
final_score = MIN(score_alice, score_backend) ← AND semantics!
fact must score high on BOTH to rank
Result: "Alice manages backend" ranks #1
"Alice likes Python" ranks low (no "backend" entity)
"Bob leads frontend" ranks low (no "Alice" entity)
---
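A self-contained sketch of entity probing and AND-style reasoning over individual fact vectors. It scores an entity by unbinding the entity role from a fact and comparing the result with the entity's atom; that scoring expression, the bag-of-words `encode_text`, and the helper names are assumptions chosen because they follow directly from the algebra, and they may differ from what Hermes actually computes.

```python
import hashlib
import struct

import numpy as np

TWO_PI = 2 * np.pi
DIM = 1024


def encode_atom(word, dim=DIM):
    values, block = [], 0
    while len(values) < dim:  # one SHA-256 digest yields 16 uint16 values
        digest = hashlib.sha256(f"{word}:{block}".encode()).digest()
        values.extend(struct.unpack("<16H", digest))
        block += 1
    return np.array(values[:dim]) * (TWO_PI / 65536)


def bind(a, b):
    return (a + b) % TWO_PI


def unbind(memory, key):
    return (memory - key) % TWO_PI


def bundle(*vectors):
    return np.angle(np.sum([np.exp(1j * v) for v in vectors], axis=0)) % TWO_PI


def similarity(a, b):
    return float(np.mean(np.cos(a - b)))


ROLE_CONTENT = encode_atom("__hrr_role_content__")
ROLE_ENTITY = encode_atom("__hrr_role_entity__")


def encode_fact(content, entities):
    # Assumed bag-of-words text encoding: bundle of the word atoms.
    text = bundle(*[encode_atom(w) for w in content.lower().split()])
    parts = [bind(text, ROLE_CONTENT)]
    parts += [bind(encode_atom(e), ROLE_ENTITY) for e in entities]
    return bundle(*parts)


def entity_score(fact_vec, entity):
    # What fills the entity slot? Compare it with the entity's atom.
    return similarity(unbind(fact_vec, ROLE_ENTITY), encode_atom(entity))


def reason(facts, entities):
    # AND semantics: a fact is only as good as its weakest entity match.
    scored = [(name, min(entity_score(vec, e) for e in entities))
              for name, vec in facts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


facts = {
    "Alice manages backend": encode_fact("alice manages backend", ["alice", "backend"]),
    "Alice likes Python": encode_fact("alice likes python", ["alice", "python"]),
    "Bob leads frontend": encode_fact("bob leads frontend", ["bob", "frontend"]),
}
ranked = reason(facts, ["alice", "backend"])
print(ranked)
```

Taking the minimum means a fact that matches "alice" strongly but has no "backend" entity still scores near zero, which is the AND behaviour described above.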
Signal-to-Noise Ratio
The fundamental limit. As you bundle more facts, the signal degrades:
SNR = √(dim / n_items)
dim=1024, n_items:
n_items │ SNR │ Quality
────────┼────────┼──────────────────
1 │ 32.0 │ Perfect recall
10 │ 10.1 │ Excellent
50 │ 4.5 │ Good
100 │ 3.2 │ Usable
256 │ 2.0 │ ⚠️ Errors start (Hermes warns here)
500 │ 1.4 │ Unreliable
1024 │ 1.0 │ Noise = signal, useless
# From holographic.py line 179-203
def snr_estimate(dim, n_items):
    snr = math.sqrt(dim / n_items)
    if snr < 2.0:
        logger.warning("HRR storage near capacity...")
    return snr
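The table above can be reproduced, and inverted, with a couple of lines. `max_items` is a hypothetical helper that solves SNR = √(dim / n) for n at a chosen minimum SNR.

```python
import math


def snr(dim, n_items):
    return math.sqrt(dim / n_items)


def max_items(dim, min_snr=2.0):
    # Invert SNR = sqrt(dim / n): n = dim / SNR^2
    return int(dim / min_snr ** 2)


for n in (1, 10, 50, 100, 256, 500, 1024):
    print(f"{n:>5} items -> SNR {snr(1024, n):5.1f}")

print(max_items(1024), max_items(4096), max_items(16384))
```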
This is why Hermes uses HRR as one signal among three in the retrieval pipeline, not the only signal:
┌─────────────────────────────────────────────────┐
│ RETRIEVAL PIPELINE │
│ │
│ query │
│ │ │
│ ▼ │
│ FTS5 candidates (3x limit) ← SQLite search │
│ │ │
│ ▼ │
│ Score each candidate: │
│ 0.4 × FTS5_rank ← keyword match │
│ + 0.3 × Jaccard_similarity ← token overlap │
│ + 0.3 × HRR_similarity ← structural match │
│ │ │
│ ▼ │
│ × trust_score ← feedback weight │
│ × temporal_decay ← freshness │
│ │ │
│ ▼ │
│ top-K results │
└─────────────────────────────────────────────────┘
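A sketch of how the weighted signals in the pipeline could be combined. The 0.4 / 0.3 / 0.3 weights come from the description above; the exponential decay with a 30-day half-life, the assumption that each signal is already normalized to [0, 1], and the function names are illustrative guesses.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    # Assumed decay shape: the score halves every half_life_days.
    return 0.5 ** (age_days / half_life_days)


def final_score(fts: float, jac: float, hrr: float,
                trust: float, age_days: float) -> float:
    relevance = 0.4 * fts + 0.3 * jac + 0.3 * hrr  # weighted signals
    return relevance * trust * temporal_decay(age_days)
```

Multiplying by trust and decay (rather than adding them) means a distrusted or stale fact is suppressed no matter how well it matches the query.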
---
Pros and Cons
┌─────────────────────────────────────────────────────────────────────┐
│ PROS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ COMPOSITIONAL QUERIES │
│ "Facts about Alice AND backend" — no embedding DB can do this. │
│ Embeddings find similar text. HRR finds structural relationships. │
│ │
│ ZERO EXTERNAL DEPENDENCIES │
│ No vector DB, no API, no GPU. SQLite + numpy (optional). │
│ The entire thing is ~600 lines of Python. │
│ │
│ DETERMINISTIC REPRESENTATIONS │
│ SHA-256 → same word = same vector everywhere. No model to load, │
│ no training, no drift. Reproducible across machines. │
│ │
│ CONSTANT MEMORY PER BANK │
│ 1000 facts bundled into one 8 KB vector. Storage doesn't grow │
│ with fact count (individual facts still stored in SQLite though). │
│ │
│ ALGEBRAIC RETRIEVAL │
│ unbind() is O(dim) — microseconds. No ANN index, no search. │
│ Extraction is a single numpy subtraction. │
│ │
│ CONTRADICTION DETECTION │
│ High entity overlap + low content similarity = automatic conflict │
│ detection. "Alice manages backend" vs "Alice manages frontend" │
│ flags automatically. No other memory system does this. │
│ │
│ GRACEFUL DEGRADATION │
│ numpy missing? Falls back to FTS5-only. No crashes. │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ CONS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ CAPACITY WALL │
│ SNR = √(dim/n). At dim=1024, ~256 facts per bank before │
│ retrieval degrades. You CAN increase dim (2048, 4096) but │
│ storage and compute scale linearly. │
│ │
│ dim │ safe facts │ vector size │
│ ───────┼────────────┼──────────── │
│ 1024 │ ~256 │ 8 KB │
│ 4096 │ ~1024 │ 32 KB │
│ 16384 │ ~4096 │ 128 KB │
│ │
│ NO SEMANTIC UNDERSTANDING │
│ "python" and "Python" are different atoms. "car" and "automobile" │
│ are unrelated. There's no learned semantics — it's purely │
│ symbolic. The bag-of-words encode_text() doesn't understand │
│ synonyms, paraphrases, or meaning. │
│ │
│ Embedding: sim("car", "automobile") ≈ 0.85 │
│ HRR: sim("car", "automobile") ≈ 0.00 │
│ │
│ ENTITY EXTRACTION IS NAIVE │
│ Regex-based: capitalized phrases, quoted terms, "aka" patterns. │
│ Misses: "the backend" (no caps), "JS" (too short), │
│ "machine learning" (common words). No NER model. │
│ │
│ NOISY RETRIEVAL │
│ Unlike exact key-value lookup, unbind returns an approximate │
│ signal with superposition noise. You always need a scoring │
│ threshold and top-K cutoff. False positives increase with │
│ bank size. │
│ │
│ WORD ORDER LOST IN ENCODE_TEXT │
│ encode_text() is bag-of-words: bundle of atom vectors. │
│ "alice manages bob" = "bob manages alice" at the text level. │
│ Structure is ONLY preserved through explicit entity+role binding, │
│ not through the text encoding itself. │
│ │
│ BANK REBUILD COST │
│ Every add/update/remove triggers a full category bank rebuild — │
│ re-bundle ALL fact vectors in that category. O(n × dim). │
│ At 1000 facts: ~1M float ops. Fast on modern CPUs but not free. │
│ │
│ NO LEARNED SIMILARITY │
│ Two facts about the same topic but with zero word overlap will │
│ have zero HRR similarity. Embeddings would catch the semantic │
│ connection. HRR is algebraic, not statistical. │
│ │
└─────────────────────────────────────────────────────────────────────┘
When to Use HRR vs Embeddings vs Full-Text Search
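┌─────────────────────────────┬────────────────┬─────────────────┬───────────────────────┐
│                             │ FTS5 (keyword) │ HRR (algebraic) │ Embeddings (semantic) │
├─────────────────────────────┼────────────────┼─────────────────┼───────────────────────┤
│ "find python"               │ ✓✓✓            │ ✓✓              │ ✓✓✓                   │
│ "find scripting language"   │ ✗              │ ✗               │ ✓✓✓                   │
│ "facts about Alice AND Bob" │ ✗              │ ✓✓✓             │ ✗                     │
│ "contradictions?"           │ ✗              │ ✓✓✓             │ ✗                     │
│ synonyms                    │ ✗              │ ✗               │ ✓✓✓                   │
│ latency                     │ ~1ms           │ ~0.1ms          │ ~10-100ms             │
│ dependencies                │ SQLite         │ numpy           │ model + index         │
│ max items                   │ unlimited      │ √dim            │ unlimited             │
└─────────────────────────────┴────────────────┴─────────────────┴───────────────────────┘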
That's why Hermes uses all three in the retrieval pipeline — each covers the others' blind spots. HRR shines specifically for structural/compositional queries that neither keyword
search nor embeddings can handle.
✻ Worked for 1m 41s
❯ Are there any new ways to combat the
SNR = √(dim / n_items) issue, so we can actually store more facts without this terrible degradation?
⏺ Great question. The √(dim/n) limit is fundamental to superposition in dense vectors — it's a consequence of the central limit theorem. But there are real ways around it, ranging from
simple engineering to cutting-edge research.
Why the limit exists
Each bundled item contributes signal O(1) and noise O(1/√dim) to every other item. With n items, total noise per query is √n / √dim. So:
SNR = signal / noise = 1 / (√n / √dim) = √(dim / n)
You can't fix this by "trying harder" within the same algebra. You need to change the algebra, the representation, or the architecture.
---
1. Cleanup Memories (easiest win, biggest impact)
The noisy vector you get from unbind() is close to the right answer but corrupted. If you have a codebook of all known atoms, do nearest-neighbor lookup after extraction:
WITHOUT cleanup:
unbind(bank, key) → noisy_vector → similarity threshold → hope for the best
analog, noise accumulates
WITH cleanup:
unbind(bank, key) → noisy_vector → argmax similarity(noisy, codebook) → exact atom
digital decision, noise eliminated
Capacity without cleanup: O(√dim) = ~32 at dim=1024
Capacity with cleanup: O(dim/ln(dim)) ≈ ~147 at dim=1024
~492 at dim=4096
The math: cleanup succeeds when the correct atom's signal exceeds the maximum noise among all K codebook entries. The max of K Gaussian noise samples grows as √(2 ln K), so you need:
1/√n > √(2 ln K) / √dim
n < dim / (2 ln K)
For K=1000 atoms, dim=1024: n < 1024/14 ≈ 73 per bank. Still limited, but roughly 2x better than the ~32 raw-threshold figure.
Implementation sketch:
class CleanupMemory:
    def __init__(self, dim=1024):
        self.codebook = {}  # name → phase_vector

    def register(self, name, vector):
        self.codebook[name] = vector

    def clean(self, noisy_vector):
        best_name, best_sim = None, -1
        for name, vec in self.codebook.items():
            sim = np.mean(np.cos(noisy_vector - vec))
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name, best_sim
Hermes doesn't do this today — it would be the single easiest upgrade.
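To see the cleanup step at work, this sketch bundles ten key/value bindings into one bank, unbinds each key, and snaps the noisy result to the nearest codebook entry. Random phase vectors replace the hashed atoms, and the `value_i` names and bank size are made up for the demonstration.

```python
import numpy as np

TWO_PI = 2 * np.pi
rng = np.random.default_rng(0)
dim, n = 1024, 10

keys = rng.uniform(0, TWO_PI, (n, dim))
codebook = {f"value_{i}": rng.uniform(0, TWO_PI, dim) for i in range(n)}
names = list(codebook)

# bank = bundle(bind(key_0, value_0), ..., bind(key_9, value_9))
pairs = np.array([(keys[i] + codebook[names[i]]) % TWO_PI for i in range(n)])
bank = np.angle(np.sum(np.exp(1j * pairs), axis=0)) % TWO_PI


def clean(noisy, codebook):
    """Return the codebook entry most similar to the noisy vector."""
    sims = {name: float(np.mean(np.cos(noisy - vec)))
            for name, vec in codebook.items()}
    best = max(sims, key=sims.get)
    return best, sims[best]


results = [clean((bank - keys[i]) % TWO_PI, codebook) for i in range(n)]
for i, (name, sim) in enumerate(results):
    print(f"key_{i} -> {name} (similarity {sim:.2f})")
```

The raw similarities are well below 1.0 because of superposition noise, yet the argmax still picks the right atom; that gap is what cleanup exploits.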
---
2. Resonator Networks (the big breakthrough, 2020)
Frady, Kent, Olshausen & Sommer's key insight: instead of one-shot unbinding, use iterative factorization. A resonator network treats retrieval as a constraint satisfaction problem.
ONE-SHOT UNBIND (what Hermes does):
bank = bundle(bind(a,x), bind(b,y), bind(c,z))
unbind(bank, a) → noisy_x
capacity: O(√dim)
RESONATOR NETWORK:
Initialize: estimate_x = random, estimate_y = random
Iterate:
estimate_x = cleanup(unbind(bank, estimate_y)) ← use x's codebook
estimate_y = cleanup(unbind(bank, estimate_x)) ← use y's codebook
repeat until convergence
capacity: O(dim) in favorable cases
One-shot: O(√dim) = 32 at dim=1024
Resonator: O(dim^0.69) = ~200 at dim=1024 (empirical)
Theoretical: O(dim) = 1024 at dim=1024 (sparse cases)
Why it works — the iterative process uses the structure of the codebook as a constraint. Each iteration reduces noise because cleanup projects onto the discrete codebook, and the other
factor's estimate improves reciprocally:
Iteration 0: estimate_x = garbage, estimate_y = garbage
Iteration 1: estimate_x = noisy_x, estimate_y = noisy_y
Iteration 2: estimate_x = cleaner, estimate_y = cleaner
...
Iteration 5: estimate_x ≈ x, estimate_y ≈ y ← converged
┌─────────────────────────────────────────────┐
│ Error vs iteration (typical) │
│ │
│ 1.0 ─ ● │
│ │\ │
│ 0.5 ─ │ ● │
│ │ \ │
│ 0.1 ─ │ ●──●──●──● │
│ │ │
│ 0.0 ─ ┼───┼───┼───┼───┼─── │
│ 0 1 2 3 4 5 iterations │
└─────────────────────────────────────────────┘
The downside: you need to know the codebook (all possible atoms) upfront. In Hermes's case, the entity table IS the codebook — so this is feasible.
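A minimal two-factor resonator sketch. It factors a single bound pair s = x ⊙ y given the two codebooks, rather than a full Hermes bank, and works with complex phasors (where binding is element-wise multiplication and unbinding is multiplication by the conjugate, equivalent to adding and subtracting phases). Codebook sizes, the iteration count, and function names are assumptions.

```python
import numpy as np


def random_codebook(rng, size, dim):
    """Return a (dim, size) matrix whose columns are random unit phasors."""
    return np.exp(1j * rng.uniform(0, 2 * np.pi, size=(dim, size)))


def to_phasors(z):
    """Keep each element's angle, reset its magnitude to 1."""
    return np.exp(1j * np.angle(z))


def resonate(s, X, Y, iterations=10):
    # Start from the superposition of every candidate in each codebook.
    x_hat = to_phasors(X.sum(axis=1))
    y_hat = to_phasors(Y.sum(axis=1))
    for _ in range(iterations):
        # Unbind the other factor's estimate, then project onto the codebook.
        x_hat = to_phasors(X @ (X.conj().T @ (s * y_hat.conj())))
        y_hat = to_phasors(Y @ (Y.conj().T @ (s * x_hat.conj())))
    ix = int(np.argmax(np.abs(X.conj().T @ x_hat)))
    iy = int(np.argmax(np.abs(Y.conj().T @ y_hat)))
    return ix, iy


rng = np.random.default_rng(0)
X = random_codebook(rng, 10, 1024)
Y = random_codebook(rng, 10, 1024)
s = X[:, 3] * Y[:, 7]  # bind atom 3 of X with atom 7 of Y

print(resonate(s, X, Y))
```

The projection `X @ (X.conj().T @ ...)` plays the role of the cleanup step: it re-expresses the noisy estimate as a weighted mix of known atoms before the next round.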
---
3. Sparse Block Codes (different representation entirely)
Replace dense phase vectors with sparse structured vectors. Divide dim into B blocks of size S. Each block has exactly one active position:
Dense HRR (dim=16):
[2.41, 0.87, 5.12, 1.03, 3.77, 4.55, 0.29, 2.88, 1.44, 5.01, ...]
every element contributes noise
Sparse Block Code (dim=16, B=4 blocks of S=4):
[0,0,1,0 | 1,0,0,0 | 0,0,0,1 | 0,1,0,0]
↑ ↑ ↑ ↑
one-hot one-hot one-hot one-hot per block
Binding = permute within blocks. Bundling = element-wise OR + threshold.
Capacity: O(B × S / ln(B × S))
For dim=1024 (plus one larger configuration):
Dense HRR: O(√1024) = 32 facts
Sparse (B=64, S=16): O(1024/ln(1024)) = ~147 facts
Sparse (B=256, S=64): O(16384/ln(16384)) = ~1690 facts at dim=16384
The fundamental advantage: noise in sparse representations is combinatorial (how many blocks align by chance) not Gaussian (cumulative analog noise). Combinatorial coincidence is
exponentially unlikely.
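A small sketch of the block-code representation and its binding. Each vector is stored as B active indices (one per block); binding adds indices modulo S, which is a cyclic shift inside each block. Bundling is omitted, and the names and similarity measure (fraction of matching blocks) are choices made for this illustration.

```python
import numpy as np

B, S = 64, 16  # 64 blocks of 16 positions -> dim = 1024


def random_code(rng):
    """One active position per block, stored as an index in [0, S)."""
    return rng.integers(0, S, size=B)


def bind(a, b):
    return (a + b) % S  # cyclic shift within each block


def unbind(m, k):
    return (m - k) % S


def similarity(a, b):
    return float(np.mean(a == b))  # fraction of blocks that agree


def to_dense(code):
    """Expand to the explicit one-hot-per-block vector of length B * S."""
    dense = np.zeros(B * S, dtype=np.uint8)
    dense[np.arange(B) * S + code] = 1
    return dense


rng = np.random.default_rng(0)
a, b = random_code(rng), random_code(rng)
bound = bind(a, b)

print(similarity(bound, a))             # near chance level, 1/S
print(similarity(unbind(bound, b), a))  # exact recovery
```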
---
4. Hierarchical Partitioning (engineering, not math)
Hermes already does this partially with category banks. Push it further:
FLAT (current Hermes):
bank_project = bundle(all project facts) ← 200 facts, SNR = 2.3
HIERARCHICAL:
bank_project_backend = bundle(30 facts) ← SNR = 5.8
bank_project_frontend = bundle(25 facts) ← SNR = 6.4
bank_project_infra = bundle(20 facts) ← SNR = 7.2
bank_project_people = bundle(15 facts) ← SNR = 8.3
meta_bank = bundle( ← only 4 items, SNR = 16
bind(atom("backend"), bank_project_backend),
bind(atom("frontend"), bank_project_frontend),
bind(atom("infra"), bank_project_infra),
bind(atom("people"), bank_project_people),
)
Query: unbind(meta_bank, atom("backend")) → bank_project_backend
unbind(that, probe_key) → fact-level results
Two-level unbinding with 30 facts per leaf bank vs one-level with 200 facts. No math breakthrough needed — just better data organization.
---
5. MAP (Multiply-Add-Permute) — Binary Alternative
Replace continuous phases with bipolar binary {-1, +1}:
# MAP operations
def bind(a, b): return a * b # element-wise multiply
def unbind(m, k): return m * k # same as bind (self-inverse!)
def bundle(*vecs): return np.sign(sum(vecs)) # majority vote
Same O(√dim) theoretical limit BUT:
- 10-100x faster — integer ops, SIMD-friendly, bit-packable
- Can afford dim=65536 at negligible cost → √65536 = 256 facts per bank
- Majority vote bundling is more robust than circular mean in practice
dim=1024 phases: 8 KB per vector, float64 ops
dim=65536 binary: 8 KB per vector (bit-packed), bitwise XOR ops ← same memory, 64x the dimensions, 8x the √dim capacity (32 → 256)
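A runnable version of the MAP operations on bipolar vectors, including a check that a bit-packed 65536-dimensional vector really occupies 8 KB. The helper names and the similarity measure (mean element-wise product) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 65536


def atom():
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=DIM)


def bind(a, b):
    return a * b  # element-wise multiply; applying it twice undoes it


def bundle(*vecs):
    # Majority vote per element (use an odd count to avoid ties at zero).
    return np.sign(np.sum(vecs, axis=0, dtype=np.int32)).astype(np.int8)


def similarity(a, b):
    return float(np.mean(a.astype(np.int32) * b))


a, b, c = atom(), atom(), atom()
recovered = bind(bind(a, b), b)      # unbind == bind
bank = bundle(a, b, c)
packed_bytes = np.packbits(a > 0).nbytes

print(np.array_equal(recovered, a))  # exact recovery
print(similarity(bank, a))           # clearly positive
print(packed_bytes)                  # bytes per bit-packed vector
```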
---
Summary: What's Practical Today
┌─────────────────────────┬────────────┬─────────────┬───────────────┐
│ Approach │ Capacity │ Complexity │ Drop-in? │
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ Current Hermes │ O(√dim) │ trivial │ (baseline) │
│ │ ~32/bank │ │ │
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ + Cleanup memory │ O(dim/lnK) │ easy │ YES — add │
│ │ ~73/bank │ ~50 lines │ post-unbind │
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ + Hierarchical banks │ O(√dim) │ easy │ YES — change │
│ (more partitions) │ per leaf │ config only │ bank strategy │
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ + Resonator network │ O(dim^0.7) │ moderate │ ~200 lines, │
│ │ ~200/bank │ needs iter │ needs codebook│
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ Switch to binary MAP │ O(√dim) │ moderate │ NO — rewrites │
│ at dim=65536 │ ~256/bank │ schema chg │ store+retrieve│
├─────────────────────────┼────────────┼─────────────┼───────────────┤
│ Sparse Block Codes │ O(dim/lnD) │ hard │ NO — full │
│ │ ~1690 │ new algebra │ rewrite │
└─────────────────────────┴────────────┴─────────────┴───────────────┘
The highest ROI for Hermes would be cleanup memories + finer-grained bank partitioning. Both are additive changes, ~100 lines, and together would push usable capacity from ~32 to ~200+
facts per bank without touching the existing schema.