Objective: move from large flat workspace memory to a fast, maintainable, automation-friendly knowledge system
We have a Kimi Claw workspace containing identity files, memory files, skill docs, logs, operational notes, and cloned repos. As the workspace grows, the current flat model becomes slower, noisier, and harder to maintain.
We want to evolve Claw in structured phases so it becomes:
- smaller in default context
- better at finding the right knowledge on demand
- better at ranking what matters
- better at handling stale/archive content
- easier to maintain over time
- ready for hooks, subagents, and future repo-local knowledge features
If useful, use Kimi Code inside Claw for research, file analysis, safe refactors, script generation, overlap detection, migration help, and implementation support.
Core principles:
- Tiny always-load context: Keep only critical identity, user preferences, tools, and current active context in default load.
- On-demand retrieval: Most knowledge should be discoverable via metadata, search, and semantic retrieval rather than preloaded into prompt context.
- Hybrid retrieval: Prefer a layered system:
  - metadata / routing hints
  - exact lookup
  - full-text search
  - semantic/vector recall
  - reranking
  - compact context injection
- Clear lifecycle: Knowledge should move through states like active, warm, dormant, stale, archived.
- Automation with guardrails: Use hooks and specialist subagents only where they improve maintenance safely and transparently.
- Maintainability over cleverness: Prefer local, inspectable, deterministic tools and data structures.
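The layered order described under hybrid retrieval can be sketched as a small orchestrator. This is a minimal sketch, not Claw's implementation: the `Layer` type, `hybrid_search`, and the constant `k=60` are assumptions introduced for illustration, and reciprocal rank fusion is only one common way to merge ranked lists; the plan itself does not prescribe it.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A layer takes a query and returns document paths, best match first.
# (Hypothetical interface; real layers would wrap exact, full-text, or vector search.)
Layer = Callable[[str], List[str]]


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists; documents ranked high in several lists score best."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for position, path in enumerate(ranked, start=1):
            scores[path] += 1.0 / (k + position)
    # Sort by fused score (descending), then by path so ties are deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


def hybrid_search(query: str, exact: Layer, recall_layers: List[Layer],
                  limit: int = 5) -> List[str]:
    """Try the cheap exact layer first; otherwise fuse the recall layers."""
    hits = exact(query)
    if hits:
        return hits[:limit]
    fused = reciprocal_rank_fusion([layer(query) for layer in recall_layers])
    # Compact context injection: hand back only the top few paths.
    return fused[:limit]
```

Keeping each layer a plain function keeps the pipeline inspectable: any layer can be called on its own when debugging a miss.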
Phase 1: Reduce default load size, remove duplication, split large catch-all files, and separate active from archived knowledge.
- audit current core files:
  - BOOTSTRAP.md
  - IDENTITY.md
  - SOUL.md
  - USER.md
  - MEMORY.md
  - TOOLS.md
  - AGENTS.md
- decide each file's disposition:
  - always-load
  - shrink
  - merge
  - split
  - archive
- merge or reduce overlapping identity/personality files
- improve USER.md with stable user preferences
- improve TOOLS.md with operational environment notes
- split MEMORY.md into structured files such as:
  - memory/MEMORY_INDEX.md
  - memory/user/
  - memory/projects/
  - memory/infra/
  - memory/workflows/
  - memory/archive/
- create a small ACTIVE.md
- archive oversized or stale knowledge, especially giant low-use skill/data packs
- leave behind stubs/notes where archived items moved
- preserve backups or safe diffs
Deliverables:
- workspace-audit-phase1.md
- restructured memory folders
- reduced always-load layer
- ACTIVE.md
- phase1-changes-summary.md
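The core-file audit could start from a read-only size report. The sketch below is an assumption-heavy illustration: the `max_chars` budget, the suggestion strings, and the function names are hypothetical, and the script only recommends; it never edits or moves a file.

```python
from pathlib import Path
from typing import Dict, List

# Core files named in the plan.
CORE_FILES = ["BOOTSTRAP.md", "IDENTITY.md", "SOUL.md", "USER.md",
              "MEMORY.md", "TOOLS.md", "AGENTS.md"]


def audit_core_files(workspace: Path, max_chars: int = 4000) -> List[Dict[str, object]]:
    """Report size and a suggested disposition for each core file."""
    report: List[Dict[str, object]] = []
    for name in CORE_FILES:
        path = workspace / name
        if not path.is_file():
            report.append({"file": name, "chars": 0, "suggestion": "missing"})
            continue
        chars = len(path.read_text(encoding="utf-8", errors="replace"))
        # Hypothetical rule: anything over the budget is a shrink/split candidate.
        suggestion = "review: shrink or split" if chars > max_chars else "keep"
        report.append({"file": name, "chars": chars, "suggestion": suggestion})
    return report


def render_report(report: List[Dict[str, object]]) -> str:
    """Render the audit as plain text, e.g. for workspace-audit-phase1.md."""
    lines = [f"- {row['file']}: {row['chars']} chars, {row['suggestion']}"
             for row in report]
    return "\n".join(lines)
```

Running `print(render_report(audit_core_files(Path("."))))` from the workspace root gives a first draft of the audit that a human can then annotate with merge, split, or archive decisions.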
Phase 2: Make Claw fast at finding the right knowledge without loading too much by default.
- define metadata/frontmatter standard for key files
- build a document registry, preferably in SQLite
- index:
  - path
  - title
  - kind
  - tags
  - summary
  - scope
  - archive state
  - timestamps
- implement:
  - exact search
  - full-text search
  - vector/chunk search
  - hybrid retrieval orchestration
  - document chunking rules
- embed useful content types only
- build usage logging
- expose simple commands/scripts for:
  - rebuild index
  - update index
  - exact search
  - hybrid search
  - inspect document
  - list indexed docs
- keep archive-aware ranking and filtering
Deliverables:
- metadata standard
- index schema
- indexing scripts
- hybrid retrieval scripts
- retrieval docs
- usage logging surface
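A SQLite document registry with exact and full-text search might look like the sketch below. It assumes the local SQLite build includes the FTS5 extension (common, but not guaranteed); the table names, column names, and function names are illustrative choices that mirror the indexed fields listed above, not a fixed schema.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS docs (
    path          TEXT PRIMARY KEY,
    title         TEXT NOT NULL,
    kind          TEXT,
    tags          TEXT,
    summary       TEXT,
    scope         TEXT,
    archive_state TEXT DEFAULT 'active',
    updated_at    TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts
    USING fts5(path UNINDEXED, title, summary, body);
"""


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def upsert_doc(conn, path, title, body, kind="", tags="", summary="",
               scope="", archive_state="active", updated_at=""):
    """Insert or refresh one document in both the registry and the FTS index."""
    conn.execute("INSERT OR REPLACE INTO docs VALUES (?,?,?,?,?,?,?,?)",
                 (path, title, kind, tags, summary, scope, archive_state, updated_at))
    conn.execute("DELETE FROM docs_fts WHERE path = ?", (path,))
    conn.execute("INSERT INTO docs_fts (path, title, summary, body) VALUES (?,?,?,?)",
                 (path, title, summary, body))
    conn.commit()


def exact_search(conn, path_or_title: str) -> List[str]:
    rows = conn.execute("SELECT path FROM docs WHERE path = ? OR title = ?",
                        (path_or_title, path_or_title)).fetchall()
    return [r[0] for r in rows]


def full_text_search(conn, query: str, include_archived: bool = False,
                     limit: int = 5) -> List[str]:
    """Archive-aware full-text search; archived docs are filtered unless requested."""
    sql = ("SELECT d.path FROM docs_fts JOIN docs AS d ON d.path = docs_fts.path "
           "WHERE docs_fts MATCH ? ")
    if not include_archived:
        sql += "AND d.archive_state != 'archived' "
    sql += "ORDER BY bm25(docs_fts) LIMIT ?"  # lower bm25 value = better match
    return [r[0] for r in conn.execute(sql, (query, limit)).fetchall()]
```

A single local database file keeps the index inspectable with the stock `sqlite3` shell, and rebuilding it is just a loop of `upsert_doc` calls over the workspace.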
Phase 3: Make Claw smarter about prioritization, freshness, aliases, relationships, maintenance, and explainability.
- add quality-aware ranking using signals such as:
  - exactness
  - scope
  - freshness
  - use count
  - recency of successful use
  - archive/stale penalties
  - active-context boost
  - duplicate penalty
- add deduplication and result compression
- add entity and relationship extraction for high-value domains:
  - projects
  - servers
  - tools
  - workflows
  - environments
  - skills
- add aliases and canonical naming
- add active-context boosting using:
  - ACTIVE.md
  - recent usage
  - current task/session context
- add lifecycle states:
  - active
  - warm
  - dormant
  - stale
  - archived
- add memory maintenance commands:
  - stale detection
  - duplicate detection
  - archive recommendations
  - usage/popularity refresh
- add explainability/debug output for retrieval/ranking
Deliverables:
- ranking model
- lifecycle model
- entity/relationship model
- aliases support
- maintenance scripts
- explainability docs
Out of scope: no repo map generation in this phase.
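Quality-aware ranking with explainable output can be sketched as a weighted sum whose parts are returned alongside the total. Every number here is a placeholder assumption (the weights, the 30-day freshness scale, the lifecycle multipliers), and the `Candidate` fields are hypothetical names for the signals listed in this phase.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical multipliers: archived docs are penalized rather than hidden.
LIFECYCLE_WEIGHT = {"active": 1.0, "warm": 0.9, "dormant": 0.7,
                    "stale": 0.4, "archived": 0.2}


@dataclass
class Candidate:
    path: str
    exact_match: bool
    text_score: float            # 0..1 from full-text or vector search
    days_since_use: int
    use_count: int
    lifecycle: str = "active"
    in_active_context: bool = False
    is_duplicate: bool = False


def score(c: Candidate) -> Tuple[float, Dict[str, float]]:
    """Return (total, parts) so each ranking decision can be explained."""
    parts = {
        "exactness": 2.0 if c.exact_match else 0.0,
        "text": c.text_score,
        "freshness": 1.0 / (1.0 + c.days_since_use / 30.0),
        "usage": min(c.use_count, 20) / 20.0,   # capped so popularity cannot dominate
        "active_boost": 0.5 if c.in_active_context else 0.0,
        "duplicate_penalty": -0.5 if c.is_duplicate else 0.0,
    }
    multiplier = LIFECYCLE_WEIGHT.get(c.lifecycle, 0.5)
    total = sum(parts.values()) * multiplier
    parts["lifecycle_multiplier"] = multiplier
    return total, parts
```

Printing `parts` next to each result is a cheap form of the explainability/debug output: it shows at a glance whether a document won on exactness, freshness, or an active-context boost.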
Phase 4: Enable safe, observable automation so Claw maintains itself over time.
- define event model:
  - session_started
  - session_finished
  - file_updated
  - skill_updated
  - memory_updated
  - archive_changed
  - retrieval_failed
  - maintenance_requested
  - etc.
- build hook framework
- implement practical hooks:
  - session-end compactor
  - memory-update hook
  - skill-update hook
  - archive hook
  - retrieval-failure hook
  - daily/weekly maintenance hook
- define subagent framework
- implement initial subagents:
  - memory-maintainer
  - retrieval-debugger
  - archive-manager
  - skill-router
- add safety model and approval boundaries
- add approval queue for risky changes
- add drift detection:
  - files changed but not reindexed
  - missing metadata
  - broken archive links
  - stale lifecycle states
  - repeated retrieval misses
- add reporting/logging for automation runs
Deliverables:
- event model
- hook registry
- initial hooks
- initial subagents
- approval queue
- automation reports/logs
- drift detection tools
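The hook framework, approval queue, and automation log can be sketched together in a few lines. This is an illustrative sketch under assumptions: `Proposal`, `HookRegistry`, the two example hooks, and the 180-day threshold are all hypothetical; only the event names come from the event model above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Proposal:
    description: str
    risky: bool = False  # risky proposals wait for human approval


Hook = Callable[[dict], List[Proposal]]


@dataclass
class HookRegistry:
    hooks: Dict[str, List[Hook]] = field(default_factory=lambda: defaultdict(list))
    approval_queue: List[Proposal] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def on(self, event: str, hook: Hook) -> None:
        self.hooks[event].append(hook)

    def emit(self, event: str, payload: dict) -> List[Proposal]:
        """Run hooks for an event; return only proposals safe to apply now."""
        safe: List[Proposal] = []
        for hook in self.hooks.get(event, []):
            for proposal in hook(payload):
                if proposal.risky:
                    self.approval_queue.append(proposal)
                    self.log.append(f"{event}: queued for approval: {proposal.description}")
                else:
                    safe.append(proposal)
                    self.log.append(f"{event}: auto-approved: {proposal.description}")
        return safe


def reindex_on_update(payload: dict) -> List[Proposal]:
    # Reindexing is cheap and reversible, so it is not marked risky.
    return [Proposal(f"reindex {payload['path']}")]


def archive_if_unused(payload: dict) -> List[Proposal]:
    # Hypothetical threshold; archiving moves files, so it needs approval.
    if payload.get("days_since_use", 0) > 180:
        return [Proposal(f"archive {payload['path']}", risky=True)]
    return []
```

Hooks return proposals instead of acting directly, which keeps the behavior recommendation-first: the registry decides what runs immediately and what waits in the queue, and every decision leaves a log line.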
Phase 5: Only after the earlier phases are stable, consider more advanced upgrades.
- external vector DB if scale demands it:
  - pgvector
  - Qdrant
  - Weaviate
  - Chroma
- better rerankers
- graph-oriented retrieval for richer relationships
- repo-local repo maps maintained inside each repo
- cross-workspace federation
- more advanced feedback learning from retrieval outcomes
- richer UI or dashboard surfaces if helpful
Do not do Phase 5 early unless local/simple approaches clearly stop being enough.
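One way to judge whether local approaches have stopped being enough is to keep vector recall as a brute-force scan until its latency actually hurts. The sketch below assumes chunk embeddings already exist as plain float lists; the chunk ids and the `top_k` helper are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Dict[str, List[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Score every chunk against the query and keep the k most similar."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

The scan is linear in the number of chunks, so timing `top_k` on the real index gives a concrete signal for when an external vector database becomes worth its operational cost.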
Constraints:
- keep the system inspectable
- prefer small scripts over giant frameworks
- preserve human-readable docs
- avoid hidden magic
- back up or diff major changes
- favor recommendation-first behavior when automation risk is nontrivial
- do not generate repo maps in shared workspace unless explicitly requested later
- design so repo-local repo maps can be plugged in later cleanly
Implementation order:
- Phase 1 complete first
- Phase 2 indexing and hybrid retrieval
- Phase 3 ranking, lifecycle, aliases, explainability
- Phase 4 hooks, subagents, approval flows
- Phase 5 only if needed
By the end of these phases, Claw should behave like:
- a small default-context agent
- with strong on-demand knowledge discovery
- with good prioritization and freshness awareness
- with explicit archive handling
- with safer automation for maintenance
- with clear paths for future scaling
It should feel less like “a giant pile of markdown and repos” and more like “a structured, self-maintaining workspace intelligence layer”.
For whichever phases are being implemented or planned now, provide:
- architecture summary
- folder/schema changes
- scripts/commands added
- docs added
- what was intentionally deferred
- risks or ambiguities
- recommended next step
Keep the implementation practical, local-first, and maintainable.