Task: Build Kimi Claw into a scalable, retrieval-first workspace system

Objective: move from a large, flat workspace memory to a fast, maintainable, automation-friendly knowledge system

We have a Kimi Claw workspace containing identity files, memory files, skill docs, logs, operational notes, and cloned repos. As the workspace grows, the current flat model becomes slower, noisier, and harder to maintain.

We want to evolve Claw in structured phases so it becomes:

smaller in default context
better at finding the right knowledge on demand
better at ranking what matters
better at handling stale/archive content
easier to maintain over time
ready for hooks, subagents, and future repo-local knowledge features

If useful, use Kimi Code inside Claw for research, file analysis, safe refactors, script generation, overlap detection, migration help, and implementation support.

Core design principles

Tiny always-load context. Keep only critical identity, user preferences, tools, and current active context in default load.
On-demand retrieval. Most knowledge should be discoverable via metadata, search, and semantic retrieval rather than preloaded into prompt context.
Hybrid retrieval. Prefer a layered system:
- metadata / routing hints
- exact lookup
- full-text search
- semantic/vector recall
- reranking
- compact context injection
Clear lifecycle. Knowledge should move through states like active, warm, dormant, stale, archived.
Automation with guardrails. Use hooks and specialist subagents only where they improve maintenance safely and transparently.
Maintainability over cleverness. Prefer local, inspectable, deterministic tools and data structures.
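
The layered retrieval principle can be illustrated with a minimal sketch. It assumes each layer is a plain function from a query to a list of document paths, and that cheaper layers run first and later layers are skipped once enough hits are collected. The function name `layered_retrieve`, the stage names, and the `limit` parameter are illustrative, not part of any existing Claw interface.

```python
from typing import Callable

# A stage takes a query string and returns candidate document paths.
Stage = Callable[[str], list[str]]


def layered_retrieve(
    query: str,
    stages: list[tuple[str, Stage]],
    limit: int = 5,
) -> list[tuple[str, str]]:
    """Run stages in the given order and stop once `limit` unique hits exist.

    Returns (path, stage_name) pairs so the caller can see which layer
    produced each hit.
    """
    hits: list[tuple[str, str]] = []
    seen: set[str] = set()
    for name, stage in stages:
        for path in stage(query):
            if path not in seen:
                seen.add(path)
                hits.append((path, name))
        if len(hits) >= limit:
            break  # cheaper layers were enough; skip the expensive ones
    return hits[:limit]
```

Reranking and compact context injection would then operate on the returned list; they are left out here to keep the sketch small.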

Phase 1 — Workspace simplification and memory restructuring

Goal

Reduce default load size, remove duplication, split large catch-all files, and separate active from archived knowledge.

Tasks

audit current core files:
- BOOTSTRAP.md
- IDENTITY.md
- SOUL.md
- USER.md
- MEMORY.md
- TOOLS.md
- AGENTS.md
decide, for each file, whether to:
- keep it as always-load
- shrink
- merge
- split
- archive
merge or reduce overlapping identity/personality files
improve USER.md with stable user preferences
improve TOOLS.md with operational environment notes
split MEMORY.md into structured files such as:
- memory/MEMORY_INDEX.md
- memory/user/
- memory/projects/
- memory/infra/
- memory/workflows/
- memory/archive/
create a small ACTIVE.md
archive oversized or stale knowledge, especially giant low-use skill/data packs
leave stubs or notes at the original locations pointing to where archived items moved
preserve backups or safe diffs
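
As a starting point for the audit, the sketch below sizes the core files and proposes an action for each. The byte thresholds are hypothetical placeholders, and size alone cannot detect overlap between files, so the output is only a first draft for workspace-audit-phase1.md that still needs human review.

```python
from pathlib import Path

CORE_FILES = [
    "BOOTSTRAP.md", "IDENTITY.md", "SOUL.md", "USER.md",
    "MEMORY.md", "TOOLS.md", "AGENTS.md",
]

# Hypothetical thresholds in bytes; tune them against the real workspace.
SHRINK_OVER = 4_000
SPLIT_OVER = 20_000


def suggest_action(size: int) -> str:
    """Map a file size to a first-pass suggestion."""
    if size > SPLIT_OVER:
        return "split"
    if size > SHRINK_OVER:
        return "shrink"
    return "always-load"


def audit(workspace: Path) -> list[dict]:
    """Collect size and suggestion for each core file in the workspace."""
    rows = []
    for name in CORE_FILES:
        path = workspace / name
        if not path.is_file():
            rows.append({"file": name, "bytes": 0, "suggestion": "missing"})
            continue
        size = path.stat().st_size
        rows.append({"file": name, "bytes": size, "suggestion": suggest_action(size)})
    return rows


def to_markdown(rows: list[dict]) -> str:
    """Render the audit rows as a markdown table."""
    lines = ["| File | Bytes | Suggestion |", "| --- | --- | --- |"]
    for row in rows:
        lines.append(f"| {row['file']} | {row['bytes']} | {row['suggestion']} |")
    return "\n".join(lines)
```

Running `print(to_markdown(audit(Path("."))))` from the workspace root would print the table without modifying any file.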

Deliverables

workspace-audit-phase1.md
restructured memory folders
reduced always-load layer
ACTIVE.md
phase1-changes-summary.md

Phase 2 — Retrieval, indexing, and routing

Goal

Make Claw fast at finding the right knowledge without loading too much by default.

Tasks

define metadata/frontmatter standard for key files
build a document registry, preferably in SQLite
index:
- path
- title
- kind
- tags
- summary
- scope
- archive state
- timestamps
implement:
- exact search
- full-text search
- vector/chunk search
- hybrid retrieval orchestration
document chunking rules
embed useful content types only
build usage logging
expose simple commands/scripts for:
- rebuild index
- update index
- exact search
- hybrid search
- inspect document
- list indexed docs
keep archive-aware ranking and filtering
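
A document registry with the fields listed above could look like the following sketch, using Python's built-in `sqlite3` module. The table name, column names, and helper functions are assumptions for illustration; tags are stored as a JSON array in a text column to keep the schema to one table. The upsert syntax requires SQLite 3.24 or newer.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path          TEXT PRIMARY KEY,
    title         TEXT NOT NULL,
    kind          TEXT,
    tags          TEXT,                      -- JSON array of strings
    summary       TEXT,
    scope         TEXT,
    archive_state TEXT NOT NULL DEFAULT 'active',
    created_at    REAL,
    updated_at    REAL
);
"""

UPSERT = """
INSERT INTO documents
    (path, title, kind, tags, summary, scope, archive_state, created_at, updated_at)
VALUES
    (:path, :title, :kind, :tags, :summary, :scope, :archive_state, :now, :now)
ON CONFLICT(path) DO UPDATE SET
    title = excluded.title, kind = excluded.kind, tags = excluded.tags,
    summary = excluded.summary, scope = excluded.scope,
    archive_state = excluded.archive_state, updated_at = excluded.updated_at
"""


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the registry database and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def upsert_document(conn: sqlite3.Connection, doc: dict) -> None:
    """Insert a document, or update it in place if the path is already known."""
    conn.execute(UPSERT, {
        "path": doc["path"],
        "title": doc["title"],
        "kind": doc.get("kind"),
        "tags": json.dumps(doc.get("tags", [])),
        "summary": doc.get("summary"),
        "scope": doc.get("scope"),
        "archive_state": doc.get("archive_state", "active"),
        "now": time.time(),
    })
    conn.commit()


def exact_search(conn: sqlite3.Connection, term: str, include_archived: bool = False):
    """Match on exact path or case-insensitive title; hide archived docs by default."""
    sql = "SELECT * FROM documents WHERE (title = :t COLLATE NOCASE OR path = :t)"
    if not include_archived:
        sql += " AND archive_state != 'archived'"
    return conn.execute(sql + " ORDER BY path", {"t": term}).fetchall()
```

Keeping `archive_state` in the registry lets every search path apply the same archive filter instead of relying on folder names.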

Deliverables

metadata standard
index schema
indexing scripts
hybrid retrieval scripts
retrieval docs
usage logging surface
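
The task list does not specify how results from the exact, full-text, and vector searches should be combined. One common option is reciprocal rank fusion, sketched below as an assumption rather than a decided design. It needs only the rank of each result in each list, so scores from different backends never have to be made comparable. The function name and the conventional constant `k = 60` are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: dict[str, list[str]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document paths into one ranking.

    `rankings` maps a backend name (for example "full-text" or "vector")
    to its results, best first. Each appearance contributes 1 / (k + rank).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings.values():
        for rank, path in enumerate(ranked, start=1):
            scores[path] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by path for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A document that appears in several lists outranks one that appears in only a single list at a similar position, which is usually the desired behavior for hybrid search.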

Phase 3 — Intelligence layer on top of retrieval

Exclude for now: repo map generation

Goal

Make Claw smarter about prioritization, freshness, aliases, relationships, maintenance, and explainability.

Tasks

add quality-aware ranking using signals such as:
- exactness
- scope
- freshness
- use count
- recency of successful use
- archive/stale penalties
- active-context boost
- duplicate penalty
add deduplication and result compression
add entity and relationship extraction for high-value domains:
- projects
- servers
- tools
- workflows
- environments
- skills
add aliases and canonical naming
add active-context boosting using:
- ACTIVE.md
- recent usage
- current task/session context
add lifecycle states:
- active
- warm
- dormant
- stale
- archived
add memory maintenance commands:
- stale detection
- duplicate detection
- archive recommendations
- usage/popularity refresh
add explainability/debug output for retrieval/ranking
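
The ranking signals listed above could be combined as a weighted sum that also returns its per-signal breakdown, which doubles as the explainability output. Every weight, decay constant, and penalty value in this sketch is a hypothetical placeholder to be tuned against real usage logs; the `Signals` fields are assumed names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    exactness: float                  # 0..1, how literally the query matched
    scope_match: float                # 0..1, overlap with the requested scope
    days_since_update: float
    use_count: int
    days_since_successful_use: Optional[float] = None
    lifecycle: str = "active"
    in_active_context: bool = False
    is_duplicate: bool = False


# Hypothetical penalties per lifecycle state.
LIFECYCLE_PENALTY = {"active": 0.0, "warm": 0.0, "dormant": 0.1, "stale": 0.3, "archived": 0.6}


def score(s: Signals) -> tuple[float, dict[str, float]]:
    """Return (total, breakdown); the breakdown explains the total."""
    if s.days_since_successful_use is None:
        recent = 0.0
    else:
        recent = math.exp(-s.days_since_successful_use / 30)
    parts = {
        "exactness": 1.0 * s.exactness,
        "scope": 0.5 * s.scope_match,
        "freshness": 0.3 * math.exp(-s.days_since_update / 90),
        # Log scaling so heavily used docs do not dominate; capped at 1.
        "use_count": 0.2 * min(1.0, math.log1p(s.use_count) / math.log1p(100)),
        "recent_success": 0.3 * recent,
        "active_boost": 0.5 if s.in_active_context else 0.0,
        "lifecycle_penalty": -LIFECYCLE_PENALTY[s.lifecycle],
        "duplicate_penalty": -0.4 if s.is_duplicate else 0.0,
    }
    return sum(parts.values()), parts
```

Printing the breakdown next to each result makes it easy to see why an archived or duplicate document ranked where it did.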

Deliverables

ranking model
lifecycle model
entity/relationship model
aliases support
maintenance scripts
explainability docs
no repo map generation in this phase
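
For the lifecycle model, one simple rule is to derive the state from the time since last use, as sketched below. The day thresholds are hypothetical, and the sketch assumes that `archived` is an explicit decision (set by a person or an approved automation) rather than something inferred from age.

```python
from datetime import datetime

# Hypothetical thresholds: (maximum age in days, resulting state).
THRESHOLDS = [(7, "active"), (30, "warm"), (90, "dormant")]


def lifecycle_state(last_used: datetime, now: datetime, archived: bool = False) -> str:
    """Classify a document by how long ago it was last used."""
    if archived:
        return "archived"  # explicit flag wins over any age-based state
    age_days = (now - last_used).days
    for max_days, state in THRESHOLDS:
        if age_days <= max_days:
            return state
    return "stale"
```

A stale detection command could run this over the registry and list documents whose stored state no longer matches the computed one.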

Phase 4 — Automation layer: hooks, subagents, upkeep

Goal

Enable safe, observable automation so Claw maintains itself over time.

Tasks

define event model:
- session_started
- session_finished
- file_updated
- skill_updated
- memory_updated
- archive_changed
- retrieval_failed
- maintenance_requested
- further events as needed
build hook framework
implement practical hooks:
- session-end compactor
- memory-update hook
- skill-update hook
- archive hook
- retrieval-failure hook
- daily/weekly maintenance hook
define subagent framework
implement initial subagents:
- memory-maintainer
- retrieval-debugger
- archive-manager
- skill-router
add safety model and approval boundaries
add approval queue for risky changes
add drift detection:
- files changed but not reindexed
- missing metadata
- broken archive links
- stale lifecycle states
- repeated retrieval misses
add reporting/logging for automation runs
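
A hook framework over the event model can stay very small. The sketch below assumes an in-process registry where hooks are plain functions keyed by event name; the class name `HookRegistry`, its methods, and the example hook are illustrative. Every run is logged, and a failing hook is recorded without stopping the others.

```python
from collections import defaultdict
from typing import Any, Callable

EVENTS = {
    "session_started", "session_finished", "file_updated", "skill_updated",
    "memory_updated", "archive_changed", "retrieval_failed", "maintenance_requested",
}


class HookRegistry:
    def __init__(self) -> None:
        self._hooks: dict[str, list[Callable[[dict], Any]]] = defaultdict(list)
        self.log: list[dict] = []  # one entry per hook run, for reporting

    def on(self, event: str):
        """Decorator that registers a function as a hook for `event`."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")

        def decorator(fn: Callable[[dict], Any]):
            self._hooks[event].append(fn)
            return fn

        return decorator

    def emit(self, event: str, payload: dict) -> list[Any]:
        """Run all hooks for `event`; log each outcome, including failures."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        results = []
        for hook in self._hooks[event]:
            try:
                result, status = hook(payload), "ok"
            except Exception as exc:  # a broken hook must not block the rest
                result, status = None, f"error: {exc}"
            self.log.append({"event": event, "hook": hook.__name__, "status": status})
            results.append(result)
        return results


registry = HookRegistry()


@registry.on("file_updated")
def mark_for_reindex(payload: dict) -> str:
    # Placeholder action: a real hook would enqueue a reindex job.
    return f"reindex {payload['path']}"
```

Rejecting unknown event names at registration time keeps typos from silently creating hooks that never fire.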

Deliverables

event model
hook registry
initial hooks
initial subagents
approval queue
automation reports/logs
drift detection tools
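
The first drift check, files changed but not reindexed, reduces to comparing modification times on disk with the time each file was last indexed. The sketch below assumes the index can supply a mapping from relative path to its last-indexed timestamp; that shape, the function names, and the restriction to `*.md` files are assumptions.

```python
from pathlib import Path


def scan_mtimes(workspace: Path) -> dict[str, float]:
    """Map each markdown file (relative POSIX path) to its modification time."""
    return {
        p.relative_to(workspace).as_posix(): p.stat().st_mtime
        for p in workspace.rglob("*.md")
        if p.is_file()
    }


def detect_drift(
    disk_mtimes: dict[str, float],
    indexed_at: dict[str, float],
) -> dict[str, list[str]]:
    """Compare the state on disk with the index and report three kinds of drift."""
    report: dict[str, list[str]] = {
        "never_indexed": [],
        "changed_since_index": [],
        "missing_on_disk": [],
    }
    for rel, mtime in sorted(disk_mtimes.items()):
        if rel not in indexed_at:
            report["never_indexed"].append(rel)
        elif mtime > indexed_at[rel]:
            report["changed_since_index"].append(rel)
    # Indexed paths that no longer exist usually point to a move or an archive step.
    report["missing_on_disk"] = sorted(set(indexed_at) - set(disk_mtimes))
    return report
```

Separating the filesystem scan from the comparison keeps the check easy to test and lets the same comparison run against a registry snapshot.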

Phase 5 — Optional advanced upgrades later

Goal

Only after the earlier phases are stable, consider more advanced upgrades.

Possible areas

external vector DB if scale demands it:
- pgvector
- Qdrant
- Weaviate
- Chroma
better rerankers
graph-oriented retrieval for richer relationships
repo-local repo maps maintained inside each repo
cross-workspace federation
more advanced feedback learning from retrieval outcomes
richer UI or dashboard surfaces if helpful

Important note

Do not start Phase 5 until local, simple approaches have clearly stopped being enough.

Cross-phase constraints

keep the system inspectable
prefer small scripts over giant frameworks
preserve human-readable docs
avoid hidden magic
back up or diff major changes
favor recommendation-first behavior when automation risk is nontrivial
do not generate repo maps in shared workspace unless explicitly requested later
design so repo-local repo maps can be plugged in later cleanly
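
Recommendation-first behavior can be sketched as a queue in which low-risk changes apply immediately and risky ones wait for explicit approval. How risk is judged is left open here; the `risky` flag, the `Proposal` and `ApprovalQueue` names, and the status strings are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    description: str
    risky: bool
    apply: Callable[[], None]  # the change itself, deferred until allowed
    status: str = "pending"


class ApprovalQueue:
    def __init__(self) -> None:
        self.items: list[Proposal] = []

    def submit(self, proposal: Proposal) -> str:
        """Apply safe proposals at once; hold risky ones for a human."""
        if proposal.risky:
            self.items.append(proposal)
        else:
            proposal.apply()
            proposal.status = "applied"
        return proposal.status

    def approve(self, index: int) -> None:
        """Apply a held proposal after explicit approval."""
        proposal = self.items[index]
        if proposal.status != "pending":
            raise ValueError("proposal is not pending")
        proposal.apply()
        proposal.status = "applied"

    def pending(self) -> list[Proposal]:
        return [p for p in self.items if p.status == "pending"]
```

Because each proposal carries a human-readable description, the pending list can be shown as a plain report before anything is changed.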

Suggested implementation order

Phase 1: complete first
Phase 2: indexing and hybrid retrieval
Phase 3: ranking, lifecycle, aliases, explainability
Phase 4: hooks, subagents, approval flows
Phase 5: only if needed

Expected final outcome

By the end of these phases, Claw should behave like:

a small default-context agent
with strong on-demand knowledge discovery
with good prioritization and freshness awareness
with explicit archive handling
with safer automation for maintenance
with clear paths for future scaling

It should feel less like “a giant pile of markdown and repos” and more like “a structured, self-maintaining workspace intelligence layer”.

Output requirements for this task

For whichever phases are being implemented or planned now, provide:

architecture summary
folder/schema changes
scripts/commands added
docs added
what was intentionally deferred
risks or ambiguities
recommended next step

Keep the implementation practical, local-first, and maintainable.

esafwan/Claw Plus Plus.md
