EVOLV/.claude/rules/repo-mem.md

# repo-mem MCP Tools

This repo has a per-repo memory MCP server (`repo-mem`) wired via `.mcp.json`. It exposes 5 tools backed by a Hopfield substrate trained on EVOLV's source plus a BM25 index over file chunks. **Use them. They are faster and better-targeted than `grep` for concept queries, and they accumulate institutional memory of repairs.**

If `/mcp` does not list `repo-mem` as Connected, the rest of this file does not apply for this session — fall back to `grep` / `Read`.

## When to call which tool

### `repo_search(query, k=8)` — primary lookup tool
Use **before** `grep` / `find` / `Explore` agent for any natural-language "where is X handled / find all places that do Y / what code implements Z" question.

- ✅ "where is the predicted volume integrator?" → `repo_search`
- ✅ "find places that emit InfluxDB line protocol" → `repo_search`
- ❌ "find every occurrence of `_updatePredictedVolume`" → `grep` (exact symbol — BM25 doesn't beat grep at exact-string lookup)
- ❌ "list all `.test.js` files" → `find` / `ls` (no concept query)

Returns top-K files with `file:line` ranges and snippets. Read the snippet first; only open the file if the snippet doesn't answer the question.

### `repo_similar_fixes(query, failure?, files?, tags?, k=5)` — start-of-task context
Call at the **start** of any non-trivial bug fix or behavioral change. Cheap (BM25 + file overlap + atom cosine), zero downside if it returns nothing useful.

- Pass the user's task description as `query`.
- If there's a failing test or stack trace, pass it as `failure`.
- If you already know which files are involved, pass them as `files`.
- Skim the returned traces; surface any near-match to the user before starting.

### `repo_record_fix({task, failure, files, diff_summary, patch, tests, outcome, tags})` — end-of-task persist
Call at the **end** of a landed fix or behavioral change, **before** reporting completion to the user. Skip for trivial typo/comment commits. Required fields: `task` and `outcome`. Recommended:
- `failure`: the symptom that prompted the work (test output, user description, stack trace).
- `files`: the files actually changed.
- `diff_summary`: 1–3 sentences on *what* changed and *why*.
- `patch`: the unified diff (truncate to the load-bearing hunks if huge).
- `tests`: the verification command(s) you ran.
- `outcome`: `passed` / `failed` / `partial` / `reverted`.
- `tags`: short labels (`overflow-clamp`, `tokenizer`, `migration`, etc.) for retrieval bias.

Rule of thumb: if the change took more than one read+edit pair, record it.

### `substrate_score(text, worst_k=5)` — OOD-token check
Use **sparingly**. After generating a non-trivial code block (≥ ~30 lines of new logic, not test scaffolding), pass it through `substrate_score` and inspect the worst-confidence positions for typos, wrong identifiers, or out-of-house style. Noisy on small additions — don't use it for one-line tweaks.

### `substrate_top_next(context, k=10)` — rarely
Predicts next BPE-subword tokens in the local style. Mostly useful for autonomous solver loops; in interactive review it's diagnostic only. If you find yourself wanting it, you probably want `repo_search` instead.

## Workflow shape

```
new task arrives
  ↓
repo_similar_fixes(query=user_task)        ← cheap, always do this for non-trivial tasks
  ↓
repo_search(query=concept)                 ← when scoping
  ↓
[normal Read / Edit / Bash work]
  ↓
[after generating non-trivial new code]
substrate_score(text=new_block)            ← optional, only if block is big
  ↓
[verify: tests / build / smoke]
  ↓
repo_record_fix({...})                     ← before final user-facing summary
```

## Anti-patterns

- ❌ Calling `repo_search` when you already know the file path. Just `Read` it.
- ❌ Calling `repo_record_fix` after every micro-edit. Only at meaningful task boundaries.
- ❌ Treating `substrate_top_next` results as authoritative — they reflect repo style, not correctness.
- ❌ Passing the full conversation to `substrate_score` — it's per-snippet, not per-session.

## Refresh model

The post-commit hook auto-runs `--quick --lock` (re-ingest + BM25 + chunk re-embed; substrate retrain skipped) so retrieval stays current within ~2 s of any commit. The substrate itself is only retrained when you (or a maintainer) run `--full` manually:

```bash
node ~/anchor-net-master/tools/repo-mem/refresh.mjs \
  --repo . --in .repo-mem --full
```

Re-train when the repo gains substantially new vocabulary (new node, new domain, new dependency surface). Otherwise BM25 + existing atoms keep up.