chore(skills): add /research and /prototype; rewrite README for 6-skill chain

Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
znetsixe
2026-05-21 16:33:26 +02:00
parent 6ff262e96e
commit fcaad8cd9f
3 changed files with 265 additions and 93 deletions

View File

@@ -1,132 +1,127 @@
# Workflow skills — grill-me → prd → prd-to-issues → ship-it # Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it
A four-skill chain that takes a vague feature idea from "I want to build X" to merged, end-to-end-verified code. Two phases are collaborative (you + Claude), two run autonomously (AFK). A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.
``` ```
/grill-me <topic> TOGETHER pressure-test the idea /research <topic> MOSTLY external + repo knowledge into a brief
/prd TOGETHER synthesize a PRD; gaps stay explicit /prototype <claim> MOSTLY throwaway spike to test the riskiest assumption
/prd-to-issues MOSTLY thin vertical-slice issues; file when you say "create" /grill-me <topic> TOGETHER pressure-test what survived
/ship-it AFK shell loop ships every slice end-to-end /prd TOGETHER synthesize PRD; gaps stay explicit
/prd-to-issues MOSTLY thin vertical-slice issues; file on "create"
/ship-it AFK shell loop ships every slice end-to-end
``` ```
## The TOGETHER vs AFK distinction You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain.
## Mode taxonomy
| Mode | Meaning | Skills | | Mode | Meaning | Skills |
|---|---|---| |---|---|---|
| **TOGETHER** | Needs your judgment every turn. No autonomous path. | `grill-me` | | **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me |
| **MOSTLY TOGETHER** | Drafts/audits AFK, but a visible-to-team action needs your "go". | `prd`, `prd-to-issues` | | **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues |
| **AFK** | No human in the loop. Logs questions to issues instead of asking. | `ship-it` | | **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it |
Each skill's `SKILL.md` declares its mode in the first body line. The chain is designed so the AFK phase only starts after the human-locked phases have nailed down the contract (PRD requirements, slice acceptance criteria) — AFK code only ships against decisions that already exist on paper. The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
--- ---
## When to use each ## When to use each
### `/research <topic>` — MOSTLY TOGETHER
Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
**Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
**Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
### `/prototype <claim>` — MOSTLY TOGETHER
Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
**Use when:** `/research` surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.
**Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
### `/grill-me <topic>` — TOGETHER ### `/grill-me <topic>` — TOGETHER
Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say `stop` for an honest 3-bullet debrief.
**Use when:** you have a fuzzy idea and want it pressure-tested before committing to scope. Also useful for interview prep on any technical topic. **Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
**Behavior:** acts as a senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots, demands specifics over buzzwords. Stay on the topic until exhausted; say `stop` for an honest 3-bullet debrief. **Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.
**Don't use it for:** rubber-stamping a finished idea, or when you want validation. It's designed to find gaps, not to agree with you.
### `/prd` — TOGETHER (drafts AFK after grilling) ### `/prd` — TOGETHER (drafts AFK after grilling)
Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
**Use when:** the grilling exposed enough that you're ready to lock down what's being built. Or standalone when you already know the feature shape. **Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.
**Behavior:** writes an engineering PRD (Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope). Things you nailed in the grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over. **Don't use it for:** strategy decks, market sizing, "why now". This is for engineering.
**Output:** inline by default. Say "save it" → writes to `docs/prd/<name>.md`.
**Don't use it for:** strategy docs, market sizing, or "why now" sections. This is for engineering.
### `/prd-to-issues` — MOSTLY TOGETHER ### `/prd-to-issues` — MOSTLY TOGETHER
Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.
**Use when:** you have a PRD (just-drafted or path-pointed) and need a backlog of issues a team can pick up. **Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea).
**Behavior:** breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). The first slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Each issue has a visible `Slice check` block proving every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** drafting is shown to you. **Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.
**Output:** draft inline → you reply `create` → it files to the tracker (`gh` for GitHub, `tea` for Gitea).
**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing — if your work needs that, you want a different tool.
### `/ship-it` — AFK ### `/ship-it` — AFK
Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
**Use when:** issues are filed, you want to walk away, you'll review the PRs later.
**Behavior:** runs a shell loop that picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with checked acceptance criteria + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to the terminal; you can `tail -f` the log from another shell or Ctrl-C anytime.
Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stops the loop for human review. Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stops the loop for human review.
**Don't use it for:** issues whose acceptance criteria aren't testable (the loop will skip them), or for slices that need real-world side effects with no test harness. **Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them.
--- ---
## A worked example ## A worked example
You want to add live sensor display for a new flow meter on a dashboard. Adding live sensor display for a new flow meter to the operator dashboard.
```bash ```bash
# 1. Pressure-test the idea # 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
# "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
# straight without dropping frames?"
# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard /grill-me adding live flow-meter readings to the operator dashboard
# 68 hard questions; surfaces gaps in alerting and missing-data handling.
# (you answer 6-8 hard questions; gaps surface) # 4. Lock down the contract
# 2. Lock down the contract
/prd /prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.
# (PRD drafts; you edit one section; save it) # 5. Slice it
# 3. Slice it
/prd-to-issues /prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.
# (draft shows ~5 slices, each with a Slice check block. Coverage matrix # 6. Walk away
# confirms every PRD requirement maps to a slice. Reply `create`.)
# → issues #142..#146 filed
# 4. Walk away
/ship-it /ship-it
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# (preflight, plan, "Start? Reply `go`." → `go` → shell loop runs) # Another terminal: tail -f .ship-it-logs/run-*.log
# In another terminal: tail -f .ship-it-logs/run-*.log
``` ```
After the loop exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`. After ship-it exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`.
--- ## Skipping skills
## Configuration The chain is a default, not a mandate:
### Repo trunk branch - **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`).
- **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`.
`ship-it` defaults to `main`. If your repo uses something else: - **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable.
- **Existing PRD from somewhere else:** `/prd-to-issues <path>` and go.
```bash
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
```
### Other env vars
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_MAX` | 50 | Iteration cap |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
### Tracker support
- **GitHub** — uses `gh` CLI (must be installed and authenticated: `gh auth status`).
- **Gitea** — uses `tea` CLI (must be installed: `go install code.gitea.io/tea@latest`, then `tea login add`).
- Auto-detected from `git remote get-url origin`.
### Issue label expected by `ship-it`
The loop filters to open issues with label `slice` and without labels `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies the `slice` label by default. If you file issues by hand, add the label or `ship-it` won't pick them up.
--- ---
@@ -134,7 +129,11 @@ The loop filters to open issues with label `slice` and without labels `blocked`,
``` ```
.claude/skills/ .claude/skills/
├── README.md ← this file ├── README.md ← this file
├── research/
│ └── SKILL.md
├── prototype/
│ └── SKILL.md
├── grill-me/ ├── grill-me/
│ └── SKILL.md │ └── SKILL.md
├── prd/ ├── prd/
@@ -142,39 +141,77 @@ The loop filters to open issues with label `slice` and without labels `blocked`,
├── prd-to-issues/ ├── prd-to-issues/
│ └── SKILL.md │ └── SKILL.md
└── ship-it/ └── ship-it/
├── SKILL.md ← entry point; chat-side bootstrap ├── SKILL.md ← entry point; chat-side bootstrap
├── loop.sh ← orchestrator (the actual loop) ├── loop.sh ← orchestrator (the actual loop)
└── iterate.md ← per-issue prompt the loop dispatches └── iterate.md ← per-issue prompt the loop dispatches
.prototypes/ ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/ ← ship-it loop logs (recommend gitignoring)
docs/
├── research/ ← saved research briefs ("save it" in /research)
└── prd/ ← saved PRDs ("save it" in /prd)
``` ```
--- ---
## Configuration
### ship-it tracker support
- **GitHub** — `gh` CLI required (`gh auth status`)
- **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`)
- Auto-detected from `git remote get-url origin`
### ship-it env vars
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
| `SHIP_IT_MAX` | 50 | Iteration cap |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
Example for EVOLV:
```bash
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
```
### Issue label expected by ship-it
The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
---
## Troubleshooting ## Troubleshooting
**`ship-it` won't start: "tea CLI not installed".** **`ship-it` won't start: "tea CLI not installed".**
The repo's remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror. Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
**`ship-it` exits immediately: "git tree is dirty".** **`ship-it` exits immediately: "git tree is dirty".**
Commit or stash everything before running. The loop won't risk mixing your work-in-progress into a slice. Commit or stash before running. The loop won't risk mixing WIP into a slice.
**`ship-it` says "backlog empty" but I have open issues.** **`ship-it` says "backlog empty" but I have open issues.**
The filter requires label `slice` AND no `blocked`/`needs-decision`/`ci-failed`. Add the label, or check what's blocking. Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels.
**An issue keeps getting `needs-decision`.** **An issue keeps getting `needs-decision`.**
Its acceptance criteria probably aren't testable at the outermost layer. Open the issue, rewrite criteria so they're observable (e.g. "POST /x returns 201 and row appears on dashboard" not "feature works"), remove the label, the loop will pick it up next run. Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
**Three failures in a row, loop stopped.** **`/prototype` keeps wanting to "tidy up" the spike before reporting.**
Something systemic — bad dependency, branch protection blocking, flaky test env. Check `.ship-it-logs/iter-*.log` for the per-issue detail. That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
**Issue had a PR opened but didn't merge.** **`/research` returns shallow results.**
Branch protection requires human review. The loop reports it as "shipped → open for review" and moves on. Review and merge when you can. The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
**`/prd-to-issues` drafts look like layer cake.**
Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.
--- ---
## Design principles ## Design principles
- **Vertical slices, always.** No "implement the backend first, then the frontend". Every issue exercises every layer. - **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
- **Gaps are explicit, never hidden.** Hedged answers in grilling → Open Questions in PRD → `needs-decision` issues, not silent assumptions. - **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
- **AFK only after the contract is locked.** Autonomous code only ships against decisions that already exist on paper. - **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
- **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
- **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
- **One commit per slice.** Small, reviewable, revertible. - **One commit per slice.** Small, reviewable, revertible.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms the user-observable behavior actually works before reporting shipped.

View File

@@ -0,0 +1,65 @@
---
name: prototype
description: Build a throwaway spike to falsify or confirm a single risky assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Reports findings as evidence that feeds /prd. Use when the user invokes /prototype, says "spike X", "throwaway test for Y", "can we actually do Z" — typically after /research surfaces an Open Unknown.
---
# Prototype — throwaway spike
**Mode: MOSTLY TOGETHER.** The build and run go AFK, but the user picks the assumption to test and decides what the findings mean. The output is *evidence*, not production code.
You are now an engineer running a time-boxed spike to learn one thing. The point is to falsify or confirm an assumption fast — not to build a feature, not to produce code anyone will reuse.
## Hard rules
1. **One assumption per prototype.** If the user gives you two, ask which matters most; the other can be a second prototype.
2. **The assumption must be falsifiable.** "Will it be fast?" → no. "Can Node-RED sustain 1k msg/s to InfluxDB on the dev VM for 10 min?" → yes. If the user's claim isn't falsifiable, refuse and ask for a sharper one before building anything.
3. **Throwaway means throwaway.** Code lives in `<repo-root>/.prototypes/<short-name>/` only. The directory is gitignored (add it as the first step if it isn't). Nothing in `.prototypes/` is ever committed to the main codebase. No exceptions.
4. **Time-box.** Default budget: 30 minutes of work and ≤200 LOC. If the user gave a different budget, use that. When you blow through, stop and report whatever you've got.
## Steps
1. **Restate the assumption** in falsifiable form. Show it to the user. Wait one turn for confirmation or correction — this is the only mid-skill checkpoint.
2. **Pick the minimum viable test.** Options:
- **Code spike** — throwaway script that exercises the question. Most common.
- **Reading spike** — deep read of a library/spec/codebase, no code. Use when the question is "does X support Y" and the docs would tell you.
- **Manual integration spike** — run a command, hit an endpoint, observe. Use when the question is about a real service's behavior.
3. **Set up the dir.**
```bash
ROOT=$(git rev-parse --show-toplevel)
mkdir -p "$ROOT/.prototypes/<name>"
grep -qxE '\.prototypes/?' "$ROOT/.gitignore" 2>/dev/null || echo '.prototypes/' >> "$ROOT/.gitignore"
```
4. **Build the smallest thing that tests the assumption.** Resist polish. No tests on the prototype itself, no error handling, no docs, no abstractions. Hardcode values. Inline everything.
5. **Run it.** Capture output. If it crashes in a way that's *about* the assumption (e.g. memory blows up at 1k msg/s), that's a finding — not a bug to fix.
6. **Iterate up to the budget.** If a quick adjustment sharpens the test, make it. If you're tempted to refactor or expand scope, stop and report instead.
7. **Report findings.** In chat, using this structure:
```
# Prototype findings: <assumption>
**Verdict:** confirmed | falsified | inconclusive
**Budget used:** <e.g. 22 min, 140 LOC>
## What I did
<23 sentences. What the spike actually exercised.>
## Evidence
<concrete output, numbers, logs, observed behavior. Paste the relevant snippet.>
## What this changes in our mental model
<one paragraph — what we believed before vs. what we believe now>
## Recommended next step
<one sentence — usually /prd, sometimes another /prototype, sometimes "kill this idea">
## Prototype location (do not import)
.prototypes/<name>/
```
## What not to do
- **Don't promote the prototype.** Even if it works beautifully. The next phase is `/prd` → `/prd-to-issues` → `/ship-it` implementing the real thing in production code — not adapting the spike.
- **Don't polish.** Tests, types, lint-clean, comments — none of it. The code is disposable.
- **Don't expand scope.** "Since I'm here, I'll also test…" — no. File the second question for a separate prototype.
- **Don't commit `.prototypes/`.** Ever. If you find yourself wanting to share the prototype, share the *findings*, not the code.
- **Don't ask the user mid-build.** If the assumption was underspecified, you should have caught that in step 1. Once running, run.

View File

@@ -0,0 +1,70 @@
---
name: research
description: Gather external knowledge and codebase context for a topic before committing to a direction. Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly. Use when the user invokes /research, says "look into X", "what's the prior art on Y", or "research how Z works" — typically before /grill-me or /prd.
---
# Research — fetch knowledge into a brief
**Mode: MOSTLY TOGETHER.** The fetching and synthesis run AFK (Agent subagents do the legwork). The brief lands in chat; you decide what's worth pursuing. No external state is changed.
You are now a senior engineer doing a focused research pass. Goal: enough knowledge to make a good `/prd` decision later — no more. Do not write code, do not pick a winner, do not write the PRD. Lay out what's known, what's available, and what's still unknown.
## Inputs
The user names a topic. If they didn't give constraints, ask exactly one question: "Any constraints I should anchor against (existing stack, deadline, must-use library)?" Then proceed.
## How to research
1. **Decompose the topic into 35 specific questions.** Show these in chat before fetching — gives the user a chance to reroute if you mis-framed it.
2. **Fan out in parallel** using the Agent tool. Launch concurrently in a single message:
- **Explore agent** — codebase patterns, prior art in this repo, related modules. Question: "Does this repo already do something like X? Where? What patterns does it use?"
- **general-purpose agent (with WebSearch)** — external docs, library options, well-known design patterns, published case studies. Question: "What are the established approaches to Y? What libraries handle Z?"
- Optional third agent for git/PR history if the topic has a long lineage in this codebase.
3. **Synthesize, don't dump.** When agents report back, write a brief — not a transcript.
## Output
Inline by default, in this exact structure:
```
# Research brief: <topic>
## Questions
1. <decomposed question>
2. ...
## What's already in this codebase
- <finding> (path/to/file.ts:42)
- ...
## External options
- **<option>** — <one-line eval. when it fits, when it doesn't>
- ...
## Prior art
- <link> — <one-line takeaway>
- ...
## Open unknowns
- <thing no source can answer; candidate for /prototype>
- ...
## Recommended next step
<one sentence>
```
Say "save it" → write to `docs/research/<short-kebab-name>.md`.
## Quality bar
- Specific over comprehensive. A 1-page brief that surfaces the real decision beats a 5-page survey.
- Cite sources for every claim. `file:line` for codebase, URL for external. No floating assertions.
- Name what you don't know. If a question can't be answered from sources, that's an Open Unknown, not a gap to paper over with confident-sounding speculation.
- Don't recommend a winner among external options. Surface tradeoffs; `/prd` picks.
## What not to do
- Don't write code. Not even illustrative snippets. The output is a brief, not a sketch.
- Don't open files yourself to skim — let the Explore agent do that. Synthesizing is your job.
- Don't fabricate. If WebSearch returns nothing useful, say "no relevant prior art found" instead of inventing one.
- Don't make product decisions. "Should we use X or Y?" → both, with tradeoffs, then "your call."