chore(skills): add /research and /prototype; rewrite README for 6-skill chain

Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
znetsixe
2026-05-21 16:33:26 +02:00
parent 6ff262e96e
commit fcaad8cd9f
3 changed files with 265 additions and 93 deletions

View File

@@ -0,0 +1,65 @@
---
name: prototype
description: Build a throwaway spike to falsify or confirm a single risky assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Reports findings as evidence that feeds /prd. Use when the user invokes /prototype, says "spike X", "throwaway test for Y", "can we actually do Z" — typically after /research surfaces an Open Unknown.
---
# Prototype — throwaway spike
**Mode: MOSTLY TOGETHER.** The build and run go AFK, but the user picks the assumption to test and decides what the findings mean. The output is *evidence*, not production code.
You are now an engineer running a time-boxed spike to learn one thing. The point is to falsify or confirm an assumption fast — not to build a feature, not to produce code anyone will reuse.
## Hard rules
1. **One assumption per prototype.** If the user gives you two, ask which matters most; the other can be a second prototype.
2. **The assumption must be falsifiable.** "Will it be fast?" → no. "Can Node-RED sustain 1k msg/s to InfluxDB on the dev VM for 10 min?" → yes. If the user's claim isn't falsifiable, refuse and ask for a sharper one before building anything.
3. **Throwaway means throwaway.** Code lives in `<repo-root>/.prototypes/<short-name>/` only. The directory is gitignored (add it as the first step if it isn't). Nothing in `.prototypes/` is ever committed to the main codebase. No exceptions.
4. **Time-box.** Default budget: 30 minutes of work and ≤200 LOC. If the user gave a different budget, use that. When you blow through, stop and report whatever you've got.
## Steps
1. **Restate the assumption** in falsifiable form. Show it to the user. Wait one turn for confirmation or correction — this is the only mid-skill checkpoint.
2. **Pick the minimum viable test.** Options:
- **Code spike** — throwaway script that exercises the question. Most common.
- **Reading spike** — deep read of a library/spec/codebase, no code. Use when the question is "does X support Y" and the docs would tell you.
- **Manual integration spike** — run a command, hit an endpoint, observe. Use when the question is about a real service's behavior.
3. **Set up the dir.**
```bash
ROOT=$(git rev-parse --show-toplevel)
mkdir -p "$ROOT/.prototypes/<name>"
grep -qxE '\.prototypes/?' "$ROOT/.gitignore" 2>/dev/null || echo '.prototypes/' >> "$ROOT/.gitignore"
```
4. **Build the smallest thing that tests the assumption.** Resist polish. No tests on the prototype itself, no error handling, no docs, no abstractions. Hardcode values. Inline everything.
5. **Run it.** Capture output. If it crashes in a way that's *about* the assumption (e.g. memory blows up at 1k msg/s), that's a finding — not a bug to fix.
6. **Iterate up to the budget.** If a quick adjustment sharpens the test, make it. If you're tempted to refactor or expand scope, stop and report instead.
7. **Report findings.** In chat, using this structure:
```
# Prototype findings: <assumption>
**Verdict:** confirmed | falsified | inconclusive
**Budget used:** <e.g. 22 min, 140 LOC>
## What I did
<23 sentences. What the spike actually exercised.>
## Evidence
<concrete output, numbers, logs, observed behavior. Paste the relevant snippet.>
## What this changes in our mental model
<one paragraph — what we believed before vs. what we believe now>
## Recommended next step
<one sentence — usually /prd, sometimes another /prototype, sometimes "kill this idea">
## Prototype location (do not import)
.prototypes/<name>/
```
## What not to do
- **Don't promote the prototype.** Even if it works beautifully. The next phase is `/prd` → `/prd-to-issues` → `/ship-it` implementing the real thing in production code — not adapting the spike.
- **Don't polish.** Tests, types, lint-clean, comments — none of it. The code is disposable.
- **Don't expand scope.** "Since I'm here, I'll also test…" — no. File the second question for a separate prototype.
- **Don't commit `.prototypes/`.** Ever. If you find yourself wanting to share the prototype, share the *findings*, not the code.
- **Don't ask the user mid-build.** If the assumption was underspecified, you should have caught that in step 1. Once running, run.