chore(skills): add workflow chain — grill-me → prd → prd-to-issues → ship-it

Four workflow skills that take a feature from fuzzy idea to merged code. Two human-in-the-loop phases (grill-me, prd), one mostly-together (prd-to-issues files only on explicit 'create'), and one AFK (ship-it). grill-me TOGETHER pressure-test the idea with hard interview questions prd TOGETHER synthesize PRD; gaps stay explicit, not papered over prd-to-issues MOSTLY thin vertical-slice issues with coverage matrix + per-issue Slice check; self-audits before showing ship-it AFK shell loop ships each slice end-to-end with one commit per issue, status streams to terminal, Ctrl-C-able, survives session close Vertical-slice principle throughout: every issue cuts end-to-end through every integration layer (no horizontal "do all the DB work first" issues). The AFK loop only ships against acceptance criteria already locked in by the PRD phase — autonomous code never runs against undefined contracts. ship-it tracker support: gh (GitHub) and tea (Gitea). For this repo, set SHIP_IT_TRUNK=development to override the main default. See .claude/skills/README.md for the full how-to and a worked example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:27:15 +02:00
parent 025bdb4c7e
commit 6ff262e96e
7 changed files with 825 additions and 0 deletions
--- a/.claude/skills/ship-it/SKILL.md
+++ b/.claude/skills/ship-it/SKILL.md
@@ -0,0 +1,115 @@
+---
+name: ship-it
+description: AFK autopilot. Drives a shell loop that works through every ready issue in the tracker (GitHub via gh, Gitea via tea), implementing each vertical slice end-to-end and committing per issue. Status streams to the terminal so the human can tail progress locally and Ctrl-C anytime. The shell is the loop; each iteration dispatches one fresh headless Claude run to ship one issue. Use when the user invokes /ship-it, says "go AFK on this", "work the backlog", "ralph the issues", or "ship everything".
+---
+
+# Ship It — AFK backlog autopilot
+
+**Mode: AFK.** No human in the loop. Does not ask questions mid-run. If a slice is undecidable, the iteration labels the issue `needs-decision` and the loop moves on. The human gets one summary at the end, not chatter during.
+
+## How this works (read before invoking)
+
+The actual loop runs in a shell script: `.claude/skills/ship-it/loop.sh`. **The shell is the loop**, not you. Each iteration shells out to a fresh, headless `claude -p` invocation that processes exactly one issue using `.claude/skills/ship-it/iterate.md` as its prompt. Three reasons this design beats "LLM keeps going inside one session":
+
+1. **Fresh context per issue.** No drift, no accumulated history bloating the window.
+2. **Visible in the terminal.** Progress streams to stdout and tees to a log file. The human can tail it from another shell, see commits land, and Ctrl-C cleanly.
+3. **Survives session close.** Closing the interactive Claude window doesn't kill the loop. Re-attach by tailing the log.
+
+## Files
+
+- `loop.sh` — orchestrator. Tracker detection, preflight, dispatch loop, status output, stop conditions, summary.
+- `iterate.md` — the prompt passed to each per-issue headless Claude. Read it; it defines what "shipped" means.
+- `SKILL.md` — this file. When the user invokes `/ship-it`, you bootstrap and hand off.
+
+## When the user invokes /ship-it
+
+You (the interactive Claude) do the bootstrap, not the work. Concretely:
+
+1. **Preflight in chat** (catches the obvious failures before the script runs):
+   - `git status --porcelain` empty?
+   - On `main` (or `$SHIP_IT_TRUNK`)? Up-to-date with origin?
+   - `gh auth status` (or tea token) returns 0?
+   - `gh issue list --state open --label slice | wc -l` ≥ 1?
+2. **Show the plan** in one short block: tracker host, trunk branch, count of ready issues, the first 3 issue titles, the log path. Nothing more.
+3. **Ask one question:** "Start? Reply `go`." This is the *only* human-in-the-loop checkpoint — kicking off AFK work is a real commitment, deserves an explicit ok.
+4. **On `go`:** run the loop in the foreground so the user sees live output:
+   ```
+   bash .claude/skills/ship-it/loop.sh
+   ```
+   Do not background it. Do not pipe through anything that buffers. The user can Ctrl-C.
+5. **While it runs:** stay silent. Don't interject. Don't "monitor" by re-reading logs in chat — the user has the terminal.
+6. **When it exits:** read the final `==== ship-it summary ====` block from the log file, present it once with concrete next steps ("2 issues are `needs-decision` — open them to answer their questions?").
+
+## Following progress
+
+The script logs to stdout AND tees to `.ship-it-logs/run-<RUN_ID>.log`. Tail from another terminal:
+
+```bash
+tail -f .ship-it-logs/run-*.log
+```
+
+Per-issue detail (everything the headless Claude did for that one issue) is in `.ship-it-logs/iter-<RUN_ID>-<ISSUE>.log` — useful for debugging a failed iteration.
+
+Commits land in git as the loop runs. Watch with:
+
+```bash
+watch -n 5 'git log --oneline -10 origin/main'
+```
+
+## Config (env vars, override before invoking)
+
+| Var | Default | Purpose |
+|---|---|---|
+| `SHIP_IT_MAX` | 50 | Hard cap on iterations per run |
+| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
+| `SHIP_IT_TRUNK` | `main` | Trunk branch name |
+| `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout (kills the headless claude) |
+| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Where logs go |
+
+## What each iteration does (per `iterate.md`)
+
+For one issue: read it → branch from trunk → write failing e2e test at the outermost layer → implement layer by layer until the test passes → run the full suite → outermost-layer smoke check → commit (one commit, message ends `Closes #N`) → push → open PR with acceptance-criteria checkboxes + smoke evidence → wait for CI → merge if green and branch protection allows, else leave open for review → return to trunk → emit `ITERATION_RESULT:` line for the loop.
+
+**Commit per issue:** yes, exactly. One commit per slice, referenced to the issue, lands on the branch before the PR opens. The slice scope was made small in `/prd-to-issues` precisely so this is one tight commit, not a series.
+
+## Stop conditions (in priority order)
+
+1. **User Ctrl-C** → trap catches SIGINT, current step finishes cleanly, summary prints, exit 130.
+2. **Backlog empty** (no ready issues) → exit 0.
+3. **Three consecutive hard failures** → exit 1. Something systemic — bad dependency, branch protection blocking, flaky env. Surfaces for human review.
+4. **Precondition violated mid-run** → exit non-zero with reason.
+
+## What "ready" means (the loop's filter)
+
+An issue is `ready` iff:
+- State is open
+- Has label `slice` (filed by `/prd-to-issues`)
+- Does NOT have label `blocked`, `needs-decision`, or `ci-failed`
+- Is not a spike (spikes deliver decisions, not code — humans handle those)
+
+Issues are processed in number order — walking-skeleton first, as `/prd-to-issues` ordered them.
+
+## Safety boundaries
+
+The headless Claude is launched with a tool allowlist that excludes destructive operations. It cannot:
+
+- Force-push or rewrite shared history
+- Bypass branch protection or skip CI hooks (`--no-verify`, `--admin`)
+- Auto-merge red or pending PRs (the iterate prompt forbids it, and CI gates back it up)
+- Modify CI/CD config or IaC unless the slice's `Slice — layers touched` line explicitly names that layer
+- Close issues without the outermost-layer smoke check passing
+- Assign people or change milestones/projects
+
+If something tries to push past these in practice (e.g. a slice "needs" a CI change to pass), it should fail the iteration with `needs-decision` and let a human approve the scope expansion.
+
+## What not to do
+
+- **Don't drive the loop yourself by reading issues and implementing them inline.** The shell is the loop. If you're tempted to "just do this one in chat," stop and run the script.
+- **Don't background the script** so the user can keep chatting with you. The output IS the value. The user wants to watch it work.
+- **Don't summarize between iterations.** Chatter belongs in the final summary, not after each commit.
+- **Don't tag the user in PR/issue comments** during the run. They're not in the loop until the script exits.
+- **Don't restart a failed iteration manually.** The loop's `needs-decision` and `ci-failed` labels are how failures stay in the tracker for human triage. Manual restart skips that.
+
+## How this fits the chain
+
+`/grill-me <feature>` (together) → `/prd` (together) → `/prd-to-issues` (mostly together, file step needs `create`) → `/ship-it` (AFK). The four-skill arc takes a vague feature idea to merged code with one human checkpoint per phase boundary.
--- a/.claude/skills/ship-it/iterate.md
+++ b/.claude/skills/ship-it/iterate.md
@@ -0,0 +1,70 @@
+# ship-it iterate — one issue, end-to-end
+
+You are running ONE iteration of the ship-it AFK loop. Implement, verify, and ship exactly one issue, then exit. The outer shell loop will pick the next one.
+
+**Mode: AFK.** Do not ask questions. If the issue is genuinely undecidable from its body + linked PRD + grilling notes already in the issue or repo, drop a comment on the issue with the specific question, label it `needs-decision`, and exit with status=needs-decision. Do not guess at user intent.
+
+Variables provided below this prompt: `ISSUE_NUMBER`, `TRACKER_CLI` (`gh` or `tea`), `TRUNK_BRANCH`, `REPO_ROOT`.
+
+## Steps
+
+1. **Read the issue.**
+   - GitHub: `gh issue view $ISSUE_NUMBER --json number,title,body,labels`
+   - Gitea: `tea issues $ISSUE_NUMBER --output json`
+   - Parse: `Slice — layers touched`, `Scope`, `Acceptance criteria`, `Slice check`, `Notes`, linked PRD path.
+   - If `Acceptance criteria` is missing or non-testable → exit status=needs-decision with reason "acceptance criteria not testable".
+
+2. **Branch from latest trunk.**
+   `git fetch origin && git switch -c "slice/${ISSUE_NUMBER}-<short-kebab-slug>" "origin/$TRUNK_BRANCH"`
+
+3. **Write the failing e2e test first.** Anchored at the OUTERMOST layer named in `Slice — layers touched` (HTTP endpoint, UI smoke, dashboard query, log assertion — whatever the acceptance criterion observes). Run it. Confirm it fails for the right reason. If you can't write an e2e test for this slice, that's a sign the acceptance criterion isn't really observable end-to-end → exit status=needs-decision.
+
+4. **Implement layer by layer.** Walk the `Slice — layers touched` list. Make the minimal change at each layer to satisfy the slice — do not gold-plate, do not refactor adjacent code, do not "improve" things outside scope. Re-run the e2e test after each layer change.
+
+5. **Run the broader test suite.** Catch regressions caused by the slice. Fix any test that was green before and is now red — do not skip or mark tests. If a test was already red before your changes, leave it (note in PR body).
+
+6. **Outermost-layer smoke check.** The 30-second-demo check: hit the endpoint with curl, query the dashboard, tail the log, load the page. Observe what the acceptance criterion observes. Capture the output (curl response body, log snippet, query result) — you'll paste it into the PR body as evidence.
+
+7. **Commit.** One commit per slice (or a tight series — no WIP commits, no fixup commits, no "address review" before review exists). Read the repo's recent `git log` to match commit style. Message ends with `Closes #${ISSUE_NUMBER}`.
+
+8. **Push and open PR.**
+   - GitHub: `git push -u origin HEAD && gh pr create --fill`
+   - Gitea: `git push -u origin HEAD && tea pr create --title "..." --description "..."`
+   - PR body must include:
+     - Each acceptance criterion as a checked `- [x]` line.
+     - The smoke-check evidence (curl output / log snippet / screenshot path) in a fenced block.
+     - `Closes #${ISSUE_NUMBER}` (so the issue auto-closes on merge).
+
+9. **Wait for CI and decide merge.**
+   - Poll: `gh pr checks --watch` (or `tea pr status`).
+   - **All green + branch protection allows direct merge** → `gh pr merge --squash --delete-branch`. Verify the merge commit landed on trunk.
+   - **All green + branch protection requires human review** → leave PR open. Comment `Ready for review — all acceptance criteria verified, smoke check passed.` on the issue. Exit status=shipped with the PR number.
+   - **Red CI** → one fix-and-push cycle. Read the failing log, fix the actual cause (do not skip the test). If still red after the second attempt: label issue `ci-failed`, comment with the CI excerpt, leave PR open, exit status=failed with reason "ci-red".
+
+10. **Return to trunk.** `git switch $TRUNK_BRANCH && git pull --ff-only`. If the slice was merged, run the smoke check one more time against integrated trunk. If it fails there → revert the merge, label `regression`, exit status=failed with reason "regression-on-trunk".
+
+## Boundaries
+
+- Never force-push, never rewrite shared history, never delete branches you didn't create.
+- Never bypass branch protection (`--admin`) or skip CI hooks (`--no-verify`).
+- Never auto-merge a PR whose CI is red or pending.
+- Never close an issue without the outermost-layer smoke check passing.
+- Never modify CI/CD config, IaC, or production data unless the slice's `layers touched` explicitly names that layer.
+- Never invent acceptance criteria. If they're vague, label `needs-decision`.
+- Never assign issues or change milestones.
+
+## Final output line
+
+The shell loop greps for this exact line to determine outcome. Print it as the LAST line before exiting, on its own line, no decoration:
+
+```
+ITERATION_RESULT: status=<shipped|failed|needs-decision> issue=#<N> pr=<#N|none> reason=<short single-line reason>
+```
+
+Examples:
+```
+ITERATION_RESULT: status=shipped issue=#142 pr=#287 reason=merged-to-main
+ITERATION_RESULT: status=shipped issue=#143 pr=#288 reason=open-for-review
+ITERATION_RESULT: status=failed issue=#144 pr=#289 reason=ci-red-after-retry
+ITERATION_RESULT: status=needs-decision issue=#145 pr=none reason=acceptance-criteria-not-testable
+```
--- a/.claude/skills/ship-it/loop.sh
+++ b/.claude/skills/ship-it/loop.sh
@@ -0,0 +1,189 @@
+#!/usr/bin/env bash
+# ship-it AFK loop — works through every ready issue end-to-end.
+# See SKILL.md for design. Ctrl-C to stop; partial work is preserved on disk.
+
+set -uo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null)" || { echo "not in a git repo"; exit 1; }
+
+# ---- config (env-overridable) ----
+MAX_ITERATIONS="${SHIP_IT_MAX:-50}"
+MAX_CONSECUTIVE_FAILURES="${SHIP_IT_MAX_FAIL:-3}"
+TRUNK_BRANCH="${SHIP_IT_TRUNK:-main}"
+ITERATION_TIMEOUT="${SHIP_IT_TIMEOUT:-30m}"   # per-issue cap
+LOG_DIR="${SHIP_IT_LOG_DIR:-$REPO_ROOT/.ship-it-logs}"
+
+mkdir -p "$LOG_DIR"
+RUN_ID="$(date -u +%Y%m%dT%H%M%SZ)"
+LOG_FILE="$LOG_DIR/run-$RUN_ID.log"
+
+# ---- logging ----
+log() {
+  local ts; ts="$(date -u +%H:%M:%S)"
+  printf '[%s] %s\n' "$ts" "$*" | tee -a "$LOG_FILE"
+}
+die() { log "FATAL: $*"; exit 1; }
+
+# ---- graceful interrupt ----
+INTERRUPTED=0
+on_interrupt() {
+  INTERRUPTED=1
+  log ""
+  log "interrupt received — finishing current step cleanly, then stopping"
+}
+trap on_interrupt INT
+
+# ---- tracker detection ----
+ORIGIN_URL="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null || true)"
+if [[ "$ORIGIN_URL" == *"github.com"* ]]; then
+  TRACKER_CLI="gh"
+  command -v gh >/dev/null || die "gh CLI not installed"
+  gh auth status >/dev/null 2>&1 || die "gh not authenticated (run: gh auth login)"
+  list_ready_issues() {
+    gh issue list --state open --label slice --limit 100 \
+      --json number,title,labels \
+      --jq '[.[] | select(.labels | map(.name) | (contains(["blocked"]) or contains(["needs-decision"]) or contains(["ci-failed"])) | not)] | sort_by(.number)'
+  }
+elif [[ "$ORIGIN_URL" == *"gitea"* ]]; then
+  TRACKER_CLI="tea"
+  command -v tea >/dev/null || die "tea CLI not installed (Gitea repo detected) — install tea or switch to a GitHub remote"
+  list_ready_issues() {
+    tea issues list --state open --output json 2>/dev/null \
+      | jq '[.[] | select((.labels // []) | map(.name) | (contains(["blocked"]) or contains(["needs-decision"]) or contains(["ci-failed"])) | not) | select((.labels // []) | map(.name) | contains(["slice"]))] | sort_by(.index)'
+  }
+else
+  die "unknown tracker for origin: '$ORIGIN_URL' (need github.com or gitea.*)"
+fi
+
+# ---- preflight ----
+cd "$REPO_ROOT"
+[[ -z "$(git status --porcelain)" ]] || die "git tree is dirty — commit or stash before starting"
+CURRENT_BRANCH="$(git branch --show-current)"
+[[ "$CURRENT_BRANCH" == "$TRUNK_BRANCH" ]] || die "not on $TRUNK_BRANCH (on '$CURRENT_BRANCH')"
+git fetch origin "$TRUNK_BRANCH" >/dev/null 2>&1 || die "git fetch failed"
+LOCAL_SHA="$(git rev-parse HEAD)"
+REMOTE_SHA="$(git rev-parse "origin/$TRUNK_BRANCH")"
+[[ "$LOCAL_SHA" == "$REMOTE_SHA" ]] || die "$TRUNK_BRANCH not up-to-date with origin (pull first)"
+command -v claude >/dev/null || die "claude CLI not on PATH"
+
+# ---- banner ----
+log "ship-it run $RUN_ID"
+log "  tracker:  $TRACKER_CLI ($ORIGIN_URL)"
+log "  trunk:    $TRUNK_BRANCH @ ${LOCAL_SHA:0:8}"
+log "  log:      $LOG_FILE"
+log "  config:   max_iter=$MAX_ITERATIONS, max_fail=$MAX_CONSECUTIVE_FAILURES, timeout=$ITERATION_TIMEOUT"
+log ""
+
+ITERATE_PROMPT_TEMPLATE="$(cat "$SCRIPT_DIR/iterate.md")"
+SHIPPED=()
+FAILED=()
+NEEDS_DECISION=()
+CONSECUTIVE_FAILURES=0
+ITERATION=0
+
+# ---- main loop ----
+while (( ITERATION < MAX_ITERATIONS )); do
+  (( INTERRUPTED )) && break
+  ITERATION=$((ITERATION + 1))
+
+  READY_JSON="$(list_ready_issues 2>/dev/null || echo '[]')"
+  READY_COUNT="$(echo "$READY_JSON" | jq 'length' 2>/dev/null || echo 0)"
+
+  if (( READY_COUNT == 0 )); then
+    log "backlog empty — stopping"
+    break
+  fi
+
+  ISSUE_NUM="$(echo "$READY_JSON" | jq -r '.[0].number // .[0].index')"
+  ISSUE_TITLE="$(echo "$READY_JSON" | jq -r '.[0].title')"
+
+  log "─────────────────────────────────────────────────────────────"
+  log "iter $ITERATION | #$ISSUE_NUM \"$ISSUE_TITLE\" ($READY_COUNT ready) → starting"
+
+  ITER_LOG="$LOG_DIR/iter-$RUN_ID-$ISSUE_NUM.log"
+  PROMPT="$ITERATE_PROMPT_TEMPLATE
+
+## Variables for this iteration
+- ISSUE_NUMBER=$ISSUE_NUM
+- TRACKER_CLI=$TRACKER_CLI
+- TRUNK_BRANCH=$TRUNK_BRANCH
+- REPO_ROOT=$REPO_ROOT
+
+Begin."
+
+  ITER_START="$(date +%s)"
+  set +e
+  timeout "$ITERATION_TIMEOUT" claude -p "$PROMPT" \
+    --allowed-tools "Bash,Edit,Write,Read,Grep,Glob,WebFetch" \
+    --output-format text \
+    >"$ITER_LOG" 2>&1
+  CLAUDE_EXIT=$?
+  set -e
+  ITER_END="$(date +%s)"
+  ITER_DURATION=$((ITER_END - ITER_START))
+
+  RESULT_LINE="$(grep -E '^ITERATION_RESULT:' "$ITER_LOG" | tail -1 || true)"
+  STATUS="$(echo "$RESULT_LINE" | sed -n 's/.*status=\([^ ]*\).*/\1/p')"
+  PR_FIELD="$(echo "$RESULT_LINE" | sed -n 's/.*pr=\([^ ]*\).*/\1/p')"
+  REASON="$(echo "$RESULT_LINE" | sed -n 's/.*reason=\(.*\)/\1/p')"
+
+  if (( CLAUDE_EXIT == 124 )); then
+    STATUS="failed"
+    REASON="timeout after $ITERATION_TIMEOUT"
+  fi
+
+  case "$STATUS" in
+    shipped)
+      log "iter $ITERATION | #$ISSUE_NUM ✓ shipped → PR $PR_FIELD (${ITER_DURATION}s)"
+      SHIPPED+=("#$ISSUE_NUM→$PR_FIELD")
+      CONSECUTIVE_FAILURES=0
+      ;;
+    failed)
+      log "iter $ITERATION | #$ISSUE_NUM ✗ failed: $REASON (${ITER_DURATION}s, see $ITER_LOG)"
+      FAILED+=("#$ISSUE_NUM ($REASON)")
+      CONSECUTIVE_FAILURES=$((CONSECUTIVE_FAILURES + 1))
+      ;;
+    needs-decision)
+      log "iter $ITERATION | #$ISSUE_NUM ? needs-decision: $REASON (${ITER_DURATION}s)"
+      NEEDS_DECISION+=("#$ISSUE_NUM ($REASON)")
+      CONSECUTIVE_FAILURES=0
+      ;;
+    *)
+      log "iter $ITERATION | #$ISSUE_NUM ! unknown outcome (claude exit=$CLAUDE_EXIT, ${ITER_DURATION}s) — see $ITER_LOG"
+      FAILED+=("#$ISSUE_NUM (unknown outcome)")
+      CONSECUTIVE_FAILURES=$((CONSECUTIVE_FAILURES + 1))
+      ;;
+  esac
+
+  if (( CONSECUTIVE_FAILURES >= MAX_CONSECUTIVE_FAILURES )); then
+    log "$MAX_CONSECUTIVE_FAILURES consecutive failures — stopping for human review"
+    break
+  fi
+
+  # back to trunk for next iteration
+  if [[ "$(git branch --show-current)" != "$TRUNK_BRANCH" ]]; then
+    git switch "$TRUNK_BRANCH" >/dev/null 2>&1 || log "  warn: could not return to $TRUNK_BRANCH"
+  fi
+  git pull --ff-only origin "$TRUNK_BRANCH" >/dev/null 2>&1 || log "  warn: could not fast-forward $TRUNK_BRANCH"
+done
+
+# ---- summary ----
+log ""
+log "==== ship-it summary ===="
+log "iterations:     $ITERATION"
+log "shipped:        ${#SHIPPED[@]} ${SHIPPED[*]:-}"
+log "failed:         ${#FAILED[@]} ${FAILED[*]:-}"
+log "needs-decision: ${#NEEDS_DECISION[@]} ${NEEDS_DECISION[*]:-}"
+log "log:            $LOG_FILE"
+
+if (( INTERRUPTED )); then
+  log "stop reason:    user-interrupt"
+  exit 130
+elif (( CONSECUTIVE_FAILURES >= MAX_CONSECUTIVE_FAILURES )); then
+  log "stop reason:    consecutive-failures"
+  exit 1
+else
+  log "stop reason:    backlog-empty"
+  exit 0
+fi