RnD/EVOLV


znetsixe 6ff262e96e chore(skills): add workflow chain — grill-me → prd → prd-to-issues → ship-it

Four workflow skills that take a feature from fuzzy idea to merged code.
Two human-in-the-loop phases (grill-me, prd), one mostly-together (prd-to-issues
files only on explicit 'create'), and one AFK (ship-it).

  grill-me        TOGETHER  pressure-test the idea with hard interview questions
  prd             TOGETHER  synthesize PRD; gaps stay explicit, not papered over
  prd-to-issues   MOSTLY    thin vertical-slice issues with coverage matrix +
                            per-issue Slice check; self-audits before showing
  ship-it         AFK       shell loop ships each slice end-to-end with one
                            commit per issue, status streams to terminal,
                            Ctrl-C-able, survives session close

Vertical-slice principle throughout: every issue cuts end-to-end through every
integration layer (no horizontal "do all the DB work first" issues). The
AFK loop only ships against acceptance criteria already locked in by the PRD
phase — autonomous code never runs against undefined contracts.

ship-it tracker support: gh (GitHub) and tea (Gitea). For this repo, set
SHIP_IT_TRUNK=development to override the main default.

See .claude/skills/README.md for the full how-to and a worked example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 16:27:15 +02:00

7.1 KiB


name	description
ship-it	AFK autopilot. Drives a shell loop that works through every ready issue in the tracker (GitHub via gh, Gitea via tea), implementing each vertical slice end-to-end and committing per issue. Status streams to the terminal so the human can tail progress locally and Ctrl-C anytime. The shell is the loop; each iteration dispatches one fresh headless Claude run to ship one issue. Use when the user invokes /ship-it, says "go AFK on this", "work the backlog", "ralph the issues", or "ship everything".


Ship It — AFK backlog autopilot

Mode: AFK. No human in the loop. Does not ask questions mid-run. If a slice is undecidable, the iteration labels the issue needs-decision and the loop moves on. The human gets one summary at the end, not chatter during.

How this works (read before invoking)

The actual loop runs in a shell script: .claude/skills/ship-it/loop.sh. The shell is the loop, not you. Each iteration shells out to a fresh, headless claude -p invocation that processes exactly one issue using .claude/skills/ship-it/iterate.md as its prompt. Three reasons this design beats "LLM keeps going inside one session":

1. Fresh context per issue. No drift, no accumulated history bloating the window.
2. Visible in the terminal. Progress streams to stdout and tees to a log file. The human can tail it from another shell, see commits land, and Ctrl-C cleanly.
3. Survives session close. Closing the interactive Claude window doesn't kill the loop. Re-attach by tailing the log.
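The design above can be sketched as a plain shell function. This is a minimal illustration, not the contents of loop.sh: `run_loop`, `dispatch_claude`, and the `SHIP_IT_LOG` variable are hypothetical names, and how the issue number reaches the prompt is an assumption.

```shell
# Default dispatcher: one fresh headless Claude run per issue.
# Prepending the issue number to the prompt is an assumption of this sketch.
dispatch_claude() {
  local issue="$1"
  claude -p "Issue #${issue}. $(cat .claude/skills/ship-it/iterate.md)"
}

# run_loop DISPATCHER ISSUE...
# Calls DISPATCHER once per issue and tees all output to a run log
# (SHIP_IT_LOG is a hypothetical variable; defaults to /dev/null).
run_loop() {
  local dispatcher="$1"; shift
  local log="${SHIP_IT_LOG:-/dev/null}"
  local issue
  for issue in "$@"; do
    echo "==> issue #${issue}" | tee -a "$log"
    # The subshell keeps each iteration's shell state isolated from the next.
    ( "$dispatcher" "$issue" ) 2>&1 | tee -a "$log"
  done
}
```

Passing the dispatcher as an argument keeps the loop itself trivial to dry-run with a stub before pointing it at `dispatch_claude`.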

Files

- loop.sh: orchestrator. Tracker detection, preflight, dispatch loop, status output, stop conditions, summary.
- iterate.md: the prompt passed to each per-issue headless Claude. Read it; it defines what "shipped" means.
- SKILL.md: this file. When the user invokes /ship-it, you bootstrap and hand off.

When the user invokes /ship-it

You (the interactive Claude) do the bootstrap, not the work. Concretely:

1. Preflight in chat (catches the obvious failures before the script runs):
   - git status --porcelain empty?
   - On main (or $SHIP_IT_TRUNK)? Up-to-date with origin?
   - gh auth status (or tea token) returns 0?
   - gh issue list --state open --label slice | wc -l ≥ 1?
2. Show the plan in one short block: tracker host, trunk branch, count of ready issues, the first 3 issue titles, the log path. Nothing more.
3. Ask one question: "Start? Reply go." This is the only human-in-the-loop checkpoint: kicking off AFK work is a real commitment and deserves an explicit ok.
4. On go: run the loop in the foreground so the user sees live output:
   ```
   bash .claude/skills/ship-it/loop.sh
   ```
   Do not background it. Do not pipe through anything that buffers. The user can Ctrl-C.
5. While it runs: stay silent. Don't interject. Don't "monitor" by re-reading logs in chat; the user has the terminal.
6. When it exits: read the final ==== ship-it summary ==== block from the log file and present it once with concrete next steps ("2 issues are needs-decision; open them to answer their questions?").
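The preflight checks in step 1 could be expressed as one small function. This is a sketch only: `preflight` is a hypothetical name, inputs are passed as arguments so the logic can be exercised without a repo, and the auth and up-to-date checks are left out.

```shell
# preflight PORCELAIN BRANCH TRUNK READY_COUNT
# Returns 0 when every check passes; otherwise prints the first failure
# to stderr and returns 1.
preflight() {
  local porcelain="$1" branch="$2" trunk="$3" ready="$4"
  if [ -n "$porcelain" ]; then
    echo "working tree is dirty" >&2; return 1
  fi
  if [ "$branch" != "$trunk" ]; then
    echo "on '$branch', expected '$trunk'" >&2; return 1
  fi
  if [ "$ready" -lt 1 ]; then
    echo "no ready issues" >&2; return 1
  fi
  return 0
}

# Example wiring, using the commands from the checklist:
#   preflight "$(git status --porcelain)" \
#             "$(git rev-parse --abbrev-ref HEAD)" \
#             "${SHIP_IT_TRUNK:-main}" \
#             "$(gh issue list --state open --label slice | wc -l)"
```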

Following progress

The script logs to stdout AND tees to .ship-it-logs/run-<RUN_ID>.log. Tail from another terminal:

tail -f .ship-it-logs/run-*.log

Per-issue detail (everything the headless Claude did for that one issue) is in .ship-it-logs/iter-<RUN_ID>-<ISSUE>.log — useful for debugging a failed iteration.
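A minimal sketch of the log layout described here, assuming hypothetical helper names (`run_log_path`, `iter_log_path`, `log_line`) and a relative default directory; the real loop.sh may build these paths differently.

```shell
# Directory for logs; the relative default is a simplification.
LOG_DIR="${SHIP_IT_LOG_DIR:-.ship-it-logs}"

# $1 = RUN_ID
run_log_path()  { echo "${LOG_DIR}/run-$1.log"; }
# $1 = RUN_ID, $2 = issue number
iter_log_path() { echo "${LOG_DIR}/iter-$1-$2.log"; }

# log_line RUN_ID MESSAGE: print to stdout and append to the run log,
# which is what makes `tail -f` from another terminal work.
log_line() {
  mkdir -p "$LOG_DIR"
  printf '%s\n' "$2" | tee -a "$(run_log_path "$1")"
}
```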

Commits land in git as the loop runs. Watch with:

watch -n 5 'git log --oneline -10 origin/main'

Config (env vars, override before invoking)

| Var | Default | Purpose |
| --- | --- | --- |
| `SHIP_IT_MAX` | 50 | Hard cap on iterations per run |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TRUNK` | `main` | Trunk branch name |
| `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout (kills the headless claude) |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Where logs go |
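One common way to implement defaults like these is shell parameter expansion, sketched below. Whether loop.sh does it this way is an assumption, as is resolving `<repo>` with `git rev-parse`.

```shell
# Defaults mirror the table; a value already set in the environment wins.
: "${SHIP_IT_MAX:=50}"
: "${SHIP_IT_MAX_FAIL:=3}"
: "${SHIP_IT_TRUNK:=main}"
: "${SHIP_IT_TIMEOUT:=30m}"
# Repo root lookup is an assumption; falls back to the current directory.
: "${SHIP_IT_LOG_DIR:=$(git rev-parse --show-toplevel 2>/dev/null || pwd)/.ship-it-logs}"
```

With this pattern, an override is just an environment assignment on the command line, for example `SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh`.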

What each iteration does (per `iterate.md`)

For one issue: read it → branch from trunk → write failing e2e test at the outermost layer → implement layer by layer until the test passes → run the full suite → outermost-layer smoke check → commit (one commit, message ends Closes #N) → push → open PR with acceptance-criteria checkboxes + smoke evidence → wait for CI → merge if green and branch protection allows, else leave open for review → return to trunk → emit ITERATION_RESULT: line for the loop.
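The loop has to read the ITERATION_RESULT: line back out of each iteration's output. A rough sketch of that step follows; `parse_result` is a hypothetical name, and the exact line format and status vocabulary (for example `shipped`) are assumptions, since iterate.md defines them.

```shell
# parse_result: read an iteration log on stdin and print the value of the
# last "ITERATION_RESULT:" line. Prints "unknown" if no such line exists.
parse_result() {
  local value
  value="$(sed -n 's/^ITERATION_RESULT:[[:space:]]*//p' | tail -n 1)"
  echo "${value:-unknown}"
}
```

Taking the last matching line means an early echo of the prompt text cannot be mistaken for the final verdict.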

Exactly one commit per issue. Each slice gets a single commit that references the issue and lands on the branch before the PR opens. /prd-to-issues keeps slice scope small precisely so this is one tight commit, not a series.

Stop conditions (in priority order)

1. User Ctrl-C → trap catches SIGINT, current step finishes cleanly, summary prints, exit 130.
2. Backlog empty (no ready issues) → exit 0.
3. SHIP_IT_MAX_FAIL (default 3) consecutive hard failures → exit 1. This points to something systemic (bad dependency, branch protection blocking, flaky env) and surfaces for human review.
4. Precondition violated mid-run → exit non-zero with reason.
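The first three stop conditions can be sketched with a SIGINT flag and a failure streak counter. All names here (`interrupted`, `next_fail_count`, `stop_code`) are hypothetical, and the "ok" status string is an assumption; this is not the code in loop.sh.

```shell
MAX_FAIL="${SHIP_IT_MAX_FAIL:-3}"
interrupted=0
# On Ctrl-C, only set a flag; the loop checks it between steps so the
# current step can finish before the summary prints.
trap 'interrupted=1' INT

# next_fail_count PREV STATUS: reset the streak on success, else extend it.
next_fail_count() {
  if [ "$2" = "ok" ]; then echo 0; else echo $(( $1 + 1 )); fi
}

# stop_code FAILS READY_COUNT: print the exit code the loop should use,
# or nothing if it should keep going. Checked in priority order.
stop_code() {
  if [ "$interrupted" -eq 1 ]; then echo 130
  elif [ "$2" -eq 0 ]; then echo 0
  elif [ "$1" -ge "$MAX_FAIL" ]; then echo 1
  fi
}
```

Because the streak resets on any success, only consecutive failures count toward the limit.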

What "ready" means (the loop's filter)

An issue is ready iff:

- State is open
- Has label slice (filed by /prd-to-issues)
- Does NOT have label blocked, needs-decision, or ci-failed
- Is not a spike (spikes deliver decisions, not code; humans handle those)

Issues are processed in number order — walking-skeleton first, as /prd-to-issues ordered them.
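The readiness filter and ordering can be sketched as a text filter over one line per issue. `filter_ready` is a hypothetical name, the tab-separated input format is chosen for this sketch, and treating a `spike` label as the spike marker is an assumption (the source does not say how spikes are identified).

```shell
# filter_ready: read "NUMBER<TAB>label1,label2,..." lines on stdin and print
# the numbers of ready issues in ascending order.
filter_ready() {
  awk -F '\t' '
    {
      n = split($2, labels, ",")
      has_slice = 0; excluded = 0
      for (i = 1; i <= n; i++) {
        if (labels[i] == "slice") has_slice = 1
        if (labels[i] == "blocked" || labels[i] == "needs-decision" ||
            labels[i] == "ci-failed" || labels[i] == "spike") excluded = 1
      }
      if (has_slice && !excluded) print $1
    }' | sort -n
}

# Possible input for GitHub, using gh's JSON output and built-in jq filter:
#   gh issue list --state open --label slice --json number,labels \
#     --jq '.[] | "\(.number)\t\(.labels | map(.name) | join(","))"' | filter_ready
```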

Safety boundaries

The headless Claude is launched with a tool allowlist that excludes destructive operations. It cannot:

- Force-push or rewrite shared history
- Bypass branch protection or skip CI hooks (--no-verify, --admin)
- Auto-merge red or pending PRs (the iterate prompt forbids it, and CI gates back it up)
- Modify CI/CD config or IaC unless the slice's `Slice — layers touched` line explicitly names that layer
- Close issues without the outermost-layer smoke check passing
- Assign people or change milestones/projects

If something tries to push past these in practice (e.g. a slice "needs" a CI change to pass), it should fail the iteration with needs-decision and let a human approve the scope expansion.
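As an illustration of the first two boundaries only, a wrapper could reject command lines that carry the flags named above. This is a hypothetical guard, not how the allowlist is actually enforced (the source says that happens at launch of the headless run), and it is not exhaustive: short forms such as `-f` are not covered.

```shell
# is_forbidden ARGS...: return 0 if the command line contains a flag the
# safety list rules out, 1 otherwise.
is_forbidden() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --force|--force-with-lease|--no-verify|--admin) return 0 ;;
    esac
  done
  return 1
}

# Example: refuse to run a forbidden command.
#   if is_forbidden git push --force origin main; then
#     echo "refused: destructive flag" >&2
#   fi
```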

What not to do

- Don't drive the loop yourself by reading issues and implementing them inline. The shell is the loop. If you're tempted to "just do this one in chat," stop and run the script.
- Don't background the script so the user can keep chatting with you. The output IS the value. The user wants to watch it work.
- Don't summarize between iterations. Chatter belongs in the final summary, not after each commit.
- Don't tag the user in PR/issue comments during the run. They're not in the loop until the script exits.
- Don't restart a failed iteration manually. The loop's needs-decision and ci-failed labels are how failures stay in the tracker for human triage. Manual restart skips that.

How this fits the chain

/grill-me <feature> (together) → /prd (together) → /prd-to-issues (mostly together; filing the issues requires an explicit create) → /ship-it (AFK). The four-skill chain takes a vague feature idea to merged code with one human checkpoint per phase boundary.
