You have an AI coding tool. It's great at small tasks — fix this bug, add this function. But when you ask it to build a full feature across multiple files, it falls apart. Context gets stale, the agent forgets what it did three steps ago, and you end up babysitting it anyway.
Craftloop fixes that. You describe what you want to build, and it handles the rest — planning, breaking work into small pieces, implementing each piece, testing, reviewing, and committing. You go do something else. Come back to a branch with working code.
- You describe a feature. "Add a fighters page with filtering and fight history."
- Craft asks you real questions. Not generic ones — it reads your codebase first, then asks about your actual code. "Your ThemeProvider exposes useTheme() and getTheme(). Which should the toggle use?"
- It writes a plan. A design doc, then user stories with concrete acceptance criteria. Each story is small enough for one agent pass.
- It runs the work autonomously. Each story gets a fresh agent with clean context — no accumulated confusion. Implement, run tests, code review, commit. Pick up next story. Repeat.
- Quality gates catch mistakes early. Your linter, type checker, and tests run after every implementation. Failures get bounced back with error output — no wasted AI call for obvious breaks.
The whole thing runs in a loop until every story passes. You can watch it work with craft status, or just check back later.
You ── "add dark mode" ──▶ /craft ── questions, pick approach ──▶ brief.md
│
┌──────────────────────────────────────────┘
▼
┌─────────────┐
│ PLAN │ brief.md + RAG context
│ │ ──▶ design.md
└──────┬──────┘
▼
┌─────────────┐
│ CONVERT │ design.md ──▶ prd.json
│ │ batched extraction, auto-split
│ │ if stories too big or vague
└──────┬──────┘
▼
┌─────────────┐
│ CLARIFY │ validate criteria count,
│ │ flag vague language
└──────┬──────┘
▼
┌───────────────────────────────────────────────────┐
│ EXECUTION LOOP │
│ │
│ for each story in prd.json: │
│ │
│ ┌─ IMPLEMENT ──────────────────────────────┐ │
│ │ Build task file (story + RAG context + │ │
│ │ previous failures). Spawn fresh agent. │ │
│ │ Run quality gates after. Fail → reject. │ │
│ └──────────────────────────┬────────────────┘ │
│ ▼ │
│ ┌─ REVIEW ─────────────────────────────────┐ │
│ │ Run quality gates first. │ │
│ │ • pass + checkpoint → auto-pass │ │
│ │ • fail → reject, no agent needed │ │
│ │ • pass → agent checks acceptance criteria│ │
│ └──────────────────────────┬────────────────┘ │
│ ▼ │
│ rejected? ──▶ back to IMPLEMENT │
│ passed? ──▶ next story │
│ │
│ stale detection: same story 2+ iterations → abort │
│ zombie detection: no output for 5 min → kill │
│ │
└────────────────────────────────────────────────────┘
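The execution loop in the diagram can be sketched roughly as follows. The `implement` and `review` callables are stand-ins for the real agent invocations, and the stale-detection rule is simplified to "same story rejected too many times in a row":

```python
def run_loop(stories, implement, review, max_stale=2):
    """Drive each story through IMPLEMENT -> REVIEW, retrying on rejection.

    `implement` and `review` are stand-ins for the real agent calls.
    Aborts if the same story is rejected `max_stale` times in a row,
    mirroring the stale detection in the diagram.
    """
    for story in stories:
        failures = []
        while True:
            if len(failures) >= max_stale:
                raise RuntimeError(f"stale: {story['id']} stuck after {len(failures)} attempts")
            implement(story, failures)  # fresh agent, prior failures attached
            verdict = review(story)     # gates first, then acceptance criteria
            if verdict == "passed":
                break                   # next story
            failures.append(verdict)    # rejected: loop back to IMPLEMENT
```

Note that each iteration passes the accumulated `failures` back into `implement`: the fresh agent gets the previous error output in its task file instead of a stale conversation history.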
The execution layer is tool-agnostic. Same loop, same protocol, different AI tool underneath.
~/.craft/tools/
├── claude/ # Claude Code
├── codex/ # OpenAI Codex
├── qwen/ # Qwen Code
└── copilot/ # GitHub Copilot
Each tool is a directory with a config file and a runner script. Adding support for a new AI tool is just adding another directory. Skills follow the Agent Skills open standard so they work across tools.
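Under that layout, tool discovery reduces to a directory scan. A minimal sketch, assuming each plugin holds a config file and a runner script; the exact filenames (`config.yaml`, `run.sh`) are my guesses, not Craftloop's documented layout:

```python
from pathlib import Path

def discover_tools(tools_dir):
    """Find usable tool plugins: any subdirectory holding a config file
    and a runner script. Filenames here are illustrative."""
    tools = {}
    for entry in Path(tools_dir).iterdir():
        if (entry.is_dir()
                and (entry / "config.yaml").exists()
                and (entry / "run.sh").exists()):
            tools[entry.name] = entry
    return tools
```

Dropping a new directory with those two files is all "adding a tool" means under this scheme; directories missing either file are simply ignored.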
I wanted to ship full features with AI — not just one file at a time. The tools we have are powerful, but they're designed for single tasks. Nobody was solving the orchestration problem the way I needed it solved:
The context problem. AI tools lose track of what they're doing on big features. Long conversations accumulate stale context, costs explode, and the model starts hallucinating about code it wrote five messages ago. I needed fresh context per task — give the agent exactly what it needs, nothing it doesn't.
The verification problem. When an agent runs for an hour and hands you a massive diff, how do you know it's right? I needed quality gates after every small step, not after a marathon session. Catch problems in minutes, not hours.
The babysitting problem. Tools like GitHub's Spec Kit do a great job structuring specs and breaking work into tasks. But you still run each task manually — trigger, wait, check, trigger the next one. I needed full autonomy: start the loop, walk away, come back to working code.
The tool lock-in problem. I use different tools for different things. Sometimes Claude, sometimes Codex, sometimes Qwen. I needed an orchestrator that works with any tool — same loop, same protocol, swap the brain underneath.
Craftloop is my answer to all of these. It's not trying to replace any AI coding tool — it sits on top of whichever one you already use and handles the parts they don't: decomposition, sequencing, verification, and recovery.
# Install (Linux/macOS, no sudo needed)
curl -fsSL https://raw.githubusercontent.com/craftogrammer/craftloop/main/install.sh | bash
# Set up your project
cd your-project
craft init

This creates a .craft/ directory and installs the /craft skill into your AI tool. Now open your tool's chat and type:
/craft create a multiplayer snake game
Craft brainstorms with you inside the chat — asks a couple of questions, proposes approaches, you pick one. Then it generates a design doc, breaks it into user stories, and starts the autonomous loop. You walk away. Come back to a branch with working code.
You can also drive each step manually from your terminal:
craft plan --tool claude # writes a design doc
craft convert --tool claude # turns it into user stories
craft clarify # validates story quality
craft run --tool claude 50 # runs 50 iterations
craft status                 # check how it's going

- Quality gates before AI review. Tests and linting run after every implementation. If they fail, the story gets bounced with error output — no review agent needed.
- Cheaper model for reviews. Reviews use a smaller model by default while implementation uses a bigger one. Or route reviews to a different tool entirely with --review-tool.
- Skip trivial reviews. Config changes and type definitions get auto-passed at the next checkpoint.
- Fresh context per story. No bloat from accumulated conversation. Each agent starts clean with just what it needs.
- Smart code search. Finds the relevant code for each story automatically. Agents spend time building, not exploring.
Claude Code, GitHub Copilot, Codex, Qwen Code, Amp. Each tool is a plugin — a config file and a runner script. Adding a new one is just another directory under tools/.
| Command | What it does |
|---|---|
| `craft init` | Set up .craft/ directory and install skills |
| `craft init --force` | Update skills to latest (never touches your prd.json or progress) |
| `craft plan` | Generate design doc from brief |
| `craft convert` | Convert design doc to user stories |
| `craft clarify` | Validate story quality before running the loop |
| `craft run --tool claude 50` | Run the loop (50 iterations max) |
| `craft status` | See what's happening |
| `craft archive` | Archive current work, start fresh |
| Flag | Default | What it does |
|---|---|---|
| `--tool <name>` | claude | Which AI tool to use |
| `--review-tool <name>` | same | Different tool for reviews |
| `--zombie-timeout <sec>` | 300 | Seconds of silence before killing agent |
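The zombie-detection behavior behind `--zombie-timeout` can be sketched as a watchdog on the agent's output stream. A rough illustration, not Craftloop's implementation:

```python
import subprocess
import threading
import time

def run_with_zombie_timeout(cmd, timeout=300):
    """Run `cmd`, killing it if it stays silent (no stdout) for `timeout`
    seconds. A sketch of the zombie-detection idea only."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    last_output = time.time()

    def watch_output():
        nonlocal last_output
        for _line in proc.stdout:   # each line of output resets the clock
            last_output = time.time()

    threading.Thread(target=watch_output, daemon=True).start()
    while proc.poll() is None:
        if time.time() - last_output > timeout:
            proc.kill()             # silent too long: assume it is hung
            proc.wait()
            return "zombie-killed"
        time.sleep(0.05)
    return "exited"
```

An agent that keeps streaming output never trips the watchdog, however long it runs; only genuine silence does.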
| Variable | Default |
|---|---|
| `CRAFT_CLAUDE_MODEL` | claude-sonnet-4-6 |
| `CRAFT_CLAUDE_REVIEW_MODEL` | claude-haiku-4-5 |
| `CRAFT_CODEX_MODEL` | — |
| `CRAFT_QWEN_MODEL` | — |
A feature moves through states:
brainstorming → planned → converting → ready → clarified → running → completed → archived
Each phase writes files that the next phase reads. If the pipeline crashes, it detects what was already produced and resumes from there — no wasted re-work.
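One way crash recovery like this can work is by inspecting which artifacts already exist on disk. The filenames below come from the pipeline diagram (`brief.md`, `design.md`, `prd.json`); the resume rule itself is an assumption about the mechanism, not a description of Craftloop's code:

```python
from pathlib import Path

# Pipeline artifacts in the order they are produced. Filenames are
# from the pipeline diagram; the resume rule is a plausible guess.
ARTIFACTS = [
    ("brief.md", "plan"),      # brief exists -> next step is plan
    ("design.md", "convert"),  # design exists -> next step is convert
    ("prd.json", "run"),       # stories exist -> next step is the loop
]

def resume_from(craft_dir):
    """Return the phase to (re)start: the step after the newest artifact."""
    phase = "brainstorm"       # nothing produced yet
    for filename, next_phase in ARTIFACTS:
        if (Path(craft_dir) / filename).exists():
            phase = next_phase
    return phase
```

Because each phase's output is the next phase's input, the newest surviving file pins down exactly where the pipeline died.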
craft status tells you where you are, including what sub-step is running. craft archive cleans up and starts fresh. When you run /craft again, it reads what was done before — previous patterns, learnings, what worked — and carries that forward.
Started as a fork of snarktank/craft — the original autonomous agent loop. Evolved quite a bit from there.
The brainstorm-plan-execute pipeline and fresh-context-per-task pattern came from studying Superpowers by Anthropic. Good ideas worth borrowing.
Skills use the Agent Skills open standard. Thanks to skills.sh for the ecosystem.
Built on top of Claude Code, Codex, Qwen Code, Amp, and GitHub Copilot.
MIT