Craftloop

You have an AI coding tool. It's great at small tasks — fix this bug, add this function. But when you ask it to build a full feature across multiple files, it falls apart. Context gets stale, the agent forgets what it did three steps ago, and you end up babysitting it anyway.

Craftloop fixes that. You describe what you want to build, and it handles the rest — planning, breaking work into small pieces, implementing each piece, testing, reviewing, and committing. You go do something else. Come back to a branch with working code.

What it actually does

  1. You describe a feature. "Add a fighters page with filtering and fight history."
  2. Craft asks you real questions. Not generic ones — it reads your codebase first, then asks about your actual code. "Your ThemeProvider exposes useTheme() and getTheme(). Which should the toggle use?"
  3. It writes a plan. A design doc, then user stories with concrete acceptance criteria. Each story is small enough for one agent pass.
  4. It runs the work autonomously. Each story gets a fresh agent with clean context — no accumulated confusion. Implement, run tests, code review, commit. Pick up next story. Repeat.
  5. Quality gates catch mistakes early. Your linter, type checker, and tests run after every implementation. Failures get bounced back with error output — no wasted AI call for obvious breaks.

The loop runs until every story passes, the iteration cap is reached, or stale detection aborts it. You can watch it work with craft status, or just check back later.
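As a rough illustration of the quality-gate step in item 5, the sketch below runs a list of commands in order and stops at the first failure, returning its output as feedback for the next implementation pass. `GateResult` and `run_quality_gates` are hypothetical names, not Craftloop's API, and the gate commands here are stand-ins for whatever linter, type checker, and test runner your project uses.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    feedback: str  # error output to hand back to the next implement pass


def run_quality_gates(commands: list[list[str]]) -> GateResult:
    """Run each gate command in order and stop at the first failure."""
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            output = (proc.stdout + proc.stderr).strip()
            return GateResult(False, f"{' '.join(cmd)} failed:\n{output}")
    return GateResult(True, "")


if __name__ == "__main__":
    # Stand-in gates (assumption): the second one simulates a failing type check.
    gates = [
        [sys.executable, "-c", "print('lint ok')"],
        [sys.executable, "-c", "import sys; sys.stderr.write('type error'); sys.exit(1)"],
    ]
    result = run_quality_gates(gates)
    print(result.passed)
    print(result.feedback)
```

Stopping at the first failing gate keeps the feedback short, which matters when that text is fed back into a fresh agent's context.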

Architecture

You ── "add dark mode" ──▶ /craft ── questions, pick approach ──▶ brief.md
                                                                     │
                          ┌──────────────────────────────────────────┘
                          ▼
                   ┌─────────────┐
                   │    PLAN     │  brief.md + RAG context
                   │             │  ──▶ design.md
                   └──────┬──────┘
                          ▼
                   ┌─────────────┐
                   │   CONVERT   │  design.md ──▶ prd.json
                   │             │  batched extraction, auto-split
                   │             │  if stories too big or vague
                   └──────┬──────┘
                          ▼
                   ┌─────────────┐
                   │   CLARIFY   │  validate criteria count,
                   │             │  flag vague language
                   └──────┬──────┘
                          ▼
  ┌───────────────────────────────────────────────────┐
  │                  EXECUTION LOOP                    │
  │                                                    │
  │  for each story in prd.json:                       │
  │                                                    │
  │    ┌─ IMPLEMENT ──────────────────────────────┐    │
  │    │  Build task file (story + RAG context +   │    │
  │    │  previous failures). Spawn fresh agent.   │    │
  │    │  Run quality gates after. Fail → reject.  │    │
  │    └──────────────────────────┬────────────────┘    │
  │                               ▼                     │
  │    ┌─ REVIEW ─────────────────────────────────┐    │
  │    │  Run quality gates first.                 │    │
  │    │  • pass + checkpoint → auto-pass          │    │
  │    │  • fail → reject, no agent needed         │    │
  │    │  • pass → agent checks acceptance criteria│    │
  │    └──────────────────────────┬────────────────┘    │
  │                               ▼                     │
  │    rejected? ──▶ back to IMPLEMENT                  │
  │    passed?   ──▶ next story                         │
  │                                                    │
  │  stale detection: same story 2+ iterations → abort │
  │  zombie detection: no output for 5 min → kill      │
  │                                                    │
  └────────────────────────────────────────────────────┘
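The execution loop in the diagram can be approximated in a few lines of Python. This is a sketch under assumptions, not Craftloop's implementation: `Story`, `run_loop`, and the `implement` and `review` callbacks are hypothetical, and stale detection is read here as "abort once a single story has been rejected `stale_limit` times".

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Story:
    id: str
    passed: bool = False
    failures: list[str] = field(default_factory=list)  # fed into the next task file


def run_loop(
    stories: list[Story],
    implement: Callable[[Story], None],
    review: Callable[[Story], tuple[bool, str]],
    max_iterations: int,
    stale_limit: int = 2,
) -> str:
    """Return 'completed', 'stale', or 'out_of_iterations'."""
    for _ in range(max_iterations):
        story = next((s for s in stories if not s.passed), None)
        if story is None:
            return "completed"
        implement(story)  # in the real tool this would spawn a fresh agent
        ok, feedback = review(story)
        if ok:
            story.passed = True
        else:
            story.failures.append(feedback)
            if len(story.failures) >= stale_limit:
                return "stale"  # same story keeps bouncing, so give up
    return "completed" if all(s.passed for s in stories) else "out_of_iterations"


if __name__ == "__main__":
    demo = [Story("US-1"), Story("US-2")]
    print(run_loop(demo, lambda s: None, lambda s: (True, ""), max_iterations=50))
```

Because each iteration only sees the story and its recorded failures, nothing else carries over between passes, which is the "fresh context" idea in miniature.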

Tool plugin system

The execution layer is tool-agnostic. Same loop, same protocol, different AI tool underneath.

~/.craft/tools/
├── claude/          # Claude Code
├── codex/           # OpenAI Codex
├── qwen/            # Qwen Code
└── copilot/         # GitHub Copilot

Each tool is a directory with a config file and a runner script. Adding support for a new AI tool is just adding another directory. Skills follow the Agent Skills open standard so they work across tools.
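To make the "one directory per tool" idea concrete, here is a minimal discovery sketch. The file names `config.json` and `run.sh` are assumptions made for illustration; the README does not say what the config file and runner script are actually called.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "config.json"  # hypothetical file name
RUNNER_NAME = "run.sh"       # hypothetical file name


def discover_tools(tools_dir: Path) -> dict[str, Path]:
    """Map tool name to runner path for every complete plugin directory."""
    tools: dict[str, Path] = {}
    if not tools_dir.is_dir():
        return tools
    for entry in sorted(tools_dir.iterdir()):
        has_config = (entry / CONFIG_NAME).is_file()
        has_runner = (entry / RUNNER_NAME).is_file()
        if entry.is_dir() and has_config and has_runner:
            tools[entry.name] = entry / RUNNER_NAME
    return tools


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("claude", "codex"):
            (root / name).mkdir()
            (root / name / CONFIG_NAME).write_text("{}")
            (root / name / RUNNER_NAME).write_text("#!/bin/sh\n")
        (root / "broken").mkdir()  # no config or runner, so it is skipped
        print(sorted(discover_tools(root)))
```

With a layout like this, supporting a new tool needs no change to the loop itself: the orchestrator only has to find the directory and call its runner.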

Why I built this

I wanted to ship full features with AI — not just one file at a time. The tools we have are powerful, but they're designed for single tasks. Nobody was solving the orchestration problem the way I needed it solved:

The context problem. AI tools lose track of what they're doing on big features. Long conversations accumulate stale context, costs explode, and the model starts hallucinating about code it wrote five messages ago. I needed fresh context per task — give the agent exactly what it needs, nothing it doesn't.

The verification problem. When an agent runs for an hour and hands you a massive diff, how do you know it's right? I needed quality gates after every small step, not after a marathon session. Catch problems in minutes, not hours.

The babysitting problem. Tools like GitHub's Spec Kit do a great job structuring specs and breaking work into tasks. But you still run each task manually — trigger, wait, check, trigger the next one. I needed full autonomy: start the loop, walk away, come back to working code.

The tool lock-in problem. I use different tools for different things. Sometimes Claude, sometimes Codex, sometimes Qwen. I needed an orchestrator that works with any tool — same loop, same protocol, swap the brain underneath.

Craftloop is my answer to all of these. It's not trying to replace any AI coding tool — it sits on top of whichever one you already use and handles the parts they don't: decomposition, sequencing, verification, and recovery.

Quick start

# Install (Linux/macOS, no sudo needed)
curl -fsSL https://raw.githubusercontent.com/craftogrammer/craftloop/main/install.sh | bash

# Set up your project
cd your-project
craft init

This creates a .craft/ directory and installs the /craft skill into your AI tool. Now open your tool's chat and type:

/craft create a multiplayer snake game

Craft brainstorms with you inside the chat — asks a couple of questions, proposes approaches, you pick one. Then it generates a design doc, breaks it into user stories, and starts the autonomous loop. You walk away. Come back to a branch with working code.

You can also drive each step manually from your terminal:

craft plan --tool claude          # writes a design doc
craft convert --tool claude       # turns it into user stories
craft clarify                     # validates story quality
craft run --tool claude 50        # runs up to 50 iterations
craft status                      # check how it's going
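The clarify step is described as validating the criteria count and flagging vague language. A check along those lines might look like the sketch below. Everything specific in it is an assumption: the story shape (`id`, `acceptanceCriteria`), the word list, and the count limits are invented for illustration and may not match the real prd.json format.

```python
import re

VAGUE_WORDS = ("properly", "appropriately", "user-friendly", "fast", "etc")  # hypothetical list
MIN_CRITERIA, MAX_CRITERIA = 1, 8  # hypothetical limits


def clarify(stories: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the stories look fine."""
    problems: list[str] = []
    for story in stories:
        criteria = story.get("acceptanceCriteria", [])
        if not MIN_CRITERIA <= len(criteria) <= MAX_CRITERIA:
            problems.append(
                f"{story['id']}: {len(criteria)} criteria "
                f"(expected {MIN_CRITERIA}-{MAX_CRITERIA})"
            )
        for criterion in criteria:
            for word in VAGUE_WORDS:
                # Whole-word match so "fast" does not fire on "breakfast".
                if re.search(rf"\b{re.escape(word)}\b", criterion, re.IGNORECASE):
                    problems.append(f"{story['id']}: vague term '{word}' in: {criterion}")
    return problems


if __name__ == "__main__":
    sample = [
        {"id": "US-1", "acceptanceCriteria": ["GET /fighters returns 200 with a JSON list"]},
        {"id": "US-2", "acceptanceCriteria": ["Page loads fast"]},
    ]
    for problem in clarify(sample):
        print(problem)
```

Running a cheap check like this before the loop starts avoids spending agent passes on stories that a reviewer could never verify.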

How it keeps costs down

  • Quality gates before AI review. Tests and linting run after every implementation. If they fail, the story gets bounced with error output — no review agent needed.
  • Cheaper model for reviews. Reviews use a smaller model by default while implementation uses a bigger one. Or route reviews to a different tool entirely with --review-tool.
  • Skip trivial reviews. Config changes and type definitions get auto-passed at the next checkpoint.
  • Fresh context per story. No bloat from accumulated conversation. Each agent starts clean with just what it needs.
  • Smart code search. Retrieval over your codebase (the RAG context in the architecture diagram) finds the relevant code for each story automatically, so agents spend time building, not exploring.
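The first and third points above amount to a routing decision made before any review agent is spawned. The sketch below shows one way to express it. The names and the file patterns used to decide what counts as "trivial" are assumptions, and the real checkpoint logic is not described in enough detail here to reproduce.

```python
from enum import Enum
from fnmatch import fnmatchcase

# Hypothetical patterns for "config changes and type definitions".
TRIVIAL_PATTERNS = ("*.json", "*.yaml", "*.yml", "*.toml", "*.d.ts")


class ReviewAction(Enum):
    REJECT = "reject"              # gates failed: bounce with error output, no agent
    AUTO_PASS = "auto-pass"        # gates passed and the change is trivial
    AGENT_REVIEW = "agent-review"  # gates passed: agent checks acceptance criteria


def is_trivial(changed_files: list[str]) -> bool:
    """True only if there are changes and every one matches a trivial pattern."""
    return bool(changed_files) and all(
        any(fnmatchcase(path, pattern) for pattern in TRIVIAL_PATTERNS)
        for path in changed_files
    )


def route_review(gates_passed: bool, changed_files: list[str]) -> ReviewAction:
    if not gates_passed:
        return ReviewAction.REJECT
    if is_trivial(changed_files):
        return ReviewAction.AUTO_PASS
    return ReviewAction.AGENT_REVIEW


if __name__ == "__main__":
    print(route_review(True, ["tsconfig.json", "src/types.d.ts"]).value)
    print(route_review(True, ["src/app.py"]).value)
```

Only the last branch costs a model call, which is where the savings come from.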

Supported tools

Claude Code, GitHub Copilot, Codex, Qwen Code, Amp. Each tool is a plugin: a config file and a runner script. Adding a new one is just another directory under ~/.craft/tools/.

Commands

| Command | What it does |
| --- | --- |
| craft init | Set up .craft/ directory and install skills |
| craft init --force | Update skills to latest (never touches your prd.json or progress) |
| craft plan | Generate design doc from brief |
| craft convert | Convert design doc to user stories |
| craft clarify | Validate story quality before running the loop |
| craft run --tool claude 50 | Run the loop (50 iterations max) |
| craft status | See what's happening |
| craft archive | Archive current work, start fresh |

Run flags

| Flag | Default | Description |
| --- | --- | --- |
| --tool <name> | claude | Which AI tool to use |
| --review-tool <name> | same | Different tool for reviews |
| --zombie-timeout <sec> | 300 | Seconds of silence before killing agent |
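The zombie timeout is a silence timer rather than a total-runtime limit: the clock resets whenever the agent prints something. A minimal sketch of that behaviour follows; `run_with_zombie_timeout` is a hypothetical helper, not Craftloop's code, and the 0.05 second poll interval is an arbitrary choice.

```python
import subprocess
import sys
import threading
import time


def run_with_zombie_timeout(cmd: list[str], zombie_timeout: float):
    """Run cmd and kill it after zombie_timeout seconds without output.

    Returns (exit_code, output_lines, killed).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines: list[str] = []
    last_output = time.monotonic()
    lock = threading.Lock()

    def pump() -> None:
        nonlocal last_output
        for line in proc.stdout:  # ends at EOF, i.e. when the child exits or is killed
            with lock:
                lines.append(line.rstrip("\n"))
                last_output = time.monotonic()  # any output resets the silence timer

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    killed = False
    while proc.poll() is None:
        with lock:
            silent_for = time.monotonic() - last_output
        if silent_for > zombie_timeout:
            proc.kill()
            killed = True
            break
        time.sleep(0.05)

    proc.wait()
    reader.join(timeout=1)
    return proc.returncode, lines, killed


if __name__ == "__main__":
    # A child that prints once and exits is left alone.
    print(run_with_zombie_timeout([sys.executable, "-c", "print('done')"], 30))
```

A silence-based timer lets long but active agent runs continue while still reclaiming ones that have hung.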

Environment variables

| Variable | Default |
| --- | --- |
| CRAFT_CLAUDE_MODEL | claude-sonnet-4-6 |
| CRAFT_CLAUDE_REVIEW_MODEL | claude-haiku-4-5 |
| CRAFT_CODEX_MODEL | |
| CRAFT_QWEN_MODEL | |
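Resolution of these variables presumably follows the usual "environment overrides default" pattern. The sketch below shows that pattern with the defaults from the table. Treating an empty default as `None`, meaning "pass no model and let the tool decide", is an assumption, as is the `resolve_model` helper itself.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table; None marks variables with no listed default.
DEFAULT_MODELS: dict[str, Optional[str]] = {
    "CRAFT_CLAUDE_MODEL": "claude-sonnet-4-6",
    "CRAFT_CLAUDE_REVIEW_MODEL": "claude-haiku-4-5",
    "CRAFT_CODEX_MODEL": None,
    "CRAFT_QWEN_MODEL": None,
}


def resolve_model(variable: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value from the environment, falling back to the table default."""
    env = os.environ if env is None else env
    value = env.get(variable, "").strip()
    return value or DEFAULT_MODELS[variable]  # blank values count as unset


if __name__ == "__main__":
    for name in DEFAULT_MODELS:
        print(name, "=", resolve_model(name))
```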

Lifecycle

A feature moves through states:

brainstorming → planned → converting → ready → clarified → running → completed → archived

Each phase writes files that the next phase reads. If the pipeline crashes, it detects what was already produced and resumes from there — no wasted re-work.
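Resuming from files on disk can be as simple as walking the pipeline in order and stopping at the first missing artifact. The sketch below uses the file names from the architecture diagram (brief.md, design.md, prd.json). That they all sit directly inside one directory, and the phase labels used here, are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# (artifact, phase that produces it), in pipeline order.
PIPELINE = [
    ("brief.md", "brainstorm"),
    ("design.md", "plan"),
    ("prd.json", "convert"),
]


def next_phase(craft_dir: Path) -> str:
    """Return the first phase whose output file is missing."""
    for artifact, phase in PIPELINE:
        if not (craft_dir / artifact).is_file():
            return phase
    return "clarify"  # all planning artifacts exist, so validation comes next


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "brief.md").write_text("add dark mode")
        print(next_phase(work))  # design.md is missing, so planning is next
```

Deriving the resume point from the artifacts themselves means a crash cannot leave a separate state file out of sync with what was actually written.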

craft status tells you where you are, including what sub-step is running. craft archive cleans up and starts fresh. When you run /craft again, it reads what was done before — previous patterns, learnings, what worked — and carries that forward.

Credits

Started as a fork of snarktank/craft — the original autonomous agent loop. Evolved quite a bit from there.

The brainstorm-plan-execute pipeline and fresh-context-per-task pattern came from studying Superpowers by Anthropic. Good ideas worth borrowing.

Skills use the Agent Skills open standard. Thanks to skills.sh for the ecosystem.

Built on top of Claude Code, Codex, Qwen Code, Amp, and GitHub Copilot.

License

MIT