"Give an AI agent a real project and let it experiment autonomously." Inspired by karpathy/autoresearch.
In the original karpathy/autoresearch, an AI agent researches neural network training (nanochat), modifying only train.py and guided by a single val_bpb metric.
ProjectEvolve extends this idea to any project:
- Any programming language (Python, JavaScript, Go, Rust, ...)
- Any task type (backend, frontend, DevOps, documentation, ...)
- Any files and directories (full freedom of action)
- Cross-platform (Windows, Linux, macOS)
- Knowledge persistence across runs
Key inheritance: the agent works autonomously, iteratively improves the project, keeps successful changes, and discards failures.
ProjectEvolve is a universal tool for running an AI agent on any project. The agent autonomously analyzes code, proposes improvements, makes changes, and learns from previous experiments.
- Analyzes: studies project structure, code, documentation
- Proposes: generates improvement ideas
- Implements: makes changes to code/structure/docs
- Tests: ensures nothing breaks
- Accumulates: next iteration sees previous results
- Repeats: cycle continues autonomously
- Autonomous experiments: AI independently analyzes, proposes, and implements improvements
- Knowledge accumulation: each iteration sees previous results, building project knowledge
- Universality: works with Python, JavaScript, Go, Rust, and any other technology
- Flexible setup: simple questionnaire adapts to project
- Cross-platform: Windows, Linux, macOS
- Zero maintenance: agent handles everything
    ┌─────────────────┐      ┌───────────────┐      ┌──────────────┐
    │  Your Project   │─────▶│ ProjectEvolve │─────▶│   AI Agent   │
    │ (any language)  │      │   (script)    │      │   (Claude)   │
    └─────────────────┘      └───────┬───────┘      └──────┬───────┘
                                     │                     │
                                     ▼                     ▼
                             ┌───────────────┐      ┌──────────────┐
                             │ Configuration │      │  Experiment  │
                             │ .autoresearch │      │  #1, #2, #3  │
                             └───────┬───────┘      └──────┬───────┘
                                     │                     │
                                     ▼                     ▼
                             ┌───────────────┐      ┌──────────────┐
                             │ Improvements  │◀─────│   Context    │
                             │   code/docs   │      │ accumulates  │
                             └───────────────┘      └──────────────┘
- Analyze: studies project structure, code, documentation
- Propose: generates improvement ideas
- Implement: makes changes to code, structure, documentation
- Quality Loop: built-in self-testing with quantitative metrics
- Evaluate: automatic scoring (0.0-1.0) with pass/fail decisions
- Document: updates README, creates new documentation
- Iterate: each iteration learns from previous ones
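The steps above can be sketched as a single loop. This is a minimal illustration, not the actual autoresearch.py: the names `run_experiments`, `propose`, `apply_and_score`, and `revert` are hypothetical callables standing in for the agent call, the quality evaluation, and the rollback. The 0.05 gain margin is borrowed from the keep rule in the quality loop description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentResult:
    number: int
    idea: str
    score: float  # quality score in [0.0, 1.0]
    kept: bool


@dataclass
class RunState:
    baseline: float = 0.0
    context: List[str] = field(default_factory=list)  # accumulated notes
    history: List[ExperimentResult] = field(default_factory=list)


def run_experiments(
    iterations: int,
    propose: Callable[[List[str]], str],
    apply_and_score: Callable[[str], float],
    revert: Callable[[], None],
    min_gain: float = 0.05,
) -> RunState:
    """Propose, implement, and evaluate an idea per iteration, then keep or discard it."""
    state = RunState()
    for number in range(1, iterations + 1):
        idea = propose(state.context)      # the agent sees earlier results
        score = apply_and_score(idea)      # implement the idea and score it
        kept = score >= state.baseline + min_gain
        if kept:
            state.baseline = score         # successful change becomes the new baseline
        else:
            revert()                       # failed change is discarded
        state.history.append(ExperimentResult(number, idea, score, kept))
        outcome = "kept" if kept else "discarded"
        state.context.append(f"#{number}: {idea} -> {score:.2f} ({outcome})")
    return state
```

Passing the accumulated `context` into `propose` is what lets each iteration build on earlier ones. In a real run, that list would presumably be persisted to a file such as accumulation_context.md.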
| Platform | Support | Launch command |
|---|---|---|
| Windows | ✅ Full | autoresearch.bat |
| Linux | ✅ Full | python autoresearch.py |
| macOS | ✅ Full | python autoresearch.py |
- Python 3.10+
- Claude CLI (Anthropic)
- Git (optional)
ProjectEvolve includes a built-in self-testing system inspired by quality gates:
    ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
    │   Generate   │─────▶│    Apply     │─────▶│   Evaluate   │
    │     Idea     │      │   Changes    │      │   (Score)    │
    └──────────────┘      └──────────────┘      └──────┬───────┘
                                                       │
                                                       ▼
                                                ┌──────────────┐
                                                │   Decision   │
                                                │ KEEP/DISCARD │
                                                └──────┬───────┘
                                                       │
                                  ┌────────────────────┘
                                  │ (if kept)
                                  ▼
                          ┌──────────────┐
                          │  Next Iter.  │
                          └──────────────┘
- Universal: works with Python, JavaScript, Go, Rust, Ruby, Java, any language
- Auto-detect: automatically finds test commands (npm test, pytest, cargo test, etc.)
- Quantitative: scores 0.0-1.0 with pass/fail decisions
- Two-phase: Phase A (base quality, 70% threshold) → Phase B (strict quality, 85% threshold)
- Automatic: runs tests after each experiment, decides to keep or discard changes
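One plausible way to implement the auto-detect feature is to look for well-known marker files in the project root. The sketch below assumes that approach. The `TEST_COMMANDS` table and `detect_test_command` are hypothetical names, and the real quality_loop.py may use different rules or a different order.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker-file table: first match wins.
TEST_COMMANDS = [
    ("package.json", "npm test"),
    ("Cargo.toml", "cargo test"),
    ("go.mod", "go test ./..."),
    ("pyproject.toml", "pytest"),
    ("pytest.ini", "pytest"),
]


def detect_test_command(project: Path) -> Optional[str]:
    """Return a test command guessed from marker files, or None if nothing matches."""
    for marker, command in TEST_COMMANDS:
        if (project / marker).is_file():
            return command
    return None
```

Returning `None` rather than a default keeps the caller in control: it can skip the tests metric or fall back to an explicit `command` from quality.yml.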
    # Standalone quality check
    python F:/IdeaProjects/autoresearch/utils/quality_loop.py --project /path/to/project

    # Custom thresholds
    python utils/quality_loop.py --project . --threshold-a 0.7 --threshold-b 0.85

    # JSON output for parsing
    python utils/quality_loop.py --project . --json

The configuration file .autoresearch/quality.yml is created automatically:

    metrics:
      tests:
        enabled: true
        command: ""  # Auto-detect: npm test, pytest, cargo test, etc.
      build:
        enabled: false
        command: ""  # Auto-detect: npm run build, cargo build, etc.
    thresholds:
      a:
        min_score: 0.7  # Phase A threshold
        required_checks: ["tests"]
      b:
        min_score: 0.85  # Phase B threshold
        required_checks: ["tests", "build"]

Keep changes if:
- ✅ Score ≥ baseline + 0.05 (improvement)
- ✅ All required checks pass
- ✅ No critical failures

Discard changes if:
- ❌ Score decreased
- ❌ Critical tests fail
- ❌ Violates project constraints

Manual review if:
- ⚠️ Score ~ baseline (minimal change)
- ⚠️ Some non-critical tests fail
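The three rule sets above can be expressed as one small function. This is a sketch under assumptions: the `Decision` enum, the `decide` signature, and the choice to check discard conditions first are illustrative, not taken from quality_loop.py.

```python
from enum import Enum


class Decision(Enum):
    KEEP = "keep"
    DISCARD = "discard"
    REVIEW = "manual review"


def decide(
    score: float,
    baseline: float,
    required_ok: bool,
    critical_failure: bool = False,
    violates_constraints: bool = False,
    min_gain: float = 0.05,
) -> Decision:
    """Map a quality score and check results to keep, discard, or manual review."""
    # Discard conditions take priority over everything else.
    if critical_failure or violates_constraints or score < baseline:
        return Decision.DISCARD
    # Keep only a clear improvement with all required checks passing.
    if required_ok and score >= baseline + min_gain:
        return Decision.KEEP
    # Anything in between (score near baseline, non-critical failures) needs a human.
    return Decision.REVIEW
```

For example, `decide(0.72, 0.70, required_ok=True)` falls into the review band because the gain is below the 0.05 margin.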
ProjectEvolve requires Claude Code to run with appropriate permissions.
- ✅ "bypass permissions on": Recommended! No approvals needed, full autonomy
- ⚠️ Other modes (auto/manual): may require permission approvals during execution
  - ❌ Risk: the agent may hang waiting for the user to approve tool usage
The ProjectEvolve agent needs these tools to be approved (if not in bypass mode):

Core tools:
- Edit: modify files
- Read: read file contents
- Write: create new files
- Glob: find files by pattern
- Grep: search file contents

Optional tools:
- Bash: execute shell commands (for Quality Loop)
- Agent: spawn sub-agents
Option 1: Bypass Mode (Recommended)

    {
      "permissionMode": "bypass"
    }

Option 2: Auto-Approve Safe Tools

    {
      "permissionMode": "auto",
      "autoApproveSafeTools": true,
      "alwaysAllowTools": ["Edit", "Read", "Write", "Glob", "Grep"]
    }

If the agent hangs during experiment execution:
- Check if a permission prompt is waiting for approval
- Approve the required tool (Edit, Read, Write, etc.)
- Or switch to bypass mode for full autonomy
Note: All scripts have a 30-minute timeout per experiment. If Claude CLI hangs (e.g., on permission prompts), the experiment will time out and continue to the next iteration. Check logs for timeout errors.
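A per-experiment time limit like the one in the note can be enforced with the standard library's `subprocess.run(..., timeout=...)`. The sketch below shows that pattern only. `run_with_timeout` is a hypothetical helper, and the command passed to it is whatever launches the agent; no specific Claude CLI flags are assumed here.

```python
import subprocess
import sys
from typing import List, Optional

EXPERIMENT_TIMEOUT_S = 30 * 60  # 30 minutes, matching the note above


def run_with_timeout(cmd: List[str], timeout_s: float = EXPERIMENT_TIMEOUT_S) -> Optional[str]:
    """Run cmd and return its stdout, or None if it exceeded the time limit."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising; the caller can log
        # the timeout and move on to the next iteration.
        return None
    return done.stdout


if __name__ == "__main__":
    # Stand-in command: a real run would launch the agent instead.
    out = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=10)
    print("timed out" if out is None else out.strip())
```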
For the AI agent: read INSTALL.md and configure the system:

    # AI agent sets up the environment
    # (install Python, Node.js, Claude CLI, create directories)

The AI agent will:
- ✅ Detect OS (Windows/Linux/macOS)
- ✅ Install missing dependencies
- ✅ Create required directories
- ✅ Verify installation

See INSTALL.md for the AI agent's cross-platform setup instructions.
After environment setup, run the script:

    # Basic run (10 iterations, 5 min interval)
    python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project

    # With parameters
    python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 50 --timeout 2

    # Windows (via bat-file)
    F:/IdeaProjects/autoresearch/autoresearch.bat . 50 2

Parameters:
- --project: path to your project
- --iter: number of iterations (default: 10)
- --timeout: interval between iterations in minutes (default: 5)
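For reference, the three documented flags and their defaults could be declared with `argparse` roughly as follows. This is an illustrative sketch, not the parser in autoresearch.py: `build_parser` and `parse_args` are hypothetical names, the positional short form (`. 50 2`) is not handled, and the `--project` default of the current directory is inferred from the usage examples.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the long-form flags with the defaults listed above."""
    parser = argparse.ArgumentParser(prog="autoresearch.py")
    parser.add_argument("--project", default=".", help="path to your project")
    parser.add_argument("--iter", type=int, default=10, help="number of iterations")
    parser.add_argument(
        "--timeout", type=int, default=5,
        help="interval between iterations, in minutes",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:], as in a normal CLI run.
    return build_parser().parse_args(argv)
```

Calling `parse_args(["--project", ".", "--iter", "50", "--timeout", "2"])` yields a namespace with `iter` set to 50 and `timeout` set to 2.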
    autoresearch/
    ├── autoresearch.py          # Main script
    ├── autoresearch.bat         # Windows launcher
    ├── INSTALL.md               # Installation guide (for AI)
    ├── README.md                # This file (English main)
    ├── README_RU.md             # Russian version (full)
    ├── QUICKSTART.md            # Quick guide
    ├── config/
    │   ├── default_prompt.md    # Agent prompt template
    │   └── quality.yml          # Quality gate configuration
    ├── utils/
    │   ├── cli_setup.py         # Interactive setup
    │   └── quality_loop.py      # Quality loop implementation
    └── .gitignore               # Git ignore
    your-project/
    └── .autoresearch/
        ├── .autoresearch.json           # Project configuration
        ├── quality.yml                  # Quality gate configuration (auto-created)
        ├── experiments/
        │   ├── prompt_1.md
        │   ├── output_1.md
        │   ├── accumulation_context.md  # Accumulated context
        │   ├── last_experiment.md       # Last experiment
        │   ├── changes_log.md           # Changes log
        │   └── summary.json             # Final summary
        └── logs/
            └── autoresearch.log         # Run logs
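The layout above stores each experiment's output as output_N.md, and the usage examples note that the next experiment number is auto-detected from those files (if output_1.md exists, the run starts from 2). A minimal sketch of such detection follows. `next_experiment_number` is a hypothetical name, and the actual script may track the number differently.

```python
import re
from pathlib import Path

# Matches output_1.md, output_24.md, ... and captures the number.
OUTPUT_RE = re.compile(r"^output_(\d+)\.md$")


def next_experiment_number(experiments_dir: Path) -> int:
    """Return one more than the highest output_N.md found, or 1 if there is none."""
    if not experiments_dir.is_dir():
        return 1
    numbers = [
        int(m.group(1))
        for p in experiments_dir.iterdir()
        if (m := OUTPUT_RE.match(p.name))
    ]
    return max(numbers, default=0) + 1
```

Using the highest existing number rather than the file count means a gap in the sequence (for example, a deleted output file) does not cause an earlier experiment to be overwritten.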
    ========================================================================
    ProjectEvolve - First Time Setup
    ========================================================================
    Project: /path/to/your-project
    Project name: My Awesome App
    Short description: Web app for task management
    Project goals (one per line):
    > Improve performance
    > Add tests
    > Update documentation
    > [Enter]
    Constraints (optional):
    > Don't change API
    > [Enter]
    ✅ Configuration saved!
    {
      "name": "My Awesome App",
      "description": "Web app for task management",
      "goals": [
        "Improve performance",
        "Add tests",
        "Update documentation"
      ],
      "constraints": [
        "Don't change API"
      ],
      "tech_stack": ["Python", "FastAPI", "PostgreSQL"],
      "focus_areas": ["performance", "testing", "documentation"]
    }

    # Short form: 3 experiments, 1 minute interval
    python F:/IdeaProjects/autoresearch/autoresearch.py . 3 1

    # Long form: same as above
    python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 3 --timeout 1

    # 50 experiments, 10 minutes interval
    python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 50 --timeout 10

    # Initial configuration
    python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project --configure

    # Later: run
    python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project --iter 10

    # Continue from Experiment 25 (after previous session ended at 24)
    python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 10 --start-from 25
    # This will run Experiments 25-34 (10 experiments starting from 25)
    # The agent will still see accumulated context from all previous experiments

    # Auto-detects next experiment number (if output_1.md exists, starts from 2)
    python F:/IdeaProjects/autoresearch/autoresearch.py . 10 1

    # Or without --project parameter (uses current directory)
    python F:/IdeaProjects/autoresearch/autoresearch.py 10 1

Install the Claude CLI:

    npm install -g @anthropic-ai/claude-code

Install Python 3.10+ and add it to PATH.
Increase the interval between iterations (--timeout).

    python F:/IdeaProjects/autoresearch/autoresearch.py --project . --reconfigure

- INSTALL.md: Installation guide (for AI agent)
- QUICKSTART.md: Quick guide
- README_RU.md: Russian version
Contributions welcome! Create issues and pull requests.
- Web UI for experiment monitoring
- Progress visualization
- Completion notifications
- Metrics and analytics
- CI/CD integration
MIT License: freely use in any project.
If you find this project useful, please give it a star on GitHub!
Made with ❤️ for autonomous project research