feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0)#1005
Open
Injects a high-stakes ambiguity gate at preamble tier >= 2 so all workflow skills get it. Fires when Claude encounters architectural decisions, data model changes, destructive operations, or contradictory requirements. Does NOT fire on routine coding. Addresses Karpathy failure mode #1 (wrong assumptions) with an inline STOP gate instead of relying on workflow skill invocation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
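A minimal sketch of how a tier-gated preamble section could be wired, assuming the preamble is assembled from an ordered list of text sections. The names `buildPreamble` and `CONFUSION_PROTOCOL` and the gate wording are hypothetical; only the `tier >= 2` threshold and the four trigger categories come from the commit message above.

```typescript
// Hypothetical gate text: the PR's actual wording is not shown on this page.
const CONFUSION_PROTOCOL: string = [
  "## Confusion Protocol",
  "STOP and ask the user before proceeding when you encounter:",
  "- an architectural decision",
  "- a data model change",
  "- a destructive operation",
  "- contradictory requirements",
  "Do not stop for routine coding.",
].join("\n");

// Assumed assembly step: append the gate only at preamble tier 2 and above,
// so lower-tier skills keep a lighter preamble.
function buildPreamble(tier: number, sections: string[]): string {
  const parts = [...sections];
  if (tier >= 2) {
    parts.push(CONFUSION_PROTOCOL);
  }
  return parts.join("\n\n");
}
```

Keeping the gate in the shared preamble, rather than in each skill, is what lets every workflow skill at the qualifying tier pick it up without per-skill edits.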
Hermes: tool rewrites for terminal/read_file/patch/delegate_task, paths to ~/.hermes/skills/gstack, AGENTS.md config file.
GBrain: coding skills become brain-aware when GBrain mod is installed. Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP). GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain host, enabling brain-first lookup and save-to-brain behavior.
Both registered in hosts/index.ts with setup script redirect messages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New scripts/resolvers/gbrain.ts with two resolver functions:
- GBRAIN_CONTEXT_LOAD: search brain for context before skill starts
- GBRAIN_SAVE_RESULTS: save skill output to brain after completion
Placeholders added to 4 thinking skill templates (office-hours, investigate, plan-ceo-review, retro). Resolves to empty string on all hosts except gbrain via suppressedResolvers. GBRAIN suppression added to all 9 non-gbrain host configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
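The suppression behavior described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the contents of scripts/resolvers/gbrain.ts: the `{{NAME}}` placeholder syntax, the `HostConfig` shape, the `renderTemplate` helper, and the resolver output strings are all hypothetical. Only the names GBRAIN_CONTEXT_LOAD, GBRAIN_SAVE_RESULTS, and suppressedResolvers come from the commit message.

```typescript
// Hypothetical resolver signature: takes the skill name, returns template text.
type Resolver = (skillName: string) => string;

// Hypothetical host config shape; only `suppressedResolvers` is named in the PR.
interface HostConfig {
  name: string;
  suppressedResolvers: string[];
}

const resolvers: Record<string, Resolver> = {
  GBRAIN_CONTEXT_LOAD: (skill) =>
    `Before starting ${skill}, search the brain for relevant context.`,
  GBRAIN_SAVE_RESULTS: (skill) =>
    `After ${skill} completes, save its output to the brain.`,
};

// Replace each {{PLACEHOLDER}} with resolver output, or with an empty string
// when the host lists that resolver in suppressedResolvers.
function renderTemplate(template: string, skill: string, host: HostConfig): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, key: string) => {
    const resolver = resolvers[key];
    if (!resolver) return match; // unknown placeholder: leave it untouched
    if (host.suppressedResolvers.includes(key)) return "";
    return resolver(skill);
  });
}
```

Under this design a single template serves every host: a non-gbrain host lists both resolver names in `suppressedResolvers` and gets no brain instructions, while the gbrain host leaves the list empty and gets both.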
Adds Step 3.5 to the review template: runs bun run slop:diff against the base branch to catch AI code quality issues (empty catches, redundant return await, overcomplicated abstractions). Advisory only, never blocking. Skips silently if slop-scan is not installed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Positions gstack as the workflow enforcement layer for Karpathy-style CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills. Maps each Karpathy failure mode to the gstack skill that addresses it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- office-hours: add design doc path visibility message after writing
- ceo-review: add HARD GATE reminder at review section transitions
- retro: add non-git context support (check memory for meeting notes)
Mirrors template improvements to hand-crafted native skills.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Host count: 8 → 10 (hermes, gbrain)
- OpenClaw adapter test: expects undefined (dead code removed)
- Golden ship fixtures: updated with Confusion Protocol + vendoring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regenerated from templates after Confusion Protocol, GBrain resolver placeholders, slop:diff in review, HARD GATE reminders, investigation learnings, design doc visibility, and retro non-git context changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain, slop in review, Karpathy note, skill improvements)
- CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing
- README.md: update agent count 8→10, add Hermes + GBrain to table
- VERSION: bump to 0.18.0.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
E2E Evals: ✅ PASS
45/45 tests passed | $6.12 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
The review-base-branch E2E test was copying the full 1493-line review/SKILL.md into the test fixture. The agent spent 8+ turns reading it in chunks, leaving only 7 turns for actual work, causing error_max_turns on every attempt. Now extracts only Step 0 (base branch detection, ~50 lines) which is all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy a full SKILL.md file into an E2E test fixture."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
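One way to pull a single step out of a large SKILL.md for a fixture is sketched below. It assumes the step lives under a markdown heading whose text starts with a known prefix such as "Step 0"; the `extractSection` helper is hypothetical and is not the code used in this PR's test.

```typescript
// Return the section that starts at the first heading whose text begins with
// `headingPrefix` and ends before the next heading of the same or higher level.
// Lines inside fenced code blocks are never treated as headings, so shell
// comments like "# detect base" do not cut the section short.
function extractSection(markdown: string, headingPrefix: string): string {
  const lines = markdown.split("\n");
  let inFence = false;
  let start = -1;
  let level = 0;
  let end = lines.length;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trimStart().startsWith("```")) {
      inFence = !inFence; // toggle on every fence delimiter
      continue;
    }
    if (inFence) continue;

    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!heading) continue;

    if (start === -1) {
      if (heading[2].startsWith(headingPrefix)) {
        start = i;
        level = heading[1].length;
      }
    } else if (heading[1].length <= level) {
      end = i; // next sibling or parent heading closes the section
      break;
    }
  }

  if (start === -1) {
    throw new Error(`Heading not found: ${headingPrefix}`);
  }
  return lines.slice(start, end).join("\n");
}
```

Writing only the returned section into the fixture keeps the agent's reading cost proportional to what the test exercises, which is the point of the CLAUDE.md rule quoted above.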
Summary
Agent runtime support + Karpathy-inspired guardrails + skill improvements.
Confusion Protocol: inline ambiguity gate in the preamble. When Claude hits a
decision that could go two ways (which architecture? which data model? destructive
operation with unclear scope?), it stops and asks instead of guessing. Scoped to
high-stakes decisions only. Addresses Karpathy failure mode #1 (wrong assumptions).
Hermes + GBrain host configs: two new hosts. Hermes gets tool rewrites for
terminal/read_file/patch/delegate_task. GBrain is a "mod" for gstack: coding skills
become brain-aware when it is installed, searching the brain for context before
starting and saving results after finishing.
GBrain resolver: GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS are injected into
4 thinking skill templates (office-hours, investigate, ceo-review, retro). Both are
suppressed on all 9 non-gbrain hosts. On the gbrain host, skills get brain-first
lookup and save-to-brain behavior.
slop:diff in /review: every code review now runs bun run slop:diff as an advisory
diagnostic, catching AI code quality issues before they land.
Karpathy compatibility: README positions gstack as the workflow enforcement layer
for Karpathy-style CLAUDE.md rules (17K stars).
Skill improvements: CEO review HARD GATE at 12 STOP points, office-hours design
doc path visibility, investigation learnings in investigate, retro non-git context.
Native OpenClaw skills mirrored.
Infrastructure: host count 8→10, GBRAIN suppression on all non-gbrain hosts, dead
code cleanup (openclaw adapter removal), golden fixture updates.
Test Coverage
737 tests pass, 0 failures. Changes are markdown templates + TypeScript configs.
No new application codepaths — coverage audit: N/A (template/config changes).
Pre-Landing Review
No issues found. All changes are TypeScript host configs, markdown templates,
resolver functions, and documentation.
Adversarial Review
Claude subagent: 6 findings (setup auto-detect, gbrain fallback, vendoring paths,
retro-context size, slop error handling, adapter removal). All informational or
pre-existing patterns.
Codex: 3 P1s (setup auto-detect mismatch, gbrain query shell injection concern,
auto-save sensitivity), 3 P2s (spawned session deadlock, slop committed-only,
npx timeout). P1s assessed as: intentional design (setup), instructional prose
not shell execution (query), and early-stage acceptable risk (sensitivity).
GATE: PASS
TODOS
No TODO items completed or created in this PR.
Test plan
- bun test (737 pass, 0 fail)
- bun run gen:skill-docs --host gbrain (generates brain-aware variants)
- bun run gen:skill-docs --host hermes (generates Hermes variants)
🤖 Generated with Claude Code