
feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) #1005

Open
garrytan wants to merge 12 commits into main from garrytan/gstacklite-split


@garrytan
Owner

Summary

Agent runtime support + Karpathy-inspired guardrails + skill improvements.

Confusion Protocol — inline ambiguity gate in the preamble. When Claude hits a
decision that could go two ways (which architecture? which data model? destructive
operation with unclear scope?), it stops and asks instead of guessing. Scoped to
high-stakes decisions only. Addresses Karpathy failure mode #1 (wrong assumptions).
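A minimal TypeScript sketch of the gating logic, for illustration only. The tier threshold (preamble tier >= 2) and the trigger categories come from this PR. The function names (`confusionProtocolSection`, `shouldStopAndAsk`), the `DecisionKind` type, and the gate wording are hypothetical and do not represent gstack's actual preamble resolver.

```typescript
// Hypothetical sketch: only the tier threshold and trigger categories come
// from the PR; names and gate wording are illustrative assumptions.
type DecisionKind =
  | "architecture"
  | "data-model"
  | "destructive-op"
  | "contradictory-requirements"
  | "routine";

const CONFUSION_PROTOCOL_MIN_TIER = 2;

// Preamble generation: skills at tier >= 2 get the gate text, others get nothing.
function confusionProtocolSection(preambleTier: number): string {
  if (preambleTier < CONFUSION_PROTOCOL_MIN_TIER) return "";
  return [
    "## Confusion Protocol",
    "For high-stakes ambiguity (architecture, data model, destructive scope,",
    "contradictory requirements): STOP and ask. Do not guess.",
  ].join("\n");
}

// The rule the gate text expresses: stop only when a high-stakes decision is ambiguous.
function shouldStopAndAsk(kind: DecisionKind, ambiguous: boolean): boolean {
  return kind !== "routine" && ambiguous;
}
```

Keeping the tier check at generation time means low-tier skills carry no extra preamble text at all. The runtime judgment stays with the model, guided by the injected prose.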

Hermes + GBrain host configs — two new hosts. Hermes gets tool rewrites for
terminal/read_file/patch/delegate_task. GBrain is a "mod" for gstack:
coding skills become brain-aware when installed, searching the brain for context
before starting and saving results after finishing.
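A hypothetical sketch of a host config with tool rewrites. The Hermes tool names (terminal, read_file, patch, delegate_task), the ~/.hermes/skills/gstack path, and the AGENTS.md config file all come from this PR. The `HostConfig` shape and the `rewriteTools` helper are assumptions. So is the specific mapping of Claude Code tools to Hermes tools (Bash to terminal, Read to read_file, Edit to patch, Agent to delegate_task).

```typescript
// Hypothetical host-config shape. Field names and the Claude-tool-to-Hermes-tool
// mapping are assumptions; tool names, path, and config file come from the PR.
interface HostConfig {
  name: string;
  skillsDir: string;
  configFile: string;
  toolRewrites: Record<string, string>;
  suppressedResolvers: string[];
}

const hermes: HostConfig = {
  name: "hermes",
  skillsDir: "~/.hermes/skills/gstack",
  configFile: "AGENTS.md",
  toolRewrites: {
    Bash: "terminal",
    Read: "read_file",
    Edit: "patch",
    Agent: "delegate_task",
  },
  // Non-gbrain hosts suppress the brain resolvers.
  suppressedResolvers: ["GBRAIN_CONTEXT_LOAD", "GBRAIN_SAVE_RESULTS"],
};

// Replace whole-word, case-sensitive tool names only, so "Reading" and
// "README" are left alone. Keys are plain identifiers, so no regex escaping.
function rewriteTools(text: string, host: HostConfig): string {
  let out = text;
  for (const [from, to] of Object.entries(host.toolRewrites)) {
    out = out.replace(new RegExp(`\\b${from}\\b`, "g"), to);
  }
  return out;
}
```

Word boundaries matter here because tool names like "Read" and "Edit" are also ordinary English words that appear inside longer words in skill prose.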

GBrain resolver: GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS placeholders injected into
4 thinking skill templates (office-hours, investigate, plan-ceo-review, retro). Suppressed
on all 9 non-gbrain hosts; on the gbrain host, skills get brain-first lookup and
save-to-brain behavior.
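The suppression mechanism can be sketched as placeholder expansion in which suppressed resolvers expand to the empty string. Only the two resolver names and the rule "empty on every host except gbrain" come from this PR. The `{{NAME}}` placeholder syntax, the resolver text, and the `renderTemplate` helper are assumptions for illustration.

```typescript
// Hypothetical sketch: the {{NAME}} syntax, resolver text, and renderTemplate
// are assumptions. Only the resolver names and suppression rule come from the PR.
type Resolver = () => string;

const resolvers: Record<string, Resolver> = {
  GBRAIN_CONTEXT_LOAD: () =>
    "Before starting, search the brain for prior context on this topic.",
  GBRAIN_SAVE_RESULTS: () =>
    "After finishing, save the results to the brain.",
};

// Expand placeholders; suppressed resolvers expand to the empty string.
function renderTemplate(template: string, suppressed: ReadonlySet<string>): string {
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match: string, name: string) => {
    if (suppressed.has(name)) return "";
    const resolve = resolvers[name];
    return resolve ? resolve() : match; // leave unknown placeholders untouched
  });
}

// What the 9 non-gbrain hosts would suppress; the gbrain host suppresses neither.
const GBRAIN_RESOLVERS = new Set(["GBRAIN_CONTEXT_LOAD", "GBRAIN_SAVE_RESULTS"]);
const template = "{{GBRAIN_CONTEXT_LOAD}}\n# Investigate\n{{GBRAIN_SAVE_RESULTS}}";
```

Rendering `template` with `GBRAIN_RESOLVERS` yields just the skill body. Rendering it with an empty set adds the brain-first lookup before the body and the save-to-brain step after it.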

slop:diff in /review: every code review now runs bun run slop:diff as an advisory
diagnostic (never blocking), catching AI code quality issues before they land.
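The advisory, never-blocking behavior could be sketched in TypeScript with Node's `child_process.spawnSync`. The `runAdvisory` helper is hypothetical. Treating a spawn error (such as ENOENT) as "not installed" is an assumption. This PR does not show how gstack passes the base branch to `slop:diff`, so the usage line omits it.

```typescript
import { spawnSync } from "node:child_process";

interface AdvisoryResult {
  ran: boolean;          // false when the command could not be started at all
  status: number | null; // exit code; reported, never used to fail the review
  output: string;        // combined stdout + stderr for the reviewer to read
}

// Hypothetical helper: run a diagnostic without ever throwing or blocking.
function runAdvisory(command: string, args: string[]): AdvisoryResult {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) {
    // Spawn failure (e.g. ENOENT: command not installed): skip silently.
    return { ran: false, status: null, output: "" };
  }
  return {
    ran: true,
    status: result.status,
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}

// Usage (assumed): surface the findings, ignore the exit code.
// const slop = runAdvisory("bun", ["run", "slop:diff"]);
// if (slop.ran && slop.output.trim()) console.log(slop.output);
```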

Karpathy compatibility: README positions gstack as the workflow enforcement layer
for Karpathy-style CLAUDE.md rules (forrestchang/andrej-karpathy-skills, 17K stars).

Skill improvements: CEO review HARD GATE reminders at 12 STOP points, office-hours
design doc path visibility, investigation learnings in investigate, non-git context
in retro. Native OpenClaw skills mirrored.

Infrastructure: host count 8→10, GBRAIN suppression on all 9 non-gbrain hosts,
dead code cleanup (OpenClaw adapter removal), golden fixture updates.

Test Coverage

737 tests pass, 0 failures. Changes are markdown templates + TypeScript configs.
No new application codepaths — coverage audit: N/A (template/config changes).

Pre-Landing Review

No issues found. All changes are TypeScript host configs, markdown templates,
resolver functions, and documentation.

Adversarial Review

Claude subagent: 6 findings (setup auto-detect, gbrain fallback, vendoring paths,
retro-context size, slop error handling, adapter removal). All informational or
pre-existing patterns.

Codex: 3 P1s (setup auto-detect mismatch, gbrain query shell injection concern,
auto-save sensitivity), 3 P2s (spawned session deadlock, slop committed-only,
npx timeout). P1s assessed as: intentional design (setup), instructional prose
rather than shell execution (query), and early-stage acceptable risk (sensitivity).

GATE: PASS

TODOS

No TODO items completed or created in this PR.

Test plan

  • bun test — 737 pass, 0 fail
  • bun run gen:skill-docs --host gbrain — generates brain-aware variants
  • bun run gen:skill-docs --host hermes — generates Hermes variants
  • Golden fixture diffs updated (claude, codex, factory ship SKILL.md)
  • Host count test updated (8→10)

🤖 Generated with Claude Code

garrytan and others added 11 commits April 14, 2026 10:52
Injects a high-stakes ambiguity gate at preamble tier >= 2 so all
workflow skills get it. Fires when Claude encounters architectural
decisions, data model changes, destructive operations, or contradictory
requirements. Does NOT fire on routine coding.

Addresses Karpathy failure mode #1 (wrong assumptions) with an
inline STOP gate instead of relying on workflow skill invocation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hermes: tool rewrites for terminal/read_file/patch/delegate_task,
paths to ~/.hermes/skills/gstack, AGENTS.md config file.

GBrain: coding skills become brain-aware when GBrain mod is installed.
Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP).
GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain
host, enabling brain-first lookup and save-to-brain behavior.

Both registered in hosts/index.ts with setup script redirect messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New scripts/resolvers/gbrain.ts with two resolver functions:
- GBRAIN_CONTEXT_LOAD: search brain for context before skill starts
- GBRAIN_SAVE_RESULTS: save skill output to brain after completion

Placeholders added to 4 thinking skill templates (office-hours,
investigate, plan-ceo-review, retro). Resolves to empty string on
all hosts except gbrain via suppressedResolvers.

GBRAIN suppression added to all 9 non-gbrain host configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Step 3.5 to the review template: runs bun run slop:diff against
the base branch to catch AI code quality issues (empty catches,
redundant return await, overcomplicated abstractions). Advisory only,
never blocking. Skips silently if slop-scan is not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Positions gstack as the workflow enforcement layer for Karpathy-style
CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills.
Maps each Karpathy failure mode to the gstack skill that addresses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
office-hours: add design doc path visibility message after writing
ceo-review: add HARD GATE reminder at review section transitions
retro: add non-git context support (check memory for meeting notes)

Mirrors template improvements to hand-crafted native skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Host count: 8 → 10 (hermes, gbrain)
- OpenClaw adapter test: expects undefined (dead code removed)
- Golden ship fixtures: updated with Confusion Protocol + vendoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regenerated from templates after Confusion Protocol, GBrain resolver
placeholders, slop:diff in review, HARD GATE reminders, investigation
learnings, design doc visibility, and retro non-git context changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain,
  slop in review, Karpathy note, skill improvements)
- CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing
- README.md: update agent count 8→10, add Hermes + GBrain to table
- VERSION: bump to 0.18.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 15, 2026

E2E Evals: ✅ PASS

45/45 tests passed | $6.12 total cost | 12 parallel runners

Suite            Result  Cost
e2e-browse       1/1     $0.04
e2e-deploy       5/5     $1.11
e2e-design       3/3     $0.45
e2e-plan         7/7     $1.18
e2e-qa-workflow  3/3     $1.16
e2e-review       6/6     $1.49
e2e-workflow     3/3     $0.35
llm-judge        17/17   $0.34

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
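The per-suite rows add up to the summary line. Below is a small TypeScript cross-check. The `SuiteResult` shape is illustrative; every number is copied from the table. Summing in integer cents avoids floating-point drift in the dollar total.

```typescript
// Cross-check of the bot summary against the per-suite rows.
interface SuiteResult {
  suite: string;
  passed: number;
  total: number;
  costUsd: number;
}

const suites: SuiteResult[] = [
  { suite: "e2e-browse", passed: 1, total: 1, costUsd: 0.04 },
  { suite: "e2e-deploy", passed: 5, total: 5, costUsd: 1.11 },
  { suite: "e2e-design", passed: 3, total: 3, costUsd: 0.45 },
  { suite: "e2e-plan", passed: 7, total: 7, costUsd: 1.18 },
  { suite: "e2e-qa-workflow", passed: 3, total: 3, costUsd: 1.16 },
  { suite: "e2e-review", passed: 6, total: 6, costUsd: 1.49 },
  { suite: "e2e-workflow", passed: 3, total: 3, costUsd: 0.35 },
  { suite: "llm-judge", passed: 17, total: 17, costUsd: 0.34 },
];

// Sum in integer cents so floating-point error can't creep into the total.
const totalCents = suites.reduce((sum, s) => sum + Math.round(s.costUsd * 100), 0);
const passed = suites.reduce((sum, s) => sum + s.passed, 0);
const total = suites.reduce((sum, s) => sum + s.total, 0);

const summary = `${passed}/${total} tests passed | $${(totalCents / 100).toFixed(2)} total cost`;
console.log(summary); // prints "45/45 tests passed | $6.12 total cost"
```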

The review-base-branch E2E test was copying the full 1493-line
review/SKILL.md into the test fixture. The agent spent 8+ turns
reading it in chunks, leaving only 7 turns for actual work, causing
error_max_turns on every attempt.

Now extracts only Step 0 (base branch detection, ~50 lines) which is
all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy
a full SKILL.md file into an E2E test fixture."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
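The fixture fix amounts to slicing one section out of a markdown file instead of copying all of it. This is a minimal sketch of that idea, not the test's actual code. It assumes SKILL.md uses `#`-style headings and that the section title begins with "Step 0". The helper name `extractSection` is hypothetical. Lines inside fenced code blocks are skipped, so shell comments such as `# detect base` are not mistaken for headings.

```typescript
// Hypothetical helper: return the section whose heading text starts with
// headingPrefix, up to the next heading of the same or higher level.
function extractSection(markdown: string, headingPrefix: string): string | null {
  const lines = markdown.split("\n");
  let inFence = false;
  let start = -1;
  let level = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trimStart().startsWith("```")) {
      inFence = !inFence; // fence lines stay in the slice; they just toggle state
      continue;
    }
    if (inFence) continue;
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!m) continue;
    if (start === -1) {
      if (m[2].startsWith(headingPrefix)) {
        start = i;
        level = m[1].length;
      }
    } else if (m[1].length <= level) {
      return lines.slice(start, i).join("\n"); // next sibling or parent heading
    }
  }
  return start === -1 ? null : lines.slice(start).join("\n");
}
```

Subsections (e.g. a `###` under the `##` Step 0 heading) are kept, so the fixture still gets everything the step needs while staying around the ~50 lines mentioned above.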