
fix(ce-sessions): unblock session-history on Claude Code #800

Merged
tmchow merged 2 commits into main from tmchow/debug-issue-794 on May 8, 2026


@tmchow tmchow commented May 8, 2026

Summary

Session-history features (/ce-sessions and /ce-compound Phase 1) work on Claude Code again. Previously, the historian agent's first action was a Skill tool call to fetch session inventory, but Claude Code does not permit subagents to invoke the Skill tool (anthropics/claude-code#38719), so the spinner hung at Initializing… indefinitely and orchestrators received a spurious "user doesn't want to proceed" rejection.

The fix is structural: orchestration moves to the ce-sessions skill (main context, where the Skill tool works), and the historian becomes synthesis-only. It receives pre-extracted file paths in its dispatch prompt and reads them via the native file-read tool. No subagent ever invokes Skill again.
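As a rough illustration of what "pre-extracted file paths in the dispatch prompt" could look like, here is a minimal Python sketch. The top-level field names follow the `{problem_topic, scratch_dir, sessions, output_schema}` shape described in this PR; the helper name `build_dispatch_payload`, the per-session keys, the session ids, and the `output_schema` value are assumptions made for illustration, not the plugin's actual code.

```python
import json
import tempfile
from pathlib import Path


def build_dispatch_payload(problem_topic, scratch_dir, session_ids):
    """Hypothetical helper: describe already-extracted files for the historian."""
    scratch = Path(scratch_dir)
    sessions = [
        # Assumed naming convention: <scratch>/<id>.skeleton.txt
        {"id": sid, "skeleton_path": str(scratch / f"{sid}.skeleton.txt")}
        for sid in session_ids
    ]
    return {
        "problem_topic": problem_topic,
        "scratch_dir": str(scratch),
        "sessions": sessions,
        # Placeholder value; the real schema is defined by the skill.
        "output_schema": "prose findings",
    }


scratch_dir = tempfile.mkdtemp(prefix="ce-sessions-")
payload = build_dispatch_payload(
    "session-history hang", scratch_dir, ["abc123", "def456"]
)
print(json.dumps(payload, indent=2))
```

The point of the shape is that the subagent only needs a file-read tool: every path it is handed already exists by the time it starts.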

What changed

  • ce-sessions is the canonical entry point. It owns the full pipeline: discovery, branch + keyword filtering, scan-window selection, top-5 deep-dive cap, scratch directory, per-session extraction, and synthesis dispatch. Scripts that were spread across ce-session-inventory/ and ce-session-extract/ now live in one home at plugins/compound-engineering/skills/ce-sessions/scripts/.
  • ce-session-historian is synthesis-only. Receives {problem_topic, scratch_dir, sessions, output_schema}, reads the path files via the native file-read tool, returns prose findings. No Skill calls, no Bash discovery, no orchestration logic.
  • ce-compound Phase 1 delegates via the platform's skill-invocation primitive in semantic-prose form (per ce-plan/references/plan-handoff.md line 57), not a literal Skill(...) tool-call expression. The literal form would propagate Claude-Code-specific syntax to Codex, Cursor, Gemini, OpenCode, Pi, and Kiro when the skill ships verbatim through the converters. Dispatch ordering is pinned: launch the three background research subagents first, then invoke ce-sessions, so wall-clock parallelism is preserved.
  • Extract scripts gain --output PATH. When set, scripts write to file and emit only a one-line JSON status to stdout. Extraction content (~50 KB+ per session × 5 sessions) never round-trips through orchestrator tool results. Stdout-mode behavior preserved when the flag is omitted.
  • Two callerless skills removed. ce-session-inventory and ce-session-extract were user-invocable: false script holders. With their callers gone, they're deleted and registered in STALE_SKILL_DIRS, LEGACY_ONLY_SKILL_DESCRIPTIONS, and EXTRA_LEGACY_ARTIFACTS_BY_PLUGIN so existing flat-installs sweep on upgrade.
  • Regression test asserts the agent body never instructs Skill(ce-session-inventory), Skill(ce-session-extract), or any literal Skill(...) tool-call expression.
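The `--output PATH` behavior described above can be sketched as follows. This is a minimal illustration, not the real extract-skeleton.py or extract-errors.py: the function names and the status keys (`status`, `output`, `bytes`) are assumptions, and the extracted content is a placeholder string.

```python
import argparse
import io
import json
import os
import sys
import tempfile
from pathlib import Path


def emit(content, output=None, stdout=None):
    """Write content to stdout, or to a file plus a one-line JSON status."""
    stdout = stdout if stdout is not None else sys.stdout
    if output is None:
        # Flag omitted: stdout-mode behavior is unchanged.
        stdout.write(content)
        return
    path = Path(output)
    path.write_text(content, encoding="utf-8")
    # Assumed status shape; only this single line goes back to the caller.
    status = {
        "status": "ok",
        "output": str(path),
        "bytes": len(content.encode("utf-8")),
    }
    stdout.write(json.dumps(status) + "\n")


def main(argv, stdout=None):
    parser = argparse.ArgumentParser(description="extraction sketch")
    parser.add_argument("--output", metavar="PATH", default=None)
    args = parser.parse_args(argv)
    content = "placeholder skeleton content\n"  # stands in for real extraction
    emit(content, args.output, stdout)


# Demo: file mode keeps the large payload out of the captured stdout.
demo_dir = tempfile.mkdtemp(prefix="ce-sessions-")
demo_path = os.path.join(demo_dir, "abc123.skeleton.txt")
captured = io.StringIO()
main(["--output", demo_path], stdout=captured)
print(captured.getvalue(), end="")
```

With the flag set, the orchestrator's tool result carries only the status line, while the extraction itself stays on disk for the historian to read.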

Why this shape

Issue #794 proposed two narrower fixes: refactor the agent to invoke scripts directly via Bash from subagent context (Option A), or have the orchestrator pre-fetch inventory and pass it into the subagent's dispatch prompt (Option B). Neither holds up: Option A trips on the same script-path-resolution problem ce-sessions navigates, except that agents have no sibling scripts/ convention to lean on, and Option B still leaves per-session extraction as a Skill round-trip in subagent context. This refactor moves every Skill call to main context, eliminating the deadlock structurally rather than working around it call by call. Full design rationale, alternatives considered, and implementation units are in docs/plans/2026-05-08-001-fix-ce-sessions-orchestration-refactor-plan.md.

Test plan

  • 1337 bun tests pass (3 new --output PATH tests on the extract scripts; 4 new regression tests for the no-Skill-from-subagent invariant).
  • bun run release:validate passes (49 agents, 37 skills after the two deletions).
  • Manual smoke test on a marketplace-cached install (not --plugin-dir): /ce-sessions "what did we work on this week" and /ce-compound Full mode with session history opted in both complete without Initializing… hangs. The plan's Risks table flags an empirical verification step in case the bare bash scripts/... runtime invocation doesn't resolve from the skill directory; established slash-command precedents (ce-clean-gone-branches, ce-resolve-pr-feedback, ce-optimize) argue it does, but verifying on a real install before merge is cheap.
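The no-Skill-from-subagent invariant checked by the regression tests can be pictured with a small Python sketch. The repository's real test is a bun test written in TypeScript; the regex and the function name `find_skill_calls` below are assumptions chosen only to show the idea of scanning the agent body for a literal `Skill(...)` expression.

```python
import re

# Matches a literal tool-call expression such as Skill(ce-session-inventory).
# Plain prose mentions like "the Skill tool" do not match.
SKILL_CALL = re.compile(r"\bSkill\s*\(")


def find_skill_calls(agent_body):
    """Return every line of the agent body containing a literal Skill(...) call."""
    return [line for line in agent_body.splitlines() if SKILL_CALL.search(line)]


bad_body = "First, call Skill(ce-session-inventory) to list sessions."
good_body = "Read each path with the native file-read tool.\nDo not use the Skill tool."

print(find_skill_calls(bad_body))
print(find_skill_calls(good_body))
```

A check of this kind fails as soon as someone reintroduces a literal call into the agent file, regardless of which skill name is inside the parentheses.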

Post-Deploy Monitoring & Validation

  • Watch for any new bug reports referencing Initializing… hangs on /ce-sessions or /ce-compound. The fix should eliminate them. If they recur, the architecture's central assumption (skill-invocation primitive works from inside an executing skill body in main context) is wrong and needs revisiting.
  • Monitor for cross-platform regressions when the plugin converts and ships to Codex, Cursor, Gemini, OpenCode, Pi, and Kiro. The semantic-prose invocation form should round-trip cleanly through every converter; if a target's converter doesn't recognize the prose pattern, ce-compound's Phase 1 would silently skip session-history enrichment.
  • Validation window: through next plugin release. Owner: plugin maintainers.

Closes #794.




tmchow commented May 8, 2026

@digitalcostas wanna test this?

…ynthesis-only

The ce-session-historian agent deadlocked when dispatched as a subagent in
Claude Code because its first action was Skill(ce-session-inventory), and
subagents cannot invoke the Skill tool (anthropics/claude-code#38719). The
spinner hung at "Initializing…" indefinitely; after timeout the orchestrator
received a spurious "user doesn't want to proceed" rejection.

The fix removes every code path that has a subagent calling Skill:

- Move 4 extraction scripts into plugins/compound-engineering/skills/ce-sessions/scripts/
  (single home; ce-session-inventory and ce-session-extract skills deleted)
- Rewrite ce-sessions/SKILL.md as the full orchestrator: discovery, branch +
  keyword filtering, scan-window selection, top-5 deep-dive cap, mktemp
  scratch dir, per-session extraction with new --output PATH flag (extraction
  bytes write directly to scratch, never round-trip through main-context tool
  results), dispatch of synthesis-only historian
- Reshape ce-session-historian.agent.md to synthesis-only: receives file
  paths in dispatch prompt, reads via native file-read tool, returns prose
  findings. No Skill calls, no Bash discovery, no orchestration logic
- Update ce-compound Phase 1 to delegate to ce-sessions via the platform's
  skill-invocation primitive (semantic-prose form per plan-handoff.md line 57
  convention, not literal Skill(...) syntax). Specifies dispatch ordering so
  the parallel research subagents and ce-sessions still run concurrently —
  wall-clock parallelism preserved
- Add --output PATH to extract-skeleton.py and extract-errors.py: when set,
  scripts write to file and emit only a one-line JSON status to stdout.
  Stdout-mode behavior preserved when omitted (additive API change)
- Add regression test (tests/skills/ce-session-historian-no-skill-tool.test.ts)
  asserting the agent body never instructs Skill(ce-session-inventory),
  Skill(ce-session-extract), or any literal Skill(...) tool-call expression
- Register ce-session-inventory and ce-session-extract in legacy-cleanup
  lookups (STALE_SKILL_DIRS, LEGACY_ONLY_SKILL_DESCRIPTIONS, and
  EXTRA_LEGACY_ARTIFACTS_BY_PLUGIN) so existing flat-installs sweep on upgrade
- Fix broken See Also links in docs/skills/ce-sessions.md

The bug is structurally gone: no subagent in the post-refactor flow ever
invokes the Skill tool. Plan with full design rationale, alternatives
considered (including issue #794's Options 1 and 2), and implementation
units lives at docs/plans/2026-05-08-001-fix-ce-sessions-orchestration-refactor-plan.md.

All 1337 bun tests pass; bun run release:validate passes (37 skills,
49 agents). Closes #794.
@digitalcostas

Tested via marketplace-cached install of tmchow/debug-issue-794 on Claude Code, macOS Darwin 25.4.0. Plugin v3.7.0 / gitCommitSha: dd25db036c908b597b60e3cb242bec20abb9048a confirmed in ~/.claude/plugins/installed_plugins.json.

Install path

/plugin marketplace add https://github.com/EveryInc/compound-engineering-plugin.git#tmchow/debug-issue-794
/plugin install compound-engineering@compound-engineering-plugin

This routes through ~/.claude/plugins/cache/ like a published release — exercising the residual risk you flagged in the test plan, not just the diff. Filesystem verified: ce-session-inventory/ and ce-session-extract/ are absent from the skills cache; ce-sessions/scripts/ contains all 4 scripts.

Test 1: /ce-sessions "what did we work on this week"

✅ End-to-end success. 100 sessions inventoried (0 parse errors). 5 selected by size+recency+branch diversity, 3 returned substantive skeletons via the new --output PATH flag (status JSON to stdout, content to $SCRATCH/<id>.skeleton.txt). ce-session-historian subagent dispatched on sonnet, completed in 134s with 10 tool_uses (all file-reads, zero Skill calls in subagent context). Returned proper 4-section synthesis. No Initializing… hang.

The bare bash scripts/discover-sessions.sh runtime invocation resolved correctly from the marketplace-cached skill base dir — the residual risk you flagged in the test plan is empirically clear.

Test 2: /ce-compound Full mode + session-history opt-in

✅ End-to-end success on the path that previously deadlocked. The pinned dispatch ordering works as intended — background research subagents launched first (Context Analyzer 20s, Solution Extractor 51s, Related Docs Finder 91s), then Skill(ce-sessions) invoked from main context (~75s including historian dispatch on 4 sessions). Wall-clock for Phase 1 = max(~91s, ~75s) ≈ 91s, not 237s sequential — the parallelism preservation works.
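The wall-clock arithmetic above can be checked with a few lines of Python. The durations are the approximate figures reported in this test run; the variable names are illustrative only.

```python
# Approximate durations (seconds) reported in the test run above.
research_subagents = {
    "Context Analyzer": 20,
    "Solution Extractor": 51,
    "Related Docs Finder": 91,
}
ce_sessions = 75  # Skill(ce-sessions) from main context, including historian dispatch

# If every step ran one after another.
sequential = sum(research_subagents.values()) + ce_sessions

# With the subagents launched in the background first, Phase 1 finishes
# when the slowest branch finishes.
parallel = max(max(research_subagents.values()), ce_sessions)

print(f"sequential: {sequential}s, parallel: {parallel}s")
```

The saving depends on launching the background subagents before invoking ce-sessions; if ce-sessions ran first and blocked, the research time would be added on top of it instead.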

End-to-end the skill produced a clean overlap-detected update to an existing learning doc, ran validate-frontmatter.py (exit=0), and completed Phase 2.5 + Discoverability without issue.

Separate pre-existing bug found during testing

Filed as #805: extract-skeleton.py crashes with TypeError: unhashable type: 'slice' on one of five keyword-matched sessions due to a dict[:80] slice. It is pre-existing (the consolidation moved the file but didn't re-author the code path) and doesn't block this merge.

Summary

LGTM from the original reporter. PR #800 fixes the deadlock structurally on Claude Code via marketplace-cached install. Will retire CLAUDE.md Rule 6 (the "answer no" workaround in our project rules) once plugin v3.7.0+ ships from a stable release.

Closes #794 confirmed working on this side.

summarize_claude_tool sliced inp.get("query", "") and inp.get("prompt", "")
unconditionally. When MCP or specialized tools put a dict in those fields,
dict[:80] raises TypeError: unhashable type: 'slice' and the per-session
extraction silently fails. Same exposure existed in handle_cursor's
tool_use path.

Add a _safe_slice helper and reroute every potentially-non-string field
through it, then add regression tests for dict-shaped query, command,
prompt, pattern, fall-through to a later string field, and the cursor path.

Fixes #805
@tmchow tmchow merged commit 81710ef into main May 8, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request May 8, 2026
LLMpsycho pushed a commit to LLMpsycho/compound-engineering-plugin that referenced this pull request May 8, 2026

Development

Successfully merging this pull request may close these issues.

ce-session-historian deadlocks under Claude Code: subagent cannot invoke Skill(ce-session-inventory) per anthropics/claude-code#38719

2 participants