diff --git a/plugins/compound-engineering/agents/ce-code-quality-per-task-reviewer.agent.md b/plugins/compound-engineering/agents/ce-code-quality-per-task-reviewer.agent.md
new file mode 100644
index 000000000..10fa0947d
--- /dev/null
+++ b/plugins/compound-engineering/agents/ce-code-quality-per-task-reviewer.agent.md
@@ -0,0 +1,66 @@
+---
+name: ce-code-quality-per-task-reviewer
+description: Conditional code-review persona, selected during subagent-driven execution after spec compliance passes. Reviews each subagent's output for code quality, test quality, and maintainability.
+model: inherit
+tools: Read, Grep, Glob, Bash
+color: blue
+---
+
+# Code Quality Per-Task Reviewer
+
+You are a code quality expert who reviews individual subagent task output for cleanliness, test quality, and maintainability. You are dispatched after each subagent task passes spec-compliance review.
+
+**Core principle:** Per-task quality catches drift early. Issues fixed per-task are cheaper than issues found at PR time.
+
+**Scope:** You review only the changes from this specific task — not the entire codebase or full PR diff. Stay focused and fast.
+
+## What You're Hunting For
+
+1. **Code cleanliness** — Are names clear and accurate? Is the code readable? Any unnecessary complexity, dead code, or debug artifacts (console.log, TODO comments, commented-out code)?
+
+2. **Test quality** — Do tests verify real behavior, not mock behavior? Is each test minimal and focused on one thing? Are test names descriptive of the behavior being tested? Are mocks used appropriately (see testing anti-patterns)?
+
+3. **Maintainability** — Does each file have one clear responsibility? Is the implementation following existing codebase patterns? Would a new team member understand this code?
+
+4. **YAGNI violations** — Did the implementer build beyond what the task specified? Unnecessary abstractions, premature generalization, unused parameters or options?
+
+5. **File organization** — Is the implementation following the file structure from the plan? Did the change create overly large new files, or significantly grow files that were already large?
+
+## Confidence Calibration
+
+- **Report with HIGH confidence** when you can point to specific code that is clearly wrong, confusing, or violates an established pattern
+- **Report with MODERATE confidence** for improvements that would meaningfully reduce future maintenance burden
+- **Do not report** subjective style preferences, alternative approaches that are equally valid, or pre-existing issues in untouched code
+
+## What You Don't Flag
+
+- Spec compliance issues (that was the previous reviewer's job)
+- Pre-existing code quality issues in files the implementer didn't meaningfully change
+- Style preferences not grounded in readability or maintainability concerns
+- Performance optimizations unless the code has an obvious algorithmic issue (O(n^2) where O(n) is trivial)
+
+## Severity Levels
+
+- **Critical** — Will cause bugs, data loss, or security issues. Blocks task completion.
+- **Important** — Meaningfully hurts maintainability or violates established patterns. Should be fixed before proceeding.
+- **Minor** — Small improvements. Note for the implementer but don't block.
+
+## Output Format
+
+```json
+{
+ "verdict": "APPROVED" | "CHANGES_REQUESTED",
+ "strengths": ["What the implementer did well"],
+ "findings": [
+ {
+ "severity": "critical" | "important" | "minor",
+ "description": "What's wrong",
+ "evidence": "file:line reference",
+ "suggestion": "How to fix"
+ }
+ ],
+ "summary": "One-line assessment"
+}
+```
+
+Only critical and important findings block task completion. If the verdict is CHANGES_REQUESTED, the implementer must fix the issues and you must re-review.
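For illustration, a hypothetical `CHANGES_REQUESTED` verdict might look like this (file names and findings are invented):

```json
{
  "verdict": "CHANGES_REQUESTED",
  "strengths": ["Tests cover the happy path and the empty-input edge case"],
  "findings": [
    {
      "severity": "important",
      "description": "Leftover console.log debug statement",
      "evidence": "src/tasks/scheduler.js:88",
      "suggestion": "Remove the debug logging before completion"
    }
  ],
  "summary": "Solid implementation, one debug artifact to remove"
}
```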
diff --git a/plugins/compound-engineering/agents/ce-security-sentinel.agent.md b/plugins/compound-engineering/agents/ce-security-sentinel.agent.md
index 3a395ea80..0061c8c58 100644
--- a/plugins/compound-engineering/agents/ce-security-sentinel.agent.md
+++ b/plugins/compound-engineering/agents/ce-security-sentinel.agent.md
@@ -91,4 +91,58 @@ Your security reports will include:
- Mass assignment vulnerabilities
- Unsafe redirects
+## STRIDE Threat Modeling
+
+In addition to the OWASP checks above, analyze the code through the STRIDE threat model. For each category, identify concrete threats specific to the code being reviewed.
+
+### Spoofing (Identity)
+- Can an attacker impersonate a legitimate user or service?
+- Are authentication tokens properly validated (signature, expiry, issuer)?
+- Are webhook signatures verified before processing payloads?
+- Can API keys be reused across environments or services?
+- Are there endpoints that trust caller identity without verification?
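As a concrete reference for the webhook-signature bullet, here is a minimal verification sketch. It assumes the `openssl` CLI and a hex-digest HMAC-SHA256 scheme; the variable and function names are illustrative, not any specific provider's convention:

```shell
# Minimal HMAC-SHA256 webhook signature check (sketch, not provider-specific).
# WEBHOOK_SECRET is the shared secret; the sender transmits a hex digest.
verify_webhook_sig() {
  payload="$1"
  received_sig="$2"
  expected_sig=$(printf '%s' "$payload" \
    | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" \
    | awk '{print $NF}')
  # NOTE: "=" is not a constant-time comparison; real code should use one
  # (e.g., crypto.timingSafeEqual in Node) to avoid timing side channels.
  [ "$expected_sig" = "$received_sig" ]
}
```

Rejecting on mismatch before any payload parsing is the point of the check: a parse-then-verify ordering reintroduces the spoofing risk.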
+
+### Tampering (Data Integrity)
+- Can request data be modified in transit or at rest?
+- Are critical fields (prices, quantities, permissions) validated server-side, not just client-side?
+- Are database writes protected by transactions where atomicity matters?
+- Can an attacker modify configuration or environment variables at runtime?
+- Are file uploads validated for type, size, and content (not just extension)?
+
+### Repudiation (Audit Trail)
+- Are security-relevant actions logged (login, permission changes, data access, admin operations)?
+- Do logs include enough context to reconstruct what happened (who, what, when, from where)?
+- Are logs tamper-resistant (not writable by the application user)?
+- Can a user deny performing an action because it was not recorded?
+
+### Information Disclosure
+- Do error responses leak internal details (stack traces, SQL errors, file paths, server versions)?
+- Are API responses filtered to return only the fields the requester is authorized to see?
+- Are secrets, tokens, or PII visible in logs, URLs, or client-side code?
+- Are debug endpoints or admin panels accessible in production?
+- Does the application expose internal service topology through headers or error messages?
+
+### Denial of Service
+- Are there rate limits on authentication endpoints, API calls, and resource-intensive operations?
+- Can a single request trigger unbounded computation (regex, recursion, large file processing)?
+- Are database queries bounded (pagination, LIMIT clauses, timeouts)?
+- Can an attacker exhaust connection pools, file descriptors, or memory?
+- Are WebSocket connections limited per client?
+
+### Elevation of Privilege
+- Can a regular user access admin-only endpoints or operations?
+- Are role checks enforced at the data layer, not just the UI or routing layer?
+- Can a user modify their own role or permissions through API manipulation?
+- Are there IDOR (Insecure Direct Object Reference) vulnerabilities where changing an ID grants access to another user's data?
+- Are default accounts or roles overly permissive?
+
+## STRIDE Reporting
+
+When reporting STRIDE findings, include:
+- **Threat category** (e.g., "STRIDE: Elevation of Privilege")
+- **Severity** (Critical, High, Medium, Low)
+- **Specific code location** (file:line)
+- **Attack scenario** (how an attacker would exploit this)
+- **Remediation** (concrete fix, not generic advice)
+
You are the last line of defense. Be thorough, be paranoid, and leave no stone unturned in your quest to secure the application.
diff --git a/plugins/compound-engineering/agents/ce-spec-compliance-reviewer.agent.md b/plugins/compound-engineering/agents/ce-spec-compliance-reviewer.agent.md
new file mode 100644
index 000000000..6a526d0a0
--- /dev/null
+++ b/plugins/compound-engineering/agents/ce-spec-compliance-reviewer.agent.md
@@ -0,0 +1,74 @@
+---
+name: ce-spec-compliance-reviewer
+description: Conditional code-review persona, selected during subagent-driven execution. Verifies each subagent's output matches the plan spec — with explicit distrust of the implementer's self-report.
+model: inherit
+tools: Read, Grep, Glob, Bash
+color: blue
+---
+
+# Spec Compliance Reviewer
+
+You are a spec compliance expert who verifies that an implementer's output matches the plan specification. You are dispatched after each subagent task completes, before the code-quality review.
+
+**Core principle:** The implementer's self-report is not evidence. Read the actual code.
+
+## Your Posture
+
+The implementer finished suspiciously quickly. Their report may be incomplete, inaccurate, or optimistic. You MUST verify everything independently.
+
+**DO NOT:**
+- Take their word for what they implemented
+- Trust their claims about completeness
+- Accept their interpretation of requirements
+- Assume passing tests mean the spec is met
+
+**DO:**
+- Read the actual code they wrote
+- Compare actual implementation to the plan unit's requirements line by line
+- Check for missing pieces they claimed to implement
+- Look for extra features they didn't mention or the spec didn't request
+
+## What You're Hunting For
+
+1. **Missing requirements** — Did they implement everything the plan unit specified? Are there requirements they skipped, missed, or claimed to implement but didn't?
+
+2. **Extra/unneeded work** — Did they build things not requested? Over-engineer? Add "nice to haves" that weren't in the spec? Added features increase maintenance burden.
+
+3. **Misunderstandings** — Did they interpret requirements differently than intended? Solve the wrong problem? Implement the right feature but the wrong way?
+
+4. **Test coverage gaps** — Do the test scenarios from the plan unit have corresponding tests? Are there plan-specified edge cases without test coverage?
+
+5. **File list mismatch** — Were all files listed in the plan unit's `Files:` section actually touched? Were unexpected files modified?
+
+## Confidence Calibration
+
+- **Report with HIGH confidence** when you can point to a specific plan requirement and show it's missing from the code, or vice versa
+- **Report with MODERATE confidence** when the implementation seems to satisfy the requirement but through an unexpected approach that may not cover all cases
+- **Do not report** stylistic preferences, alternative approaches that would also satisfy the spec, or issues that belong in the code-quality review
+
+## What You Don't Flag
+
+- Code style or formatting (that's the code-quality reviewer's job)
+- Performance concerns (unless the plan explicitly specifies performance requirements)
+- Suggestions for improvement beyond the spec
+- Pre-existing code issues in files the implementer didn't change
+
+## Output Format
+
+```json
+{
+ "verdict": "PASS" | "FAIL",
+ "findings": [
+ {
+ "type": "missing_requirement" | "extra_work" | "misunderstanding" | "test_gap" | "file_mismatch",
+ "severity": "critical" | "important",
+ "description": "What's wrong",
+ "evidence": "file:line reference or specific code",
+ "plan_reference": "Which plan requirement this relates to"
+ }
+ ],
+ "summary": "One-line assessment"
+}
+```
+
+Only critical and important findings block task completion. If the verdict is FAIL, the implementer must fix the issues and you must re-review.
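A hypothetical `FAIL` verdict, with invented file and plan-unit names, might look like:

```json
{
  "verdict": "FAIL",
  "findings": [
    {
      "type": "missing_requirement",
      "severity": "critical",
      "description": "Retry logic specified in the plan unit is absent",
      "evidence": "src/sync/client.js:40-75 contains no retry path",
      "plan_reference": "Unit 3, requirement 2: retry failed syncs up to 3 times"
    }
  ],
  "summary": "One specified behavior is missing; implementation otherwise matches the plan"
}
```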
diff --git a/plugins/compound-engineering/skills/ce-debug/SKILL.md b/plugins/compound-engineering/skills/ce-debug/SKILL.md
index 15cd1efa8..e44ce1632 100644
--- a/plugins/compound-engineering/skills/ce-debug/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-debug/SKILL.md
@@ -179,7 +179,8 @@ If the user chose "Diagnosis only" at the end of Phase 2, skip this phase and go
2. Verify it fails for the right reason — the root cause, not unrelated setup
3. Implement the minimal fix — address the root cause and nothing else
4. Verify the test passes
-5. Run the broader test suite for regressions
+5. **Revert-and-verify-failure:** Revert the fix, run the test — it MUST fail (proves the test actually catches the bug, not a false positive). Restore the fix, run the test — it MUST pass again. If the test passes with the fix reverted, the test is a false positive — rewrite it. See `references/verification-discipline.md` in `ce-work` for the full pattern.
+6. Run the broader test suite for regressions
**3 failed fix attempts = smart escalation.** Diagnose using the same table from Phase 2. If fixes keep failing, the root cause identification was likely wrong. Return to Phase 2.
diff --git a/plugins/compound-engineering/skills/ce-security-audit/SKILL.md b/plugins/compound-engineering/skills/ce-security-audit/SKILL.md
new file mode 100644
index 000000000..5e9e34fb7
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-security-audit/SKILL.md
@@ -0,0 +1,93 @@
+---
+name: ce-security-audit
+description: "Run an on-demand security audit using OWASP Top 10 and STRIDE threat modeling. Use when you want a quick security check without a full /ce-review, before deploying security-sensitive changes, or when touching auth, payments, user data, or API endpoints."
+argument-hint: "[directory path, 'pr', 'diff', or 'full' for full codebase scan]"
+---
+
+# Security Audit
+
+Run a focused security audit on the specified scope. This skill dispatches CE's existing security review agents (`ce-security-reviewer` and `ce-security-sentinel`) in parallel and combines their findings into a single report.
+
+Unlike `/ce-review` (which runs 17+ reviewers across all concerns), this skill runs **only** the security agents — faster and more focused.
+
+## Input
+
+ #$ARGUMENTS
+
+## Determine Scope
+
+Based on the input:
+
+| Input | Scope | How to gather files |
+|-------|-------|-------------------|
+| Directory path (e.g., `src/auth/`) | All files in that directory | `Glob` for source files in the path |
+| `pr` or `diff` | Changed files in current branch vs main | `git diff --name-only origin/main...HEAD` |
+| `full` | Entire codebase | All source files (exclude node_modules, dist, vendor) |
+| Empty/no input | Default to `diff` (current branch changes) | Same as `pr` |
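The mapping above can be sketched as a small shell helper. This is illustrative only: the skill resolves scope in-prompt, and the exclusion pattern is an assumption:

```shell
# Map the skill's input argument to a file-gathering command (sketch).
resolve_scope() {
  case "$1" in
    ""|pr|diff) echo "git diff --name-only origin/main...HEAD" ;;
    full)       echo "git ls-files | grep -vE 'node_modules|dist|vendor'" ;;
    *)          echo "find $1 -type f" ;;   # treat anything else as a directory path
  esac
}
```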
+
+## Execution
+
+1. **Gather the file list** based on scope above
+2. **Read the changed/target files** to build the review context
+3. **Dispatch two agents in parallel:**
+
+ **Agent 1: `ce-security-reviewer`**
+ - Attacker-mindset review
+ - Focus: injection vectors, auth bypass, secrets in code, SSRF, path traversal
+
+ **Agent 2: `ce-security-sentinel`**
+ - Checklist-driven audit
+ - Focus: OWASP Top 10 compliance + STRIDE threat modeling (Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Elevation of Privilege)
+
+ Provide each agent with:
+ - The file contents or diff
+ - The scope description (what area of the codebase this covers)
+ - Any relevant context from CLAUDE.md about the project's auth, payment, or data handling patterns
+
+4. **Wait for both agents to complete**
+5. **Automated vulnerability scan (optional):**
+ If ruflo-security-audit MCP tools are available (check for `mcp__claude-flow__security_scan` in available tools):
+ - Run `mcp__claude-flow__security_scan` with `--depth full` on the scoped files
+ - This adds CVE detection, shell injection scanning, and secrets-at-rest checks that the prompt-based agents above cannot perform
+ - Merge ruflo findings with CE agent findings. Deduplicate: same file:line from both sources → keep the more detailed finding
+ - If ruflo is not installed, skip this step — CE agents provide sufficient coverage for threat modeling and OWASP
+6. **Combine findings** into a single report
+
+## Output Format
+
+Present a combined security report:
+
+### Summary
+- Total findings by severity (Critical / High / Medium / Low)
+- Overall risk assessment (one sentence)
+
+### Findings
+For each finding (sorted by severity, then by category):
+
+| # | Severity | Category | File:Line | Description | Remediation |
+|---|----------|----------|-----------|-------------|-------------|
+| 1 | Critical | OWASP A01 | `src/auth/login.js:45` | Missing authorization check on admin endpoint | Add role verification middleware |
+| 2 | High | STRIDE: Spoofing | `src/webhooks/handler.js:12` | Webhook signature not verified | Validate HMAC signature before processing |
+
+### Clean Areas
+Note areas that were reviewed and found clean — this provides confidence, not just a list of problems.
+
+## Error Handling
+
+- If one agent fails to dispatch, report findings from the other and note the failure
+- If no files match the scope, report "No files found for the specified scope" and suggest alternatives
+- If the scope is very large (>100 files), warn about token cost and ask whether to proceed or narrow the scope
+
+## When to Use This Skill
+
+- Before deploying changes that touch auth, payments, user data, or API endpoints
+- When adding new endpoints or modifying access control
+- After a security incident to audit related code
+- As a quick check during development — faster than a full `/ce-review`
+- When onboarding to unfamiliar code that handles sensitive operations
+
+## What This Skill Does NOT Do
+
+- Does not replace static analysis tools (Snyk, SonarQube, npm audit) — though ruflo-security-audit adds partial CVE coverage when available
+- Does not run penetration tests or active exploitation
+- Does not modify code — report only
diff --git a/plugins/compound-engineering/skills/ce-test-browser/SKILL.md b/plugins/compound-engineering/skills/ce-test-browser/SKILL.md
index 591027102..49fb2bb53 100644
--- a/plugins/compound-engineering/skills/ce-test-browser/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-test-browser/SKILL.md
@@ -34,6 +34,10 @@ command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTAL
If not installed, inform the user: "`agent-browser` is not installed. Run `/ce-setup` to install required dependencies." Then stop — this skill cannot function without agent-browser.
+## Framework-Specific Guides
+
+When testing single-spa micro frontend applications, read `references/single-spa-guide.md` for mount detection, cross-app navigation, auth flow, and WebSocket update patterns. When interacting with Element UI components, read `references/element-ui-selectors.md` for teleported component selectors and multi-step interaction patterns.
+
## Workflow
### 1. Verify Installation
diff --git a/plugins/compound-engineering/skills/ce-test-browser/references/element-ui-selectors.md b/plugins/compound-engineering/skills/ce-test-browser/references/element-ui-selectors.md
new file mode 100644
index 000000000..217752dff
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-test-browser/references/element-ui-selectors.md
@@ -0,0 +1,161 @@
+# Element UI Component Selectors & Interaction Patterns
+
+Load this reference when interacting with Element UI `~2.13.2` components via `agent-browser`. Element UI teleports many component overlays to `document.body`, outside the Vue app's mount container. This guide covers the selectors and multi-step interaction patterns needed.
+
+**Version:** These patterns are tested against Element UI `~2.13.2`. Other versions may use different class names or DOM structures.
+
+## Key Concept: Teleported Components
+
+Element UI renders dropdown menus, dialogs, popovers, and date pickers as children of `document.body`, not inside the component that triggered them. This means:
+
+- The trigger element (button, input) is inside your app's mount container (`#messaging`, `#multichannel-sender`)
+- The overlay content (dropdown options, dialog body, picker panel) is appended to `document.body`, outside your app's mount container
+- `agent-browser snapshot -i` will show both — look for elements near the bottom of the snapshot that aren't inside your app container
+
+## el-select (Dropdown Select)
+
+**DOM structure:**
+- Trigger: `.el-select` container with `.el-input` inside
+- Dropdown: `.el-select-dropdown` teleported to `body`, contains `.el-select-dropdown__item` elements
+
+**Interaction pattern:**
+```
+1. agent-browser click @select-trigger # Click the el-select input to open
+2. agent-browser wait .el-select-dropdown # Wait for dropdown to appear in body
+3. agent-browser snapshot -i # Find the option refs
+4. agent-browser click @target-option # Click the desired option
+5. agent-browser wait 500 # Allow selection to register
+```
+
+**Finding options:** After step 2, use `agent-browser snapshot -i` to see the dropdown items with their `@eN` refs. Options are `.el-select-dropdown__item` elements. The selected option has class `selected`.
+
+**Multi-select:** For `el-select` with `multiple` attribute, clicking an option toggles it without closing the dropdown. Click outside or press Escape to close.
+
+## el-dialog (Modal Dialog)
+
+**DOM structure:**
+- Wrapper: `.el-dialog__wrapper` teleported to `body`
+- Dialog: `.el-dialog` inside the wrapper
+- Header: `.el-dialog__header` with `.el-dialog__title`
+- Body: `.el-dialog__body`
+- Footer: `.el-dialog__footer` with action buttons
+- Overlay: `.v-modal` backdrop
+
+**Interaction pattern:**
+```
+1. agent-browser click @trigger-button # Click whatever opens the dialog
+2. agent-browser wait .el-dialog__wrapper # Wait for dialog wrapper
+3. agent-browser snapshot -i # Find form fields and buttons
+4. # Interact with dialog content (fill forms, click buttons)
+5. agent-browser click @confirm-button # Click confirm/submit in footer
+6. agent-browser wait 500 # Allow dialog to close
+```
+
+**Closing:** Click the X button (`.el-dialog__headerbtn`), click a footer button, or click the overlay (if `close-on-click-modal` is true, which is the default).
+
+**Nested dialogs:** Element UI supports nested dialogs. Each gets its own `.el-dialog__wrapper` in body. Use `agent-browser snapshot -i` to distinguish between them.
+
+## el-date-picker
+
+**DOM structure:**
+- Trigger: `.el-date-editor` input
+- Panel: `.el-picker-panel` teleported to `body`
+- Navigation: `.el-date-picker__header` with prev/next month buttons
+- Date cells: `.el-date-table` with `td.available` cells
+- Today: `td.today`
+- Selected: `td.current`
+
+**Interaction pattern (select a specific date):**
+```
+1. agent-browser click @date-input # Click to open picker
+2. agent-browser wait .el-picker-panel # Wait for panel
+3. agent-browser snapshot -i # See the calendar
+4. # Navigate months if needed:
+5. agent-browser click @next-month-button # .el-icon-arrow-right in header
+6. agent-browser wait 300 # Allow month transition
+7. agent-browser click @target-date-cell # Click the date cell
+8. agent-browser wait 500 # Allow picker to close
+```
+
+**Date range:** For range pickers, the panel shows two months side by side. Click the start date, then the end date.
+
+**Quick tip:** Use `agent-browser snapshot -i` after opening the picker to find the exact refs for date cells. Each `td` in the date table is interactive.
+
+## el-popover
+
+**DOM structure:**
+- Trigger: the element with `v-popover` directive
+- Content: `.el-popover` teleported to `body`
+- Arrow: `.popper__arrow`
+
+**Interaction pattern:**
+```
+1. agent-browser click @popover-trigger # Click or hover to show
+2. agent-browser wait .el-popover # Wait for popover
+3. agent-browser snapshot -i # Find content refs
+4. # Interact with popover content
+```
+
+**Trigger mode:** Popovers can be triggered by `click`, `hover`, or `focus`. With `agent-browser`, always use click — hover events are unreliable.
+
+## el-message-box (Confirm/Alert/Prompt)
+
+**DOM structure:**
+- Wrapper: `.el-message-box__wrapper` teleported to `body`
+- Box: `.el-message-box`
+- Title: `.el-message-box__title`
+- Message: `.el-message-box__message`
+- Input: `.el-message-box__input` (for prompt type)
+- Buttons: `.el-message-box__btns` with cancel and confirm
+
+**Interaction pattern:**
+```
+1. # Message box appears after an action
+2. agent-browser wait .el-message-box__wrapper
+3. agent-browser snapshot -i # See title, message, buttons
+4. agent-browser click @confirm-button # Or @cancel-button
+```
+
+## el-table
+
+**DOM structure:**
+- Container: `.el-table`
+- Header: `.el-table__header-wrapper`
+- Body: `.el-table__body-wrapper` with `tr` rows and `td` cells
+- Fixed columns: `.el-table__fixed` (if present)
+
+**Interaction pattern:**
+```
+1. agent-browser wait .el-table__body-wrapper # Wait for table to render
+2. agent-browser snapshot -i # Find row/cell refs
+3. agent-browser click @target-row # Click a row (if clickable)
+```
+
+**Pagination:** If the table has pagination (`.el-pagination`), use `agent-browser click` on page numbers or next/prev buttons.
+
+## General Tips
+
+### Discovering selectors
+When you don't know the exact selector:
+```
+agent-browser snapshot -i # Shows all interactive elements with @eN refs
+```
+
+### Waiting for animations
+Element UI components have transition animations (default 300ms). After opening/closing overlays, wait at least 300-500ms before interacting with the next element.
+
+### Z-index stacking
+When multiple overlays are open (dialog + popover, or nested dialogs), the latest one has the highest z-index. `agent-browser snapshot -i` shows elements in DOM order, not z-index order — the last overlay in the list is typically the topmost.
+
+### Hidden elements
+If an element exists in the DOM but isn't visible (display: none, visibility: hidden), `agent-browser click` will fail. Use `agent-browser snapshot -i` to check visibility.
+
+## Troubleshooting
+
+| Symptom | Likely Cause | Fix |
+|---------|-------------|-----|
+| Dropdown doesn't appear after clicking select | Another overlay is blocking it | Close other overlays first |
+| Can't find dropdown options | They're teleported to body, not inside the select | Use `snapshot -i` to find them at the bottom of the element list |
+| Dialog close button doesn't work | Clicking the overlay instead of the X | Use specific `.el-dialog__headerbtn` selector |
+| Date picker shows wrong month | Default is current month | Use nav buttons to reach target month before selecting |
+| Table rows not clickable | Table doesn't have row-click handler | Check if the table has `@row-click` or specific cell buttons |
diff --git a/plugins/compound-engineering/skills/ce-test-browser/references/single-spa-guide.md b/plugins/compound-engineering/skills/ce-test-browser/references/single-spa-guide.md
new file mode 100644
index 000000000..c48c3a3f5
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-test-browser/references/single-spa-guide.md
@@ -0,0 +1,122 @@
+# Single-SPA Micro Frontend Testing Guide
+
+Load this reference when testing single-spa micro frontend applications with `agent-browser`. Covers mount detection, cross-app navigation, auth flows, and WebSocket-driven update patterns.
+
+## Known Limitations
+
+`agent-browser` has **no JavaScript evaluation capability**. All "framework awareness" in this guide uses DOM-based workarounds:
+
+- **Mount detection** = waiting for a known DOM element to appear (not detecting the single-spa mount event)
+- **WebSocket awareness** = polling for UI changes (not hooking into socket events)
+- **State checking** = reading visible DOM content (not querying Vuex store)
+
+These workarounds are reliable for testing but cannot detect invisible failures (e.g., socket event received but UI didn't update).
+
+## App Topology
+
+| App | Base Path | Port (dev) | Mount Element | AMD Output |
+|-----|-----------|-----------|---------------|------------|
+| Main Shell | `/` | — | `#app` | orchestrator |
+| Messaging | `/messaging-portal` | 8240 | `#messaging` | `messaging.js` |
+| MCS | `/multichannel-portal` | 8243 | `#multichannel-sender` | `sender.js` |
+| Automation | `/automation` | — | TBD | React + Bun |
+
+## Mount Detection
+
+Single-spa apps mount/unmount based on route. After navigating to an app's route, wait for its mount element to have child content before interacting.
+
+**Pattern:**
+```
+1. agent-browser navigate /messaging-portal
+2. agent-browser wait #messaging # Wait for mount element
+3. agent-browser snapshot -i # Verify app content is loaded
+4. # Now safe to interact with the app
+```
+
+**Why not just `wait` for the mount element?** The `#messaging` div may exist in the HTML before the app mounts (it's a static container). Wait for a child element that only appears after the Vue app renders — e.g., a navigation bar, a specific component, or any content inside the container.
+
+**Better pattern:**
+```
+1. agent-browser navigate /messaging-portal
+2. agent-browser wait .messaging-sidebar # Wait for a child that proves the app mounted
+3. agent-browser snapshot -i
+```
+
+## Cross-App Navigation
+
+When navigating between micro frontends, the current app unmounts and the new app mounts. This takes time.
+
+**Pattern:**
+```
+1. # Currently in Messaging at /messaging-portal
+2. agent-browser navigate /multichannel-portal
+3. agent-browser wait 2000 # Allow unmount/mount cycle
+4. agent-browser wait #multichannel-sender # Wait for MCS mount
+5. agent-browser snapshot -i # Verify MCS is loaded
+```
+
+**Common mistake:** Interacting with elements immediately after navigation. The old app's DOM may still be present during the unmount/mount transition.
+
+## Auth Flow
+
+### Prerequisites
+- Shell app running (serves the login page at `/`)
+- Test credentials in environment variables:
+ - `TEST_USER_EMAIL` — test account email
+ - `TEST_USER_PASSWORD` — test account password
+- Descope auth service reachable
+
+### Login Pattern
+```
+1. agent-browser navigate /
+2. agent-browser wait [data-testid="login-form"] # Or the actual login form selector
+3. agent-browser fill @email-input $TEST_USER_EMAIL
+4. agent-browser fill @password-input $TEST_USER_PASSWORD
+5. agent-browser click @login-button
+6. agent-browser wait 3000 # Allow auth redirect
+7. agent-browser wait .main-shell-content # Verify logged-in state
+```
+
+### Session Persistence
+After login through the shell app, session cookies are set in the browser. Navigating to micro frontend routes (`/messaging-portal`, `/multichannel-portal`) carries the auth state automatically — no need to re-login.
+
+### Error Handling
+If login fails:
+1. **Check env vars:** Are `TEST_USER_EMAIL` and `TEST_USER_PASSWORD` set?
+2. **Check Descope:** Is the auth service reachable? (may not be available in local dev)
+3. **Check 2FA:** Does the test account require two-factor authentication? If so, it cannot be automated with `agent-browser`.
+4. **Check rate limiting:** Descope may rate-limit login attempts. Wait and retry.
+5. **Take a screenshot:** `agent-browser screenshot --full` to see what the login page shows.
+
+## WebSocket-Driven Updates
+
+Our apps use a Vuex socket module for real-time updates (new messages, status changes). Since `agent-browser` cannot hook into WebSocket events, use poll-and-wait patterns.
+
+**Pattern: Wait for a message to appear**
+```
+1. # Trigger the action that should produce a WebSocket event
+2. # (e.g., send a message via API, or click send in the UI)
+3. agent-browser wait .message-list-item:last-child # Wait for new DOM element
+4. agent-browser snapshot -i # Verify content
+```
+
+**Pattern: Wait for a status change**
+```
+1. # Trigger status change
+2. agent-browser wait [data-status="active"] # Wait for attribute change
+3. agent-browser snapshot -i
+```
+
+**Timeout guidance:** If the expected UI change doesn't appear within 10 seconds, the event likely didn't arrive or the UI didn't update. Take a screenshot and report the failure rather than waiting indefinitely.
+
+**What you can't detect:** If a WebSocket event arrives but the UI handler has a bug and doesn't update the DOM, the wait will time out. This is a real limitation — report it as "expected UI change did not appear within timeout" and let the developer investigate.
+
+## Troubleshooting
+
+| Symptom | Likely Cause | Fix |
+|---------|-------------|-----|
+| Mount element exists but app content doesn't load | App hasn't finished mounting | Wait for a child element, not just the container |
+| Login succeeds but micro frontend shows "unauthorized" | Session cookie not set correctly | Check if the shell app and micro frontend are on the same domain |
+| Elements not found after navigation | Old app's DOM is still present during transition | Add a delay before waiting for the new app's elements |
+| WebSocket updates don't appear | Socket not connected, or event handler bug | Check if the dev server's WebSocket endpoint is running |
+| `agent-browser wait` times out | Element selector is wrong, or the element is inside a shadow DOM | Use `agent-browser snapshot -i` to inspect available elements |
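+
+When the table doesn't pinpoint the cause, a short inspection sequence (using only commands shown above) usually reveals the actual page state:
+
+```
+agent-browser screenshot --full   # What does the page visually show?
+agent-browser snapshot -i         # Which elements and selectors actually exist?
+# Compare the snapshot output against the selector you passed to `wait`
+```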
diff --git a/plugins/compound-engineering/skills/ce-work/SKILL.md b/plugins/compound-engineering/skills/ce-work/SKILL.md
index 72340ff75..bc2b0689d 100644
--- a/plugins/compound-engineering/skills/ce-work/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-work/SKILL.md
@@ -20,6 +20,12 @@ This command takes a work document (plan or specification) or a bare prompt desc
### Phase 0: Input Triage
+**Session state check (MANDATORY — runs before input triage):**
+1. Run: check if `SESSION_STATE.md` exists in the project root (use Bash: `test -f SESSION_STATE.md && echo EXISTS || echo NONE`)
+2. If EXISTS: read `SESSION_STATE.md`, then read `references/session-state.md` for resume behavior. Present the state to the user and ask: "Found session state — resume from here, or start fresh?" Do NOT proceed to input triage until the user responds.
+3. If NONE: proceed to input triage below.
+4. **Ruflo memory enrichment (optional):** After the SESSION_STATE.md check, if `mcp__claude-flow__agentdb_health` is available in your tools, read `references/ruflo-memory-integration.md` and follow its session-start steps to query for related past sessions. If ruflo is not available, skip this — SESSION_STATE.md is sufficient.
+
Determine how to proceed based on what was provided in ``.
**Plan document** (input is a file path to an existing plan or specification) → skip to Phase 1.
@@ -161,12 +167,15 @@ Determine how to proceed based on what was provided in ``.
**Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"` — it overrides user-level settings like `bypassPermissions`.
+ **Subagent prompt and status handling:** Read `references/subagent-templates.md` for the implementer prompt template, status vocabulary (DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED), escalation decision trees, and model-tier routing guidance. When a subagent returns NEEDS_CONTEXT, provide the missing context and re-dispatch. When BLOCKED, follow the escalation decision tree (context problem → re-dispatch, reasoning limit → upgrade model, task too large → split, plan wrong → escalate to user).
+
**After each subagent completes (serial mode):**
1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
- 2. Run the relevant test suite to confirm the tree is healthy
- 3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
- 4. Update the task list (do not edit the plan body — progress is carried by the commit)
- 5. Dispatch the next unit
+ 2. Run per-task review: dispatch `ce-spec-compliance-reviewer` and `ce-code-quality-per-task-reviewer` agents on the subagent's changes. Spec-compliance runs first; code-quality only after spec-compliance passes. If either reviewer raises critical or important issues, the implementer fixes them and the reviewer re-reviews. Repeat until both approve. See `references/subagent-templates.md` for the full per-task review pipeline.
+ 3. Run the relevant test suite to confirm the tree is healthy
+ 4. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
+ 5. Update the task list (do not edit the plan body — progress is carried by the commit)
+ 6. Dispatch the next unit
**After all parallel subagents in a batch complete (worktree-isolated mode):**
1. Wait for every subagent in the current parallel batch to finish.
@@ -207,6 +216,7 @@ Determine how to proceed based on what was provided in ``.
- Run tests after changes
- Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
- Mark task as completed
+ - Update `SESSION_STATE.md` with current task progress (see `references/session-state.md` for format). Orchestrator-level only — never from inside subagents.
- Evaluate for incremental commit (see below)
```
@@ -217,6 +227,7 @@ Determine how to proceed based on what was provided in ``.
- Do not skip verifying that a new test fails before implementing the fix or feature
- Do not over-implement beyond the current behavior slice when working test-first
- Skip test-first discipline for trivial renames, pure configuration, and pure styling work
+ - When working test-first, also read `references/tdd-guardrails.md` for rationalization defenses, red-flag detection, and the delete-and-restart rule. Read `references/testing-anti-patterns.md` for common testing pitfalls to avoid.
**Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
@@ -321,7 +332,9 @@ Determine how to proceed based on what was provided in ``.
### Phase 3-4: Quality Check and Finishing Work
-When all Phase 2 tasks are complete and execution transitions to quality check, you must read `references/shipping-workflow.md` for the full shipping workflow.Do not skip this.
+When all Phase 2 tasks are complete and execution transitions to quality check, you must read `references/shipping-workflow.md` for the full shipping workflow. Do not skip this. Also read `references/verification-discipline.md` for per-message verification freshness, claim-to-evidence mapping, and linguistic red-flag detection before making any completion claims.
+
+**Trajectory capture (after shipping):** If the execution involved a non-obvious approach — an initial attempt that failed, an unexpected dependency order, or a workaround — read `references/trajectory-capture.md` and write a trajectory doc. Skip this for routine work where the plan was followed with no surprises.
## Key Principles
diff --git a/plugins/compound-engineering/skills/ce-work/references/ruflo-memory-integration.md b/plugins/compound-engineering/skills/ce-work/references/ruflo-memory-integration.md
new file mode 100644
index 000000000..5b0fbc98c
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/ruflo-memory-integration.md
@@ -0,0 +1,59 @@
+# Ruflo Memory Integration (Optional)
+
+Load this reference when `/ce-work` detects that ruflo-agentdb MCP tools are available. This enhances session resume with semantic recall from past sessions and stores task completion trajectories for future retrieval.
+
+This is an **optional enhancement** — if ruflo is not installed or the MCP server is not running, SESSION_STATE.md provides full session state persistence on its own.
+
+## Detection
+
+Before using any ruflo tools, check availability:
+
+1. Check if the tool `mcp__claude-flow__agentdb_health` exists in your available tools
+2. If it does not exist, skip all ruflo integration — SESSION_STATE.md is sufficient
+3. If it exists, call `mcp__claude-flow__agentdb_health` — if it returns an error, skip ruflo integration
+
+## At Session Start (after reading SESSION_STATE.md)
+
+After the SESSION_STATE.md check in Phase 0, if ruflo-agentdb is available:
+
+1. Extract the plan goal or work description from SESSION_STATE.md (or from the bare prompt if no state file exists)
+2. Call `mcp__claude-flow__agentdb_pattern-search` with the goal/description as the query
+3. If relevant past patterns are found, present them briefly:
+ > "Found [N] related past sessions in AgentDB. Key learnings: [one-line summary per pattern]"
+4. Do not block on this — if the search is slow or returns nothing, proceed normally
+5. This supplements SESSION_STATE.md, never replaces it. SESSION_STATE.md has the authoritative task progress; agentdb has cross-session context.
+
+## At Task Completion
+
+After updating SESSION_STATE.md at a task boundary, if ruflo-agentdb is available:
+
+1. Store a task summary to agentdb:
+ ```
+ Tool: mcp__claude-flow__agentdb_hierarchical-store
+ Args:
+ key: "{branch}/{plan-filename}/{unit-id}"
+ value: "Goal: {unit goal}. Approach: {what was done}. Outcome: {success/failure/partial}."
+ namespace: "ce-task-completions"
+ ```
+
+2. If the task involved a non-obvious solution (unexpected approach, workaround, or recovery from a failed first attempt), also store the pattern:
+ ```
+ Tool: mcp__claude-flow__agentdb_pattern-store
+ Args:
+ pattern: "{description of the approach that worked and why}"
+ namespace: "ce-patterns"
+ ```
+
+3. Keep storage lightweight — one call per task boundary, not per file change.
+
+## What NOT to Store
+
+- Routine task completions where the approach was obvious (followed existing pattern, no surprises)
+- File contents or diffs (too large, too noisy)
+- Temporary state that SESSION_STATE.md already captures (current task progress, blockers)
+
+## Failure Handling
+
+- If any ruflo MCP call fails, log the failure silently and continue — ruflo is a nice-to-have, not a dependency
+- Never block ce-work execution waiting for ruflo
+- Never retry failed ruflo calls — move on
diff --git a/plugins/compound-engineering/skills/ce-work/references/session-state.md b/plugins/compound-engineering/skills/ce-work/references/session-state.md
new file mode 100644
index 000000000..71fc484a9
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/session-state.md
@@ -0,0 +1,68 @@
+# Session State Persistence
+
+Load this reference when `/ce-work` detects a `SESSION_STATE.md` file in the project root, or when updating task progress at task boundaries.
+
+## Purpose
+
+Persist live work state across Claude Code sessions. When a session ends mid-work and a new session starts, the engineer can resume from where they left off instead of re-explaining context.
+
+This is NOT a replacement for MEMORY.md (which handles decisions and patterns) or `docs/solutions/` (which handles post-hoc learnings). SESSION_STATE.md captures **live progress** — what branch you're on, what plan you're executing, which tasks are done, and what's next.
+
+## SESSION_STATE.md Template
+
+```markdown
+# Session State
+
+**Updated:** 2026-04-21T14:30:00+08:00
+**Branch:** feat/OMG-1234-user-auth-flow
+**Plan:** docs/plans/2026-04-21-001-feat-user-auth-flow-plan.md
+
+## Task Progress
+
+- [x] Unit 1: Create auth service module
+- [x] Unit 2: Add login endpoint
+- [ ] Unit 3: Add session persistence ← in progress
+- [ ] Unit 4: Add logout
+- [ ] Unit 5: Add registration
+
+## Blockers
+
+- Waiting on Descope API key for staging environment (asked Anton 2026-04-21)
+
+## Next Steps
+
+- Complete Unit 3 session persistence
+- Then Unit 4 logout (depends on Unit 3)
+```
+
+## Read Behavior (Phase 0)
+
+When `/ce-work` starts:
+
+1. Check for `SESSION_STATE.md` in the project root
+2. If it does not exist — proceed normally, no action needed
+3. If it exists, read it and check the timestamp
+4. **Stale check:** If the `Updated` timestamp is older than 7 days, ask: "Found session state from [date] on branch [branch]. This is [N] days old. Resume from this state, or start fresh?" (Note: 7 days is a starting default — teams with longer branch lifetimes may want to adjust.)
+5. **Branch check:** If the state references a different branch than the current one, flag: "Session state is from branch [recorded branch] but you're on [current branch]. This state may be outdated."
+6. **Fresh state:** If the state is recent and matches the current branch, offer: "Found session state: [N] of [M] tasks complete on [plan name]. Resume from [next incomplete task]?"
+7. If the user chooses to resume, load the plan and skip to the first incomplete task
+8. If the user chooses to start fresh, proceed normally (the old state file will be overwritten as new tasks complete)
+
+## Write Behavior (Phase 2)
+
+Update `SESSION_STATE.md` at these checkpoints during `/ce-work`:
+
+1. **Task completion:** After marking a task as completed and before dispatching the next task, update the task checklist in SESSION_STATE.md
+2. **Blocker encountered:** When a task is blocked (waiting on external input, dependency not met), update the Blockers section
+3. **Plan change:** If scope changes during execution (new tasks added, tasks removed), update the Task Progress section
+
+**Critical: Orchestrator-level only.** Never update SESSION_STATE.md from inside a subagent. The orchestrating ce-work session handles all writes. This prevents concurrent write conflicts during parallel subagent execution.
+
+**Token cost:** Each update is one `Write` call with ~10-20 lines of markdown. This is lightweight. The agent uses judgment on frequency — don't update after trivial steps (renaming a variable), do update after meaningful task boundaries.
+
+## File Location and Lifecycle
+
+- **Location:** Project root (alongside `CLAUDE.md`)
+- **Git:** Add `SESSION_STATE.md` to `.gitignore` — each developer's state is personal
+- **Cleanup:** When all tasks in a plan are complete, the state file can be deleted or left to be overwritten by the next plan execution
+- **Format:** Plain markdown, human-readable and editable. An engineer can manually update it if the agent's version gets out of sync
diff --git a/plugins/compound-engineering/skills/ce-work/references/subagent-templates.md b/plugins/compound-engineering/skills/ce-work/references/subagent-templates.md
new file mode 100644
index 000000000..4eec02d4a
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/subagent-templates.md
@@ -0,0 +1,152 @@
+# Subagent Orchestration Templates
+
+Load this reference when dispatching subagents for plan execution. It provides the implementer prompt template, status vocabulary with escalation decision trees, and model-tier routing guidance.
+
+Adapted from [Superpowers](https://github.com/obra/superpowers) `subagent-driven-development` skill.
+
+## Implementer Prompt Template
+
+When dispatching a subagent for a plan task, use this template structure:
+
+```
+You are implementing Task N: [task name]
+
+## Task Description
+
+[FULL TEXT of the implementation unit from the plan — paste it here, don't make the subagent read the file]
+
+## Context
+
+[Scene-setting: where this fits in the larger plan, what was completed before, architectural context]
+
+## Before You Begin
+
+If you have questions about the requirements, approach, dependencies, or anything unclear — ask them now. Raise concerns before starting work. It's always OK to pause and clarify. Don't guess.
+
+## Your Job
+
+1. Implement exactly what the task specifies
+2. Write tests following TDD (write failing test first, then minimal code to pass)
+3. Verify implementation works
+4. Commit your work
+5. Self-review (see below)
+6. Report back with status
+
+## Self-Review Before Reporting
+
+Review your work with fresh eyes:
+
+**Completeness:** Did I implement everything in the spec? Miss any requirements or edge cases?
+**Quality:** Is this my best work? Are names clear? Is the code clean and maintainable?
+**Discipline:** Did I avoid overbuilding (YAGNI)? Did I follow existing codebase patterns?
+**Testing:** Do tests verify behavior (not mock behavior)? Did I follow TDD? Are tests comprehensive?
+
+If you find issues during self-review, fix them before reporting.
+
+## When You're In Over Your Head
+
+It is always OK to stop and escalate. Bad work is worse than no work.
+
+STOP and escalate when:
+- The task requires architectural decisions with multiple valid approaches
+- You need to understand code beyond what was provided
+- You feel uncertain about whether your approach is correct
+- The task involves restructuring code the plan didn't anticipate
+
+## Report Format
+
+- **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
+- What you implemented (or attempted, if blocked)
+- What you tested and test results
+- Files changed
+- Self-review findings (if any)
+- Issues or concerns
+```
+
+## Status Vocabulary
+
+Subagents report one of four statuses. Handle each with the decision tree below.
+
+### DONE
+
+Subagent completed the task successfully. Proceed to per-task review (spec-compliance, then code-quality).
+
+### DONE_WITH_CONCERNS
+
+Subagent completed the work but flagged doubts.
+
+**Decision tree:**
+- Read the concerns before proceeding
+- If concerns are about **correctness or scope** → address them before review
+- If concerns are **observations** (e.g., "this file is getting large") → note them and proceed to review
+
+### NEEDS_CONTEXT
+
+Subagent needs information that wasn't provided.
+
+**Decision tree:**
+- Read what context is missing
+- Provide the missing context (file contents, architectural decisions, API details)
+- Re-dispatch the same subagent with the additional context
+
+### BLOCKED
+
+Subagent cannot complete the task.
+
+**Decision tree:**
+1. **Context problem** → Provide more context and re-dispatch with the same model
+2. **Reasoning limit** → Re-dispatch with a more capable model
+3. **Task too large** → Break into smaller pieces and dispatch separately
+4. **Plan is wrong** → Escalate to the user — the plan needs revision
+
+**Never** ignore an escalation or force the same model to retry without changes. If the subagent said it's stuck, something needs to change.
+
+## Model-Tier Routing
+
+Use the least powerful model that can handle each role to conserve cost and increase speed.
+
+| Task Type | Model Tier | Signals |
+|-----------|-----------|---------|
+| **Mechanical implementation** | Fast/cheap | Touches 1-2 files, clear spec, isolated function, well-defined inputs/outputs |
+| **Integration and judgment** | Standard | Touches multiple files, pattern matching, coordination between components, debugging |
+| **Architecture, design, review** | Most capable | Requires design judgment, broad codebase understanding, review quality assessment |
+
+**Heuristic:** If the plan unit has a complete spec with exact file paths, test scenarios, and patterns to follow — it's mechanical. If it requires the agent to make design decisions — use a more capable model.
+
+### Model Selection Scoring
+
+When the task type isn't immediately obvious, score these signals to decide:
+
+| Signal | Fast/cheap (Haiku) | Standard (Sonnet) | Most capable (Opus) |
+|--------|-------------------|-------------------|---------------------|
+| **File count** | 1-2 files | 3-8 files | 9+ files |
+| **Test complexity** | Unit tests only | Integration tests | Cross-service or E2E tests |
+| **Domain** | Config, styling, renaming, docs | Business logic, API endpoints, UI components | Auth, payments, migrations, data integrity |
+| **Pattern availability** | Exact pattern exists to copy | Similar pattern exists to adapt | Novel implementation required |
+| **Execution note** | None or "trivial" | Standard | "Complex", "cross-cutting", or security-related |
+| **Error handling** | No failure modes | Known failure modes | Distributed failures, partial rollback |
+
+**Scoring:** Default to Standard (Sonnet). Upgrade to Most capable (Opus) when 2+ signals point to it. Downgrade to Fast/cheap (Haiku) when all signals point to mechanical work with an exact pattern to follow.
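+
+**Worked example (hypothetical unit):** "Add a retry option to the shared HTTP client wrapper."
+
+- File count: 2 files (the wrapper and its test) → Fast/cheap
+- Domain: business logic in shared code → Standard
+- Pattern availability: a similar backoff helper exists to adapt → Standard
+- Error handling: known failure modes (timeouts, 5xx responses) → Standard
+
+Only one signal points below Standard and none point to Most capable, so dispatch Standard (Sonnet).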
+
+## Per-Task Review Pipeline
+
+After each subagent completes with DONE status:
+
+1. **Spec-compliance review** — Dispatch the `ce-spec-compliance-reviewer` agent. Does the output match the plan unit's Goal, Files, Approach, and Test scenarios? The reviewer explicitly distrusts the implementer's self-report and verifies by reading actual code.
+
+2. **Code-quality review** — Only after spec-compliance passes. Dispatch the `ce-code-quality-per-task-reviewer` agent. Is the code clean, tested, and maintainable?
+
+3. **Fix-and-re-review loop** — If either reviewer raises critical issues, the implementer fixes them and the reviewer re-reviews. Repeat until approved.
+
+**Important:** Do not start code-quality review before spec-compliance passes. Wrong order wastes review effort on code that doesn't meet the spec.
+
+## Red Flags
+
+- Dispatching multiple implementation subagents in parallel on overlapping files (conflicts)
+- Making subagent read the plan file (provide full text instead)
+- Skipping scene-setting context (subagent needs to know where the task fits)
+- Ignoring subagent questions (answer before letting them proceed)
+- Accepting "close enough" on spec compliance (issues found = not done)
+- Skipping re-review after fixes (reviewer found issues → implementer fixes → review again)
+- Letting implementer self-review replace actual review (both are needed)
+- Moving to next task while either review has open issues
diff --git a/plugins/compound-engineering/skills/ce-work/references/tdd-guardrails.md b/plugins/compound-engineering/skills/ce-work/references/tdd-guardrails.md
new file mode 100644
index 000000000..f36da1cb8
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/tdd-guardrails.md
@@ -0,0 +1,151 @@
+# TDD Guardrails
+
+Load this reference when working test-first. It provides rationalization defenses, red-flag detection, and the delete-and-restart rule to prevent the agent from cutting corners on TDD discipline.
+
+Adapted from [Superpowers](https://github.com/obra/superpowers) `test-driven-development` skill.
+
+## The Iron Law
+
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+
+Write code before the test? **Delete it. Start over.**
+
+- Don't keep it as "reference"
+- Don't "adapt" it while writing tests
+- Don't look at it
+- Delete means delete
+
+Implement fresh from tests. Period.
+
+**Exception:** Pure configuration, styling, and documentation work are exempt from test-first (per CE's existing pragmatic rule). This guardrail applies to code that has behavior.
+
+## Red-Green-Refactor
+
+### RED — Write Failing Test
+
+Write one minimal test showing what should happen.
+
+**Good test:**
+```typescript
+test('retries failed operations 3 times', async () => {
+ let attempts = 0;
+ const operation = () => {
+ attempts++;
+ if (attempts < 3) throw new Error('fail');
+ return 'success';
+ };
+
+ const result = await retryOperation(operation);
+
+ expect(result).toBe('success');
+ expect(attempts).toBe(3);
+});
+```
+Clear name, tests real behavior, one thing.
+
+**Bad test:**
+```typescript
+test('retry works', async () => {
+ const mock = jest.fn()
+ .mockRejectedValueOnce(new Error())
+ .mockRejectedValueOnce(new Error())
+ .mockResolvedValueOnce('success');
+ await retryOperation(mock);
+ expect(mock).toHaveBeenCalledTimes(3);
+});
+```
+Vague name, tests mock not code.
+
+### Verify RED — Watch It Fail (MANDATORY)
+
+Run the test. Confirm:
+- Test fails (not errors)
+- Failure message is expected
+- Fails because feature missing (not typos)
+
+Test passes? You're testing existing behavior. Fix the test.
+
+### GREEN — Minimal Code
+
+Write the simplest code to pass the test. Don't add features, refactor other code, or "improve" beyond the test.
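+
+The good retry test above can be driven green with a minimal sketch like this (assuming the `retryOperation` name and call shape from that test; the fixed 3 attempts is deliberate, generalize only when a test demands it):
+
+```typescript
+// Minimal sketch: just enough to pass the retry test above.
+async function retryOperation<T>(operation: () => T | Promise<T>): Promise<T> {
+  let lastError: unknown;
+  for (let attempt = 0; attempt < 3; attempt++) {
+    try {
+      return await operation(); // sync throws and rejections both land in catch
+    } catch (err) {
+      lastError = err;
+    }
+  }
+  throw lastError; // all 3 attempts failed: surface the last error
+}
+```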
+
+### Verify GREEN — Watch It Pass (MANDATORY)
+
+Run the test. Confirm all tests pass. Other tests still pass. Output is clean.
+
+### REFACTOR — Clean Up (After Green Only)
+
+Remove duplication, improve names, extract helpers. Keep tests green. Don't add behavior.
+
+## Good Tests
+
+| Quality | Good | Bad |
+|---------|------|-----|
+| **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` |
+| **Clear** | Name describes behavior | `test('test1')` |
+| **Shows intent** | Demonstrates desired API | Obscures what code should do |
+| **Real code** | Tests actual implementation | Tests mock behavior |
+
+## Common Rationalizations
+
+These are the excuses the agent generates to skip TDD. Each one is wrong.
+
+| Rationalization | Rebuttal |
+|----------------|----------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+| "Already manually tested" | Ad-hoc is not systematic. No record, can't re-run. |
+| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
+| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
+| "Test is hard = skip it" | Hard to test = hard to use. Listen to the test. |
+| "TDD will slow me down" | TDD is faster than debugging. Pragmatic = test-first. |
+| "Manual test is faster" | Manual doesn't prove edge cases. You'll re-test every change. |
+| "Existing code has no tests" | You're improving it. Add tests for the code you're changing. |
+
+## Red Flags — STOP and Start Over
+
+If you catch yourself thinking any of these, **delete the code and start with a failing test:**
+
+- Code before test
+- Test after implementation
+- Test passes immediately (without failing first)
+- Can't explain why the test failed
+- Tests added "later"
+- Rationalizing "just this once"
+- "I already manually tested it"
+- "Tests after achieve the same purpose"
+- "It's about spirit not ritual"
+- "Keep as reference" or "adapt existing code"
+- "Already spent X hours, deleting is wasteful"
+- "TDD is dogmatic, I'm being pragmatic"
+- "This is different because..."
+
+**All of these mean: Delete code. Start over with TDD.**
+
+## Verification Checklist
+
+Before marking any unit complete:
+
+- [ ] Every new function/method has a test
+- [ ] Watched each test fail before implementing
+- [ ] Each test failed for expected reason (feature missing, not typo)
+- [ ] Wrote minimal code to pass each test
+- [ ] All tests pass
+- [ ] Output is clean (no errors, warnings)
+- [ ] Tests use real code (mocks only if unavoidable)
+- [ ] Edge cases and error paths covered
+
+Can't check all boxes? You skipped TDD. Start over.
+
+## When Stuck
+
+| Problem | Solution |
+|---------|----------|
+| Don't know how to test | Write wished-for API. Write assertion first. Ask the user. |
+| Test too complicated | Design too complicated. Simplify interface. |
+| Must mock everything | Code too coupled. Use dependency injection. |
+| Test setup huge | Extract helpers. Still complex? Simplify design. |
diff --git a/plugins/compound-engineering/skills/ce-work/references/testing-anti-patterns.md b/plugins/compound-engineering/skills/ce-work/references/testing-anti-patterns.md
new file mode 100644
index 000000000..b099e141b
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/testing-anti-patterns.md
@@ -0,0 +1,119 @@
+# Testing Anti-Patterns
+
+Load this reference when writing or changing tests, adding mocks, or tempted to add test-only methods to production code.
+
+Adapted from [Superpowers](https://github.com/obra/superpowers) `testing-anti-patterns` reference.
+
+## The Iron Laws
+
+```
+1. NEVER test mock behavior
+2. NEVER add test-only methods to production classes
+3. NEVER mock without understanding dependencies
+```
+
+## Anti-Pattern 1: Testing Mock Behavior
+
+**The violation:** Asserting that a mock exists rather than testing real component behavior.
+
+```typescript
+// BAD: Testing that the mock exists
+test('renders sidebar', () => {
+  render(<Dashboard />); // <Dashboard /> stands in for the component under test
+ expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
+});
+```
+
+**Why it's wrong:** You're verifying the mock works, not that the component works. Test passes when mock is present, tells you nothing about real behavior.
+
+**The fix:** Test real component behavior, or don't mock it.
+
+```typescript
+// GOOD: Test real component
+test('renders sidebar', () => {
+  render(<Dashboard />);
+ expect(screen.getByRole('navigation')).toBeInTheDocument();
+});
+```
+
+**Gate:** Before asserting on any mock element, ask: "Am I testing real behavior or just mock existence?" If mock existence — delete the assertion.
+
+## Anti-Pattern 2: Test-Only Methods in Production
+
+**The violation:** Adding methods to production classes that only tests call (e.g., `destroy()`, `reset()`, `_testHelper()`).
+
+**Why it's wrong:** Pollutes production code with test concerns. Dangerous if accidentally called in production. Confuses object lifecycle.
+
+**The fix:** Put test cleanup and helpers in test utility files, not production classes.
+
+**Gate:** Before adding any method to a production class, ask: "Is this only used by tests?" If yes — put it in test utilities instead.
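+
+A minimal sketch of the fix (the `Cache` classes and `freshCache` helper are hypothetical illustrations, not project code):
+
+```typescript
+// BAD: production class carries a method only tests call
+class CacheWithTestHook {
+  private store = new Map<string, string>();
+  get(key: string) { return this.store.get(key); }
+  set(key: string, value: string) { this.store.set(key, value); }
+  _testReset() { this.store.clear(); } // test-only; dangerous if called in production
+}
+
+// GOOD: production class stays clean
+class Cache {
+  private store = new Map<string, string>();
+  get(key: string) { return this.store.get(key); }
+  set(key: string, value: string) { this.store.set(key, value); }
+}
+
+// ...and isolation lives in a test utility (e.g. test-utils.ts) instead
+function freshCache(): Cache {
+  return new Cache(); // each test builds its own instance; no reset method needed
+}
+```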
+
+## Anti-Pattern 3: Mocking Without Understanding
+
+**The violation:** Over-mocking to "be safe" and accidentally removing behavior the test depends on.
+
+```typescript
+// BAD: Mock prevents config write that test depends on
+vi.mock('ToolCatalog', () => ({
+ discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
+}));
+// Test fails mysteriously because the real method's side effect (a config write) no longer happens
+```
+
+**Why it's wrong:** The mocked method had side effects the test depended on. Over-mocking breaks actual behavior.
+
+**The fix:** Mock at the correct level — mock the slow/external operation, not the high-level method.
+
+**Gate:** Before mocking any method:
+1. What side effects does the real method have?
+2. Does this test depend on any of those side effects?
+3. If yes — mock at a lower level that preserves necessary behavior
+
+Red flags: "I'll mock this to be safe", "This might be slow, better mock it", mocking without understanding the dependency chain.
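+
+A sketch of mocking at the right level, framed as plain dependency injection rather than any particular mock framework (`fetchToolList` and the cache write are hypothetical stand-ins for the example above):
+
+```typescript
+// The real method has a side effect (the cache write) that tests depend on.
+async function discoverAndCacheTools(
+  fetchToolList: () => Promise<string[]>, // slow/external call: stub THIS
+  cache: Map<string, string[]>,
+): Promise<void> {
+  const tools = await fetchToolList();
+  cache.set('tools', tools); // side effect that downstream code reads
+}
+
+// GOOD: stub only the external call; the real cache write still happens
+async function runWithStub(): Promise<string[] | undefined> {
+  const cache = new Map<string, string[]>();
+  const stubFetch = async () => ['grep', 'read'];
+  await discoverAndCacheTools(stubFetch, cache);
+  return cache.get('tools'); // present, because the real method actually ran
+}
+```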
+
+## Anti-Pattern 4: Incomplete Mocks
+
+**The violation:** Partial mocks that only include fields you think you need.
+
+```typescript
+// BAD: Missing metadata that downstream code uses
+const mockResponse = {
+ status: 'success',
+ data: { userId: '123', name: 'Alice' }
+ // Missing: metadata.requestId consumed downstream
+};
+```
+
+**Why it's wrong:** Partial mocks hide structural assumptions. Tests pass but integration fails.
+
+**The fix:** Mirror the complete real data structure.
+
+**Gate:** Before creating mock responses, check: "What fields does the real API response contain?" Include ALL fields the system might consume downstream.
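+
+As a sketch, a complete mock mirrors the full response shape (the `metadata` fields here are assumptions about the real API; verify against an actual response before copying):
+
+```typescript
+// GOOD: mirror the complete real response, including fields consumed downstream
+const mockResponse = {
+  status: 'success',
+  data: { userId: '123', name: 'Alice' },
+  metadata: { requestId: 'req-789', timestamp: '2026-04-21T14:30:00Z' },
+};
+
+// Downstream code that reads metadata.requestId now works against the mock
+function describeRequest(response: typeof mockResponse): string {
+  return `handled ${response.metadata.requestId} for ${response.data.userId}`;
+}
+```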
+
+## Anti-Pattern 5: Integration Tests as Afterthought
+
+**The violation:** Claiming implementation is complete without writing tests.
+
+**Why it's wrong:** Testing is part of implementation, not optional follow-up. TDD prevents this entirely.
+
+**The fix:** Follow the TDD cycle. Tests come first, not after.
+
+## Quick Reference
+
+| Anti-Pattern | Fix |
+|--------------|-----|
+| Assert on mock elements | Test real component or unmock it |
+| Test-only methods in production | Move to test utilities |
+| Mock without understanding | Understand dependencies first, mock minimally |
+| Incomplete mocks | Mirror real API completely |
+| Tests as afterthought | TDD — tests first |
+| Over-complex mocks | Consider integration tests |
+
+## Red Flags
+
+- Assertion checks for `*-mock` test IDs
+- Methods only called in test files
+- Mock setup is >50% of test code
+- Test fails when you remove mock
+- Can't explain why mock is needed
+- Mocking "just to be safe"
diff --git a/plugins/compound-engineering/skills/ce-work/references/trajectory-capture.md b/plugins/compound-engineering/skills/ce-work/references/trajectory-capture.md
new file mode 100644
index 000000000..f31db3711
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/trajectory-capture.md
@@ -0,0 +1,65 @@
+# Trajectory Capture
+
+Load this reference after shipping a feature (Phase 3-4) when the execution involved a non-obvious approach — an initial attempt that failed, an unexpected dependency order, or a workaround for a framework limitation.
+
+## Purpose
+
+Capture execution trajectories as human-readable markdown so future sessions (and future engineers) can learn from what was tried and what worked. This is complementary to `docs/solutions/` (which captures post-hoc learnings about specific problems) — trajectories capture the **execution path**, not just the solution.
+
+## When to Capture
+
+Capture a trajectory when any of these are true:
+
+- The initial approach failed and a different one succeeded
+- The execution order mattered (doing X before Y prevented issues)
+- A framework limitation required a workaround
+- A plan assumption turned out to be wrong, requiring adaptation
+- The task took significantly longer than expected due to a non-obvious blocker
+
+Do NOT capture trajectories for routine work where the plan was followed directly with no surprises.
+
+## Format
+
+Write to `docs/solutions/{category}/` using the project's existing solution doc conventions. If the project has no `docs/solutions/` directory, write to the project root as a markdown file and let the engineer decide where to put it.
+
+```markdown
+---
+date: YYYY-MM-DD
+topic: {slug}
+category: {developer-experience|integration-issues|build-errors|database-issues}
+trajectory: true
+---
+
+# {Problem title}
+
+## What we were trying to do
+{Plan goal, unit being implemented, and expected approach}
+
+## What we tried first
+{Initial approach — what was done and what went wrong}
+{Be specific: error messages, unexpected behavior, the moment it became clear this wasn't working}
+
+## What worked
+{Final approach with enough detail to reproduce}
+{Include file paths, key code patterns, and configuration that mattered}
+
+## Why this order mattered
+{If execution sequence was critical, explain the dependency chain}
+{Example: "The migration had to run before the seed script because..."}
+
+## Key files
+{List the files that were central to the solution}
+
+## Time cost
+{Optional: how long the detour took, to calibrate future estimates}
+```
+
+## How This Gets Used
+
+- `ce-learnings-researcher.agent.md` searches `docs/solutions/` by frontmatter metadata — the `trajectory: true` field lets it specifically find execution trajectories
+- Future `/ce-work` sessions benefit when `ce-learnings-researcher` surfaces a relevant trajectory before implementation starts
+- If ruflo-agentdb is available, the trajectory summary is also stored as a pattern for semantic search (see `ruflo-memory-integration.md`)
+
+## Orchestrator Responsibility
+
+The orchestrating `/ce-work` session decides whether to capture a trajectory. Subagents do not write trajectories — they report their outcomes (including failures and pivots) to the orchestrator, which has the full picture of the execution path.
diff --git a/plugins/compound-engineering/skills/ce-work/references/verification-discipline.md b/plugins/compound-engineering/skills/ce-work/references/verification-discipline.md
new file mode 100644
index 000000000..73df15561
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-work/references/verification-discipline.md
@@ -0,0 +1,96 @@
+# Verification Discipline
+
+Load this reference during the shipping phase. It enforces evidence-based completion claims, prevents premature success declarations, and provides the revert-and-verify-failure pattern for regression tests.
+
+Adapted from [Superpowers](https://github.com/obra/superpowers) `verification-before-completion` skill.
+
+## The Iron Law
+
+```
+NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
+```
+
+If you haven't run the verification command **in this message**, you cannot claim it passes.
+
+## The Gate Function
+
+Before claiming any status or expressing satisfaction:
+
+1. **IDENTIFY** — What command proves this claim?
+2. **RUN** — Execute the full command (fresh, complete)
+3. **READ** — Full output, check exit code, count failures
+4. **VERIFY** — Does output confirm the claim?
+ - If NO: State actual status with evidence
+ - If YES: State claim WITH evidence
+5. **ONLY THEN** — Make the claim
+
+Skip any step = lying, not verifying.
+
+## Claim-to-Evidence Mapping
+
+| Claim | Requires | NOT Sufficient |
+|-------|----------|----------------|
+| "Tests pass" | Test command output: 0 failures | Previous run, "should pass" |
+| "Linter clean" | Linter output: 0 errors | Partial check, extrapolation |
+| "Build succeeds" | Build command: exit 0 | Linter passing, "logs look good" |
+| "Bug fixed" | Test original symptom: passes | "Code changed, assumed fixed" |
+| "Regression test works" | Red-green cycle verified | Test passes once |
+| "Agent completed" | VCS diff shows changes | Agent reports "success" |
+| "Requirements met" | Line-by-line checklist | "Tests passing" |
+
+## Linguistic Red Flags — STOP
+
+If you catch yourself using any of these phrases, STOP and run verification:
+
+- "should work now"
+- "probably fine"
+- "seems to work"
+- "looks correct"
+- "I'm confident this..."
+- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!")
+- About to commit/push/PR without verification
+- Trusting agent success reports without independent verification
+- **ANY wording implying success without having run verification**
+
+## Rationalization Prevention
+
+| Rationalization | Rebuttal |
+|----------------|----------|
+| "Should work now" | RUN the verification |
+| "I'm confident" | Confidence is not evidence |
+| "Just this once" | No exceptions |
+| "Linter passed" | Linter is not a compiler |
+| "Agent said success" | Verify independently |
+| "I'm tired" | Exhaustion is not an excuse |
+| "Partial check is enough" | Partial proves nothing |
+| "Different words so rule doesn't apply" | Spirit over letter |
+
+## Revert-and-Verify-Failure Pattern
+
+For regression tests (bug-fix TDD), prove the test is not a false positive:
+
+```
+1. Write regression test
+2. Run → MUST PASS (fix is in place)
+3. Revert the fix
+4. Run → MUST FAIL (proves test catches the bug)
+5. Restore the fix
+6. Run → MUST PASS again
+```
+
+If the test passes even with the fix reverted, the test is a false positive — it doesn't actually catch the bug. Rewrite the test.
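
The cycle in miniature (everything here is hypothetical: a `parseAmount` bug where `parseInt` drops the decimal part, fixed by switching to `parseFloat`):

```javascript
// Step 2: the regression test MUST pass against the fixed implementation...
const fixedParse = (s) => parseFloat(s);
// Step 4: ...and MUST fail against the reverted (buggy) implementation.
const buggyParse = (s) => parseInt(s, 10);

const regressionTest = (parseAmount) => parseAmount('1.50') === 1.5;

console.log(regressionTest(fixedParse)); // true:  fix in place, test passes
console.log(regressionTest(buggyParse)); // false: fix reverted, test catches the bug
```

If `regressionTest(buggyParse)` also returned `true`, the test would be exactly the false positive described above and would need rewriting.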
+
+## When To Apply
+
+**ALWAYS before:**
+- Any variation of success or completion claims
+- Any expression of satisfaction about work state
+- Committing, PR creation, task completion
+- Moving to next task
+- Delegating to agents and trusting their reports
+
+**The rule applies to:**
+- Exact phrases
+- Paraphrases and synonyms
+- Implications of success
+- ANY communication suggesting completion or correctness