Skip to content

Phase 3 W4: applied-demo runner (cloudflare/dmv)#33

Merged
djscruggs merged 1 commit into
phase-3-w3-sd-uifrom
phase-3-w4-demo-runner
Jun 7, 2026
Merged

Phase 3 W4: applied-demo runner (cloudflare/dmv)#33
djscruggs merged 1 commit into
phase-3-w3-sd-uifrom
phase-3-w4-demo-runner

Conversation

@djscruggs

Copy link
Copy Markdown
Collaborator

Stacks on #32 (W3). Base is phase-3-w3-sd-ui; merge #32 first.

What (W4 of the web-demo plan)

Run the applied demos in the browser.

Server (2 new endpoints):

  • GET /api/demos — list the applied demos (cloudflare, dmv).
  • POST /api/run-demo/:name — run the genuine runAgent loop over a real scenario and the fail-closed tools, returning the tool-call trace, the authoritative tool decisions (captured server-side via onToolResult), and the agent's final text.

The model is a deterministic mock that issues the demo's tool sequence — offline, no key, safe for a public site. A live model stays a local opt-in (mock-only public, per the plan); the eval suites already cover misbehaving-model cases.

Client: an Applied demos tab showing the ordered tool calls beside the tool decisions, making the core property visible — the model orchestrates, the tools decide; the verdict is never the model's prose.

Tests

  • demoRunner.test.js — 5 integration cases via app.inject: lists demos, runs DMV (granted via register_vehicle, real confirmation number), runs Cloudflare (verify → stage → approve → cutover authorized), 404 for unknown demo, and a key-material canary over the whole trace.
  • Full gate green: npm run typecheck + npm run lint clean; web suite 28 passing; client builds clean. Live-smoke-tested: DMV runs, granted, no key leak.

W5 (polish + deploy doc + README "Try it") is the last slice.

🤖 Generated with Claude Code

W4 of the web demo. Add /api/demos and /api/run-demo/:name, which run the
real demo-agent loop over a real scenario and the genuine fail-closed tools
(Cloudflare migration, CA DMV registration), returning the tool-call trace
and the authoritative tool decisions. The model is a deterministic mock by
default — offline, no key, safe for a public site; a live model stays a
local opt-in, and the eval suites cover the misbehaving-model cases.

The client gains an Applied demos tab showing the ordered tool calls next to
the tool decisions, making the security property visible: the model
orchestrates, but the tools decide — the verdict is never the model's prose.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@djscruggs djscruggs merged commit ab88ed3 into phase-3-w3-sd-ui Jun 7, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant