Merged
41 commits
dcdeafb
docs(cedar-hitl): restore and revise HITL gates design, fold adversar…
scoropeza May 7, 2026
1e2dd59
feat(cedar-hitl): pin Cedar engines and seed cross-engine parity cont…
scoropeza May 7, 2026
6689ee0
feat(cedar-hitl): three-outcome PolicyEngine core
scoropeza May 7, 2026
4c51a16
feat(cedar-hitl): approval milestone writers + engine counters
scoropeza May 7, 2026
3cd4b3b
feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primit…
scoropeza May 7, 2026
7cb78b8
feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path
scoropeza May 7, 2026
c149993
feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum
scoropeza May 7, 2026
81f26a4
feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent …
scoropeza May 7, 2026
10a1ae0
feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2)
scoropeza May 7, 2026
e20a381
feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers
scoropeza May 7, 2026
f980b57
feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + a…
scoropeza May 7, 2026
571a88a
feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies
scoropeza May 7, 2026
274deff
feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache obs…
scoropeza May 7, 2026
04a3b8c
feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint
scoropeza May 7, 2026
e85a450
feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap +…
scoropeza May 8, 2026
3c889d6
feat(cedar-hitl): Chunk 8a — extend approval outcome event schema
scoropeza May 8, 2026
42b9cfd
feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudW…
scoropeza May 8, 2026
364d7b3
docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b
scoropeza May 8, 2026
bea8582
feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten …
scoropeza May 8, 2026
cb3a8a5
fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole …
scoropeza May 8, 2026
c95ca06
fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding …
scoropeza May 11, 2026
c03b87d
fix(cedar-hitl): E2E approval-flow regressions + CLI parity for --app…
scoropeza May 12, 2026
0c964e7
feat(cedar-hitl): close seven follow-ups from the 2026-05-11/12 E2E pass
scoropeza May 12, 2026
98e893b
chore(cedar-hitl): merge upstream/main — Slack + Linear integrations
scoropeza May 13, 2026
7b7b889
docs(cedar-hitl): add user-facing HITL approval documentation
scoropeza May 13, 2026
df2219f
chore(cedar-hitl): merge upstream/main — fanout/SlackNotify refactor …
scoropeza May 14, 2026
20aa326
fix(cdk): suppress AwsSolutions-COG8 on UserPool
scoropeza May 15, 2026
29369d6
fix(cedar-hitl): close 3 critical PR-review findings
scoropeza May 15, 2026
fc5a687
fix(cedar-hitl): hook + state-machine bugs from PR review
scoropeza May 15, 2026
315dfe3
fix(cedar-hitl): observability + logging gaps from PR review
scoropeza May 15, 2026
7217672
refactor(cedar-hitl): dedup formatMinuteBucket + tighten GetPendingFn…
scoropeza May 15, 2026
8b950f8
refactor(cedar-hitl): type design — discriminated union + Severity al…
scoropeza May 15, 2026
e5ac531
chore(ci): add CDK ↔ CLI type sync drift check (S8)
scoropeza May 15, 2026
4c57a2a
test(cedar-hitl): cover DDB transactions, concurrent decisions, poll …
scoropeza May 15, 2026
a8b33ed
chore: post-review style + dependency fixups
scoropeza May 15, 2026
03e9b39
refactor(s9): hoist APPROVAL_GATE_CAP bounds to contracts/constants.json
scoropeza May 16, 2026
0a21652
Merge upstream/main into feature/cedar-hitl-engine
scoropeza May 16, 2026
80b02a5
Merge branch 'main' into feature/cedar-hitl-engine
krokoko May 17, 2026
805c472
Merge branch 'main' into feature/cedar-hitl-engine
krokoko May 18, 2026
297ff31
Merge branch 'main' into feature/cedar-hitl-engine
krokoko May 18, 2026
6a8ec95
fix(cedar-hitl): restore cedarpy 4.8.0 parity pin + add engine-bump g…
May 18, 2026
58 changes: 58 additions & 0 deletions .dockerignore
@@ -0,0 +1,58 @@
# Build context is repo root (see cdk/src/stacks/agent.ts) so the
# Dockerfile can COPY contracts/ alongside agent/. Exclusions below
# keep the context lean — without them the entire monorepo (CDK
# cdk.out/, node_modules/, docs/dist/, etc.) gets uploaded on every
# AgentCore deploy.

# CDK output (recursive include if not excluded)
cdk/cdk.out/
cdk/lib/
cdk/node_modules/

# CLI and docs build artifacts
cli/lib/
cli/node_modules/
docs/dist/
docs/node_modules/
docs/.astro/

# Shared node_modules
node_modules/

# Agent venv and cache (rebuilt inside image via uv)
agent/.venv/
agent/__pycache__/
agent/**/__pycache__/
agent/**/*.pyc

# Git and tooling
.git/
.prek/
.claude/
**/.DS_Store

# Docs and assets not needed in image
*.md
*.png
*.drawio
*.html
*.gif
*.tape

# Worktrees + scratch
abca-worktrees/
.next-session-prompt.md
.e2e-test-plan.md

# Test/coverage output
coverage/
**/coverage/
.pytest_cache/
**/.pytest_cache/

# IDE / OS
.idea/
.vscode/
yarn-error.log
yarn-debug.log
npm-debug.log*
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
@@ -71,6 +71,22 @@ repos:
exclude: ^docs/node_modules/
stages: [pre-commit]

- id: types-sync-cdk-cli
name: type sync drift (CDK ↔ CLI)
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run check:types-sync'
language: system
pass_filenames: false
files: ^(cdk/src/handlers/shared/types\.ts$|cli/src/types\.ts$|scripts/check-types-sync\.ts$)
stages: [pre-commit]

- id: constants-sync
name: cross-language constants drift (contracts/constants.json)
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run check:constants-sync'
language: system
pass_filenames: false
files: ^(contracts/constants\.json$|agent/src/policy\.py$|cdk/src/handlers/shared/types\.ts$|cdk/src/constructs/blueprint\.ts$|scripts/check-constants-sync\.ts$)
stages: [pre-commit]

- id: monorepo-security-pre-push
name: security scans (pre-push)
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run hooks:pre-push:security'
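The `files:` keys in the two new hooks are Python regular expressions, which pre-commit applies to each staged path with search semantics. As a minimal sketch, the snippet below checks which paths would trigger the `types-sync-cdk-cli` hook. The helper name `triggers_hook` and the sample paths are illustrative assumptions, not part of the repository.

```python
import re

# Pattern copied from the types-sync-cdk-cli hook's `files:` key.
TYPES_SYNC_FILES = re.compile(
    r"^(cdk/src/handlers/shared/types\.ts$|cli/src/types\.ts$|scripts/check-types-sync\.ts$)"
)


def triggers_hook(path: str, pattern: re.Pattern[str] = TYPES_SYNC_FILES) -> bool:
    """Return True if a repo-relative path matches the hook's `files:` regex.

    Hypothetical helper for illustration. Because matching uses search
    semantics, the leading ``^`` and the ``$`` on each alternative are
    what pin every alternative to one exact path.
    """
    return pattern.search(path) is not None


if __name__ == "__main__":
    for candidate in ("cli/src/types.ts", "cli/src/types.tsx", "docs/cli/src/types.ts"):
        print(candidate, triggers_hook(candidate))
```

Because each alternative is anchored at both ends, only the three listed files re-run the drift check; a similarly named file elsewhere in the tree does not.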
1 change: 1 addition & 0 deletions AGENTS.md
@@ -47,6 +47,7 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
- **`prek install`** fails if Git **`core.hooksPath`** is set — another hook manager owns hooks; see [CONTRIBUTING.md](./CONTRIBUTING.md).
- **Editing on `main` directly** — ALWAYS create a worktree with a feature branch for changes, even trivial ones. Main should stay clean; all work flows through worktree → branch → PR → merge.
- **Git worktrees** — Always **`git fetch origin main`** before creating a new worktree to ensure you branch from the latest remote state. `node_modules/` and `agent/.venv/` are per-tree (not shared). Run **`mise run install`** in each new worktree before building. All CDK path references (`__dirname`-relative) and mise `config_roots` resolve correctly without extra setup.
- **Bumping Cedar engines in isolation** — `cedarpy` (Python, `agent/pyproject.toml`) and `@cedar-policy/cedar-wasm` (TypeScript, `cdk/package.json`) are two language bindings over the same Cedar Rust core. They MUST move together; even patch-version drift between bindings can yield divergent `(decision, matching_rule_ids)` on the same `(policy, input)` — invisible to per-side unit tests, caught (only) by `contracts/cedar-parity/` golden fixtures in CI. If you bump one engine you MUST bump the other to a tested-compatible version AND refresh the parity fixtures in the same commit. Both pins are EXACT (no `^`/`~`). See `docs/design/CEDAR_HITL_GATES.md` §15.6 (decision #23) and the parity-contract banner in `mise.toml`. **DO NOT** accept upstream's "Update branch" or auto-merge suggestions on cedarpy without verifying parity with cedar-wasm.

### Tech stack

6 changes: 0 additions & 6 deletions agent/.dockerignore

This file was deleted.

32 changes: 25 additions & 7 deletions agent/Dockerfile
@@ -20,14 +20,19 @@ COPY --from=gh-builder /out/gh /usr/local/bin/gh
# - build-essential (native compilation for some repos)
# - curl (downloads)
RUN apt-get update && \
# Patch any base-image CVEs that have a fix available in the
# current Debian point release. Without this, transitive system-
# library CVEs (e.g. libnghttp2 CVE-2026-27135) ride the base
# ``python:3.13-slim`` tag until upstream rebuilds, which can be
# weeks. ``--no-install-recommends`` keeps the upgrade narrow and
# reproducible — only already-installed packages get bumped.
apt-get upgrade -y --no-install-recommends && \
apt-get install -y --no-install-recommends \
curl \
git \
build-essential \
ca-certificates \
gnupg && \
- # Upgrade base image's CVE-2026-27135 vulnerability
- apt-get upgrade -y --no-install-recommends libnghttp2-14 && \
# Cleanup early to keep peak disk usage low during builds.
apt-get clean && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
@@ -49,15 +54,28 @@ RUN npm install -g npm@latest && \
# Install uv (fast Python package manager) — pinned for reproducibility
COPY --from=ghcr.io/astral-sh/uv:0.11.14 /uv /usr/local/bin/uv

- # Install Python dependencies via uv
- COPY pyproject.toml uv.lock /app/
# Install Python dependencies via uv. Build context is repo root (set in
# ``cdk/src/stacks/agent.ts``) so source paths are prefixed with ``agent/``.
COPY agent/pyproject.toml agent/uv.lock /app/
RUN uv sync --frozen --no-dev --directory /app

# Copy agent code (ARG busts cache so file edits are always picked up)
ARG CACHE_BUST=0
- COPY src/ /app/src/
- COPY prepare-commit-msg.sh /app/
- COPY test_sdk_smoke.py test_subprocess_threading.py /app/
COPY agent/src/ /app/src/
# Cedar HITL built-in policy files (hard_deny.cedar + soft_deny.cedar).
# ``agent/src/policy.py::_POLICIES_DIR`` resolves to ``/app/policies``
# at import time; without these files the PolicyEngine init raises
# ``missing built-in hard-deny policies`` and every task fails at 0
# turns before the agent even connects to the CLI. Discovered during
# Chunk 10 E2E T2.2 — the Dockerfile previously only copied ``src/``.
COPY agent/policies/ /app/policies/
# Cross-language constants (S9). ``agent/src/policy.py`` reads
# ``/app/contracts/constants.json`` at import; the same file is consumed
# by ``cdk/src/handlers/shared/types.ts`` at synth time. See
# ``contracts/README.md`` for the contract.
COPY contracts/ /app/contracts/
COPY agent/prepare-commit-msg.sh /app/
COPY agent/test_sdk_smoke.py agent/test_subprocess_threading.py /app/

# Create non-root user (Claude Code CLI refuses bypassPermissions as root)
RUN useradd -m -s /bin/bash agent && \
6 changes: 5 additions & 1 deletion agent/mise.toml
@@ -52,8 +52,12 @@ run = "uvx bandit[toml] -c pyproject.toml -r . --severity-level=high"

[tasks."security:image"]
description = "Scan container image with trivy"
# Build context is repo root (..) so Dockerfile can COPY contracts/
# alongside agent/ — matches cdk/src/stacks/agent.ts. Without -f and
# the .. context, the build fails because COPY agent/... can't find
# agent/ inside the agent/ directory.
run = [
- "docker image inspect bgagent-local:latest >/dev/null 2>&1 || (ARCH=\"$(uname -m)\"; PLATFORM=\"linux/arm64\"; if [ \"$ARCH\" = \"x86_64\" ]; then PLATFORM=\"linux/amd64\"; fi; docker build --build-arg TARGETPLATFORM=\"$PLATFORM\" --build-arg CACHE_BUST=\"$(date +%s)\" -t bgagent-local:latest .)",
"docker image inspect bgagent-local:latest >/dev/null 2>&1 || (ARCH=\"$(uname -m)\"; PLATFORM=\"linux/arm64\"; if [ \"$ARCH\" = \"x86_64\" ]; then PLATFORM=\"linux/amd64\"; fi; docker build --build-arg TARGETPLATFORM=\"$PLATFORM\" --build-arg CACHE_BUST=\"$(date +%s)\" -f Dockerfile -t bgagent-local:latest ..)",
"trivy image --scanners vuln --ignore-unfixed --ignorefile .trivyignore --severity HIGH,CRITICAL --exit-code 1 bgagent-local:latest",
]

59 changes: 59 additions & 0 deletions agent/policies/hard_deny.cedar
@@ -0,0 +1,59 @@
// Built-in hard-deny policy set for Cedar HITL engine.
//
// Hard-deny is ABSOLUTE: no --pre-approve scope and no blueprint `disable:`
// directive can bypass these rules. See docs/design/CEDAR_HITL_GATES.md
// §12.5 and decision #8.
//
// Every rule in this file MUST carry @tier("hard") + @rule_id annotations.
// Adding a rule here expands the set of categorically-forbidden agent
// actions; removing a rule requires a security review.

// Base catch-all permit. Specific forbid rules below override.
@rule_id("base_permit")
permit (principal, action, resource);

// pr_review tasks may never invoke Write. Absolute; cannot be overridden
// by per-blueprint customization or --pre-approve.
@tier("hard")
@rule_id("pr_review_forbid_write")
forbid (
principal == Agent::TaskAgent::"pr_review",
action == Agent::Action::"invoke_tool",
resource == Agent::Tool::"Write"
);

// pr_review tasks may never invoke Edit.
@tier("hard")
@rule_id("pr_review_forbid_edit")
forbid (
principal == Agent::TaskAgent::"pr_review",
action == Agent::Action::"invoke_tool",
resource == Agent::Tool::"Edit"
);

// Reject `rm -rf /` and similar absolute-root destructive commands.
@tier("hard")
@rule_id("rm_slash")
forbid (principal, action == Agent::Action::"execute_bash", resource)
when { context.command like "*rm -rf /*" };

// Reject writes into `.git/` at the repo root (breaks local git state).
@tier("hard")
@rule_id("write_git_internals")
forbid (principal, action == Agent::Action::"write_file", resource)
when { context.file_path like ".git/*" };

// Reject writes into nested `.git/` directories (submodules, worktrees).
@tier("hard")
@rule_id("write_git_internals_nested")
forbid (principal, action == Agent::Action::"write_file", resource)
when { context.file_path like "*/.git/*" };

// Reject any SQL DROP TABLE through Bash — agents should not be running
// destructive DDL against production or dev databases without a human
// in the loop. Hard-deny because even "just testing locally" is a common
// vector for data loss (wrong DB connected via saved credentials).
@tier("hard")
@rule_id("drop_table")
forbid (principal, action == Agent::Action::"execute_bash", resource)
when { context.command like "*DROP TABLE*" };
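
Every `when` clause in this file relies on Cedar's `like` operator, where `*` matches any run of characters and the whole string must match. As a rough way to reason about what the patterns cover, here is a sketch that approximates `like` in plain Python. The function `cedar_like` is a hypothetical stand-in, not the Cedar engine, and it ignores Cedar's `\*` escape for a literal asterisk.

```python
import re


def cedar_like(value: str, pattern: str) -> bool:
    """Approximate Cedar's `like` operator for patterns without escapes.

    `*` matches any run of characters (including none); every other
    character is literal, and the whole string must match. Matching is
    case-sensitive. Cedar's `\\*` escape is not handled in this sketch.
    """
    # Split on the wildcard, escape the literal pieces, and rejoin with ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    # Patterns copied from the rules above; the sample inputs are illustrative.
    print(cedar_like(".git/config", ".git/*"))
    print(cedar_like("src/.git/config", ".git/*"))
    print(cedar_like("src/.git/config", "*/.git/*"))
```

Under this approximation, `.git/*` matches only paths that start at the repo root, while `*/.git/*` matches only paths with a parent directory. That is consistent with the file carrying both `write_git_internals` and `write_git_internals_nested`.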
84 changes: 84 additions & 0 deletions agent/policies/soft_deny.cedar
@@ -0,0 +1,84 @@
// Base catch-all permit. Without it, cedarpy's default-deny would turn
// every non-matching Cedar evaluation on this tier into a DENY decision,
// making the soft tier indistinguishable from hard-deny. With it, Cedar
// returns ALLOW (no matching forbid) and our engine's STEP 3 sees only
// the genuine forbid hits as REQUIRE_APPROVAL.
@rule_id("base_permit")
permit (principal, action, resource);

// Built-in soft-deny policy set for Cedar HITL engine.
//
// Soft-deny is the HUMAN-IN-THE-LOOP surface: matching rules pause the
// tool call, write an approval request to DynamoDB, and await a human
// response via `bgagent approve` / `bgagent deny`. See
// docs/design/CEDAR_HITL_GATES.md §§2, 6, 15.4.
//
// Every rule in this file MUST carry:
// @tier("soft")
// @rule_id("...") — stable ID for --pre-approve rule:X
// @approval_timeout_s — integer seconds >= 30 (<120 emits WARN per IMPL-25)
// @severity — "low" | "medium" | "high"
// @category — optional free-form UX grouping
//
// Blueprints may OPT OUT of specific rules here via
// `security.cedarPolicies.disable: [rule_id]`. They may NOT disable any
// rule in hard_deny.cedar (blueprint loader rejects those at task start).

// Gate any git --force / -f push. 300s default approval window, medium severity.
// Covers both long-form (--force) and short-form (-f) variants, including
// the bare `git push -f` invocation with no branch argument.
@tier("soft")
@rule_id("force_push_any")
@approval_timeout_s("300")
@severity("medium")
@category("destructive")
forbid (principal, action == Agent::Action::"execute_bash", resource)
when { context.command like "*git push --force*"
|| context.command like "*git push -f *"
|| context.command like "*git push -f" };

// Force-push to main/prod specifically — longer window, higher severity.
// Multi-match with force_push_any is expected: the engine's annotation
// merging picks min(300, 600)=300s and max(medium, high)=high.
@tier("soft")
@rule_id("force_push_main")
@approval_timeout_s("600")
@severity("high")
@category("destructive")
forbid (principal, action == Agent::Action::"execute_bash", resource)
when { context.command like "*git push --force origin main*"
|| context.command like "*git push --force origin prod*"
|| context.command like "*git push -f origin main*"
|| context.command like "*git push -f origin prod*" };

// Non-force pushes to protected branches — catches the case where an
// agent bypasses PR workflow by pushing directly.
@tier("soft")
@rule_id("push_to_protected_branch")
@approval_timeout_s("300")
@severity("medium")
@category("destructive")
forbid (principal, action == Agent::Action::"execute_bash", resource)
when { context.command like "*git push origin main*"
|| context.command like "*git push origin master*"
|| context.command like "*git push origin prod*"
|| context.command like "*git push origin release/*" };

// Gate writes to `.env` files, which typically contain secrets. 600s window, high severity.
@tier("soft")
@rule_id("write_env_files")
@approval_timeout_s("600")
@severity("high")
@category("filesystem")
forbid (principal, action == Agent::Action::"write_file", resource)
when { context.file_path like "*.env" };

// Writes to any path containing "credentials" — SSH keys, AWS creds,
// service-account JSON, etc. 300s window, high severity.
@tier("soft")
@rule_id("write_credentials")
@approval_timeout_s("300")
@severity("high")
@category("auth")
forbid (principal, action == Agent::Action::"write_file", resource)
when { context.file_path like "*credentials*" };
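
The `force_push_main` comment above says that when several soft-deny rules match, the engine takes the shortest timeout and the highest severity. A minimal sketch of that merge follows. `RuleAnnotations` and `merge_annotations` are hypothetical names, and the real engine in `agent/src/policy.py` may structure this differently.

```python
from dataclasses import dataclass

# Ordering of the three severity levels named in the file header.
_SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class RuleAnnotations:
    """Annotations carried by one matching soft-deny rule (illustrative type)."""

    rule_id: str
    approval_timeout_s: int
    severity: str


def merge_annotations(matches: list[RuleAnnotations]) -> tuple[int, str, list[str]]:
    """Merge multi-match annotations: min timeout, max severity, all rule IDs."""
    if not matches:
        raise ValueError("merge_annotations needs at least one matching rule")
    timeout = min(m.approval_timeout_s for m in matches)
    severity = max((m.severity for m in matches), key=_SEVERITY_ORDER.__getitem__)
    return timeout, severity, sorted(m.rule_id for m in matches)


if __name__ == "__main__":
    both = [
        RuleAnnotations("force_push_any", 300, "medium"),
        RuleAnnotations("force_push_main", 600, "high"),
    ]
    print(merge_annotations(both))
```

Taking the minimum timeout and the maximum severity means the strictest matching rule sets both the approval deadline and the urgency shown to the approver.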
13 changes: 12 additions & 1 deletion agent/pyproject.toml
@@ -11,7 +11,18 @@ dependencies = [
"uvicorn==0.47.0", #https://pypi.org/project/uvicorn/
"aws-opentelemetry-distro==0.17.0", #https://pypi.org/project/aws-opentelemetry-distro/
"mcp==1.27.1", #https://pypi.org/project/mcp/
- "cedarpy==4.8.3", #https://github.com/k9securityio/cedar-py
# CEDAR ENGINE PARITY — DO NOT BUMP IN ISOLATION.
# cedarpy (Python, agent runtime) and @cedar-policy/cedar-wasm (TypeScript,
# CDK Lambdas) are two language bindings over the same Cedar Rust core.
# Even patch-version drift between the bindings can produce divergent
# (decision, matching_rule_ids) on the same (policy, input) — a class
# of bug invisible to per-side unit tests. The contracts/cedar-parity/
# golden fixtures are how CI catches divergence; if you bump cedarpy
# you MUST bump @cedar-policy/cedar-wasm to a tested-compatible version
# in cdk/package.json AND refresh the parity fixtures, in the same
# commit. See docs/design/CEDAR_HITL_GATES.md §15.6 (decision #23) and
# the parity-contract banner in mise.toml.
"cedarpy==4.8.0", #https://github.com/k9securityio/cedar-py — EXACT pin (no ^/~), parity with @cedar-policy/cedar-wasm@4.10.0
]

[tool.bandit]
8 changes: 8 additions & 0 deletions agent/src/config.py
@@ -92,6 +92,10 @@ def build_config(
channel_metadata: dict[str, str] | None = None,
trace: bool = False,
user_id: str = "",
approval_timeout_s: int | None = None,
initial_approvals: list[str] | None = None,
initial_approval_gate_count: int = 0,
approval_gate_cap: int | None = None,
) -> TaskConfig:
"""Build and validate configuration from explicit parameters.

@@ -146,6 +150,10 @@ def build_config(
channel_metadata=channel_metadata or {},
trace=trace,
user_id=user_id,
approval_timeout_s=approval_timeout_s,
initial_approvals=initial_approvals or [],
initial_approval_gate_count=initial_approval_gate_count,
approval_gate_cap=approval_gate_cap,
)

