Follow-up from PR #88 — long-term observability cleanup; not blocking but worth tracking.
Functional description
ABCA's CloudWatch dashboard has two distinct kinds of widgets:
EMF metric widgets (added in PR #88, "feat: Cedar HITL approval gates for agent tool use", Chunk 8b): driven by ApprovalMetricsPublisher extracting structured metrics from TaskEventsTable DDB streams. Refresh every ~1 minute. Cheap. Show approval latency by outcome, etc.
LogQueryWidget widgets (older): run CloudWatch Logs Insights queries against the TaskEventsTable stream → CloudWatch log group on every dashboard load. Slower (each query takes seconds). More expensive (CWLI charges per GB scanned). Limited by CWLI sample-window semantics.
The PR #88 widgets prove the EMF pattern works for ABCA's use case. The remaining LogQueryWidget widgets are now technical debt — they're the older pattern, and once a third widget needs a metric the old query produces, the migration becomes worth doing.
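For readers unfamiliar with the EMF pattern: an Embedded Metric Format record is a structured log line whose `_aws` block tells CloudWatch which top-level fields to extract as metrics. The sketch below shows the general shape only. The namespace `ABCA/Approvals`, the metric name `ApprovalLatencyMs`, the dimension `Outcome`, and the function name are hypothetical and are not taken from PR #88 or from ApprovalMetricsPublisher.

```typescript
// Sketch of an EMF record builder. All metric, dimension and namespace
// names here are hypothetical placeholders, not the ones used in PR #88.
interface EmfRecord {
  _aws: {
    Timestamp: number; // epoch milliseconds
    CloudWatchMetrics: Array<{
      Namespace: string;
      Dimensions: string[][];
      Metrics: Array<{ Name: string; Unit: string }>;
    }>;
  };
  [key: string]: unknown; // metric values and dimension values live at the top level
}

function buildApprovalLatencyRecord(
  outcome: string,
  latencyMs: number,
  timestampMs: number,
): EmfRecord {
  return {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: "ABCA/Approvals",
          Dimensions: [["Outcome"]],
          Metrics: [{ Name: "ApprovalLatencyMs", Unit: "Milliseconds" }],
        },
      ],
    },
    // The keys below must match the dimension and metric names declared above.
    Outcome: outcome,
    ApprovalLatencyMs: latencyMs,
  };
}

// Writing this JSON as a single log line is what makes the metric available.
console.log(JSON.stringify(buildApprovalLatencyRecord("approved", 1250, Date.now())));
```

Because the metric is extracted when the log line is written, a widget built on it reads a precomputed metric rather than scanning logs on every dashboard load.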
User-visible impact (today):
Dashboard takes longer to load than necessary.
Some metrics are stale by the CWLI sample window (often 5+ minutes).
AWS bill has avoidable CWLI scan costs.
User-visible impact (in 6 months if not done):
Adding new dashboard widgets gets harder because every new metric needs its own producer (publisher → EMF) AND a parallel LogQueryWidget for backfill / historical comparison.
Two patterns to maintain instead of one.
Technical context
Which widgets use which pattern: needs an audit. Likely:
cdk/src/constructs/task-dashboard.ts (or similar) — defines the dashboard. Search for LogQueryWidget to find the legacy ones.
The EMF-side definitions are likely in the same file or an adjacent cedar-hitl-dashboard.ts.
Migration shape per widget:
Identify what the legacy LogQueryWidget measures (e.g. "task throughput by outcome over time").
Add the matching EMF metric to ApprovalMetricsPublisher (or extend if conceptually similar).
Add the EMF widget to the dashboard.
Remove the legacy LogQueryWidget.
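As an illustration of the first two steps, here is a rough sketch of the aggregation behind a "task throughput by outcome over time" widget, computed in-process instead of by a Logs Insights query. The `TaskEvent` shape and the function name are assumptions for illustration; the real TaskEventsTable stream record layout needs checking during the audit.

```typescript
// Hypothetical, simplified view of a task event; the real stream record differs.
interface TaskEvent {
  outcome: string;
  timestampMs: number;
}

// Counts events per outcome per time bucket (default: 1 minute).
// Keys have the form "<outcome>@<bucketStartMs>".
function throughputByOutcome(
  events: TaskEvent[],
  bucketMs: number = 60_000,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const bucketStart = Math.floor(event.timestampMs / bucketMs) * bucketMs;
    const key = `${event.outcome}@${bucketStart}`;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

In practice the publisher would probably not pre-aggregate like this: emitting a count of 1 per event with an outcome dimension and letting CloudWatch sum per period should give the same numbers. The function is mainly useful as a reference for what the migrated widget is expected to display.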
Why this is P3:
Today's LogQueryWidgets work, just slower.
The migration touches the publisher Lambda — a load-bearing component. Risk of regression.
No customer is asking for this; it's hygiene.
When to escalate to P2:
If the dashboard sees a third active maintainer (the cost of two patterns shows up in PR review friction).
If the AWS bill grows enough that the CWLI scan costs matter.
If a new widget needs both EMF (live) and historical backfill — that's the natural moment to migrate.
Proposed approach
Phase 1 — audit + plan (this issue):
List every LogQueryWidget in the dashboard config.
For each, note what metric it measures and whether the publisher already emits an equivalent EMF metric.
Estimate effort per widget (most should be < 1 hour).
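The listing step could be partly automated. The sketch below scans dashboard source text for `new LogQueryWidget(` instantiations (with or without a module prefix such as `cloudwatch.`) and reports line numbers. The function name is hypothetical, and reading the actual file is left out because the dashboard's location still needs confirming.

```typescript
interface WidgetHit {
  line: number; // 1-based line number in the scanned source
  text: string; // trimmed source line
}

// Finds instantiations like `new LogQueryWidget(` or `new cloudwatch.LogQueryWidget(`.
// Import statements and type references are deliberately not matched.
function findLogQueryWidgets(source: string): WidgetHit[] {
  const pattern = /\bnew\s+(?:\w+\.)*LogQueryWidget\s*\(/;
  const hits: WidgetHit[] = [];
  source.split("\n").forEach((text, index) => {
    if (pattern.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

A plain text search for LogQueryWidget works just as well for a one-off audit; a script is only worth it if the audit gets repeated.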
Phase 2 — migrate one widget as proof:
Pick the lowest-blast-radius widget.
Land the migration as a small PR.
Validate: the new widget shows the same data as the old one (run both side by side for a week before deleting the old).
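The side-by-side validation can be made concrete with a small comparison helper. This is a sketch under the assumption that both widgets' data can be exported as per-bucket values keyed by the same bucket label; the `Series` type and function name are hypothetical.

```typescript
// Bucket label (e.g. "2024-01-01T10:00") -> value for that bucket.
type Series = Record<string, number>;

// Returns the sorted bucket labels where the two series differ by more than
// `tolerance`. A bucket missing from one side is treated as 0.
function mismatchedBuckets(
  legacy: Series,
  emf: Series,
  tolerance: number = 0,
): string[] {
  const seen: Record<string, true> = {};
  for (const key of Object.keys(legacy).concat(Object.keys(emf))) {
    seen[key] = true;
  }
  return Object.keys(seen)
    .filter((key) => Math.abs((legacy[key] ?? 0) - (emf[key] ?? 0)) > tolerance)
    .sort();
}
```

A small non-zero tolerance may be needed, since events near a bucket boundary can land in different buckets depending on which timestamp each pattern uses.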
Phase 3 — finish:
Migrate remaining widgets in batches.
Recommend doing phase 1 only as part of this issue; phases 2 + 3 become their own issues once the audit reveals scope.
Acceptance criteria
An audit document (or comment on this issue) lists every LogQueryWidget in the dashboard, what it measures, and whether an EMF equivalent exists
For each legacy LogQueryWidget without an EMF equivalent, the audit notes whether one is feasible (most should be)
Estimated effort per widget is captured so the next person can prioritize
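One possible shape for the audit output, sketched as a typed record plus a renderer that produces a markdown table suitable for pasting into a comment on this issue. Field names are hypothetical; rows are sorted by estimated effort so the cheapest migrations appear first.

```typescript
interface AuditEntry {
  widgetTitle: string;
  measures: string;
  emfEquivalentExists: boolean;
  emfFeasible: boolean;
  estimatedHours: number;
}

// Renders audit entries as a markdown table, cheapest migration first.
function renderAuditTable(entries: AuditEntry[]): string {
  const header = [
    "| Widget | Measures | EMF equivalent | EMF feasible | Est. hours |",
    "| --- | --- | --- | --- | --- |",
  ];
  const rows = entries
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.estimatedHours - b.estimatedHours)
    .map(
      (e) =>
        `| ${e.widgetTitle} | ${e.measures} | ${e.emfEquivalentExists ? "yes" : "no"} | ` +
        `${e.emfFeasible ? "yes" : "no"} | ${e.estimatedHours} |`,
    );
  return header.concat(rows).join("\n");
}
```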
Out of scope
Actually migrating widgets (those become per-widget follow-up issues / PRs).
Adding new metrics that don't have a legacy LogQueryWidget counterpart.
Cross-account dashboard syndication.
References
cdk/src/constructs/task-dashboard.ts (or wherever the dashboard lives — needs locating)
cdk/src/constructs/cedar-hitl-dashboard.ts (PR #88 EMF dashboard, the migration target shape)
docs/design/CEDAR_HITL_GATES.md §10 (dashboard design)