
Migrate dashboard LogQueryWidget widgets to EMF metrics #115

@scoropeza


Follow-up from PR #88 — long-term observability cleanup; not blocking but worth tracking.

Functional description

ABCA's CloudWatch dashboard has two distinct kinds of widgets:

  • EMF metric widgets (added in PR #88, "feat: Cedar HITL approval gates for agent tool use", Chunk 8b): driven by ApprovalMetricsPublisher, which extracts structured metrics from TaskEventsTable DDB streams. They refresh every ~1 minute, are cheap, and show approval latency by outcome, among other metrics.
  • LogQueryWidget widgets (older): run CloudWatch Logs Insights (CWLI) queries against the TaskEventsTable stream → CloudWatch log group on every dashboard load. They are slower (each query takes seconds), more expensive (CWLI charges per GB scanned), and limited by CWLI sample-window semantics.

The PR #88 widgets prove the EMF pattern works for ABCA's use case. The remaining LogQueryWidget widgets are now technical debt — they're the older pattern, and once a third widget needs a metric the old query produces, the migration becomes worth doing.

User-visible impact (today):

  • Dashboard takes longer to load than necessary.
  • Some metrics are stale by the CWLI sample window (often 5+ minutes).
  • AWS bill has avoidable CWLI scan costs.
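
To put a rough number on the scan cost, the arithmetic can be sketched as below. This is only a back-of-the-envelope model: the field names and every input value are hypothetical placeholders, and the per-GB price has to come from the AWS pricing page for the region in use.

```typescript
// Rough cost model for LogQueryWidgets: each dashboard load re-runs every
// query, and CloudWatch Logs Insights bills by the volume of log data scanned.
// All inputs are placeholders; none come from ABCA's real usage.
interface ScanCostInputs {
  widgets: number;            // LogQueryWidgets on the dashboard
  loadsPerDay: number;        // dashboard loads (including refreshes) per day
  gbScannedPerQuery: number;  // log volume inside each query's time range
  pricePerGbScanned: number;  // taken from the AWS pricing page
}

function monthlyScanCost(input: ScanCostInputs, daysPerMonth = 30): number {
  const queriesPerMonth = input.widgets * input.loadsPerDay * daysPerMonth;
  return queriesPerMonth * input.gbScannedPerQuery * input.pricePerGbScanned;
}

// Hypothetical example: 4 widgets, 50 loads/day, 0.5 GB per query, 0.01 per GB.
const exampleCost = monthlyScanCost({
  widgets: 4,
  loadsPerDay: 50,
  gbScannedPerQuery: 0.5,
  pricePerGbScanned: 0.01,
});
console.log(exampleCost.toFixed(2)); // "30.00"
```

An EMF metric widget has no per-load scan term, which is why the cost only shows up on the LogQueryWidget side.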

User-visible impact (in 6 months if not done):

  • Adding new dashboard widgets gets harder because every new metric needs its own producer (publisher → EMF) AND a parallel LogQueryWidget for backfill / historical comparison.
  • Two patterns to maintain instead of one.

Technical context

Which widgets use which pattern still needs an audit. Likely locations:

  • cdk/src/constructs/task-dashboard.ts (or similar) — defines the dashboard. Search for LogQueryWidget to find the legacy ones.
  • The EMF-side definitions are likely in the same file or an adjacent cedar-hitl-dashboard.ts.

Migration shape per widget:

  1. Identify what the legacy LogQueryWidget measures (e.g. "task throughput by outcome over time").
  2. Add the matching EMF metric to ApprovalMetricsPublisher (or extend an existing metric if it is conceptually similar).
  3. Add the EMF widget to the dashboard.
  4. Remove the legacy LogQueryWidget.
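
For step 2, a minimal sketch of what emitting an EMF record might look like. The `_aws` envelope follows the CloudWatch Embedded Metric Format; the `buildEmfRecord` helper, the `ABCA/Tasks` namespace, and the `TaskCompleted` / `Outcome` names are hypothetical and are not taken from ApprovalMetricsPublisher.

```typescript
// Minimal EMF record builder. Namespace, metric, and dimension names used
// below are hypothetical placeholders, not ABCA's real ones.
type EmfUnit = "Count" | "Milliseconds" | "Seconds";

interface EmfMetric {
  metricName: string;
  unit: EmfUnit;
  value: number;
}

function buildEmfRecord(
  namespace: string,
  dimensions: Record<string, string>,
  metrics: EmfMetric[],
  timestampMs: number = Date.now(),
): Record<string, unknown> {
  const record: Record<string, unknown> = {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [Object.keys(dimensions)], // one dimension set
          Metrics: metrics.map((m) => ({ Name: m.metricName, Unit: m.unit })),
        },
      ],
    },
    ...dimensions, // dimension values live at the top level of the record
  };
  // Metric values also live at the top level, keyed by metric name.
  for (const m of metrics) record[m.metricName] = m.value;
  return record;
}

// One completed task becomes one JSON log line; CloudWatch extracts the
// metric when a Lambda writes that line to its log output.
const emfLine = JSON.stringify(
  buildEmfRecord(
    "ABCA/Tasks",
    { Outcome: "succeeded" },
    [{ metricName: "TaskCompleted", unit: "Count", value: 1 }],
    1700000000000,
  ),
);
console.log(emfLine);
```

With a record like this, "task throughput by outcome over time" becomes a sum of `TaskCompleted` grouped by the `Outcome` dimension, which a metric widget can graph without running a log query.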

Why this is P3:

  • Today's LogQueryWidget widgets work; they are just slower.
  • The migration touches the publisher Lambda — a load-bearing component. Risk of regression.
  • No customer is asking for this; it's hygiene.

When to escalate to P2:

  • If the dashboard sees a third active maintainer (the cost of two patterns shows up in PR review friction).
  • If the AWS bill grows enough that the CWLI scan costs matter.
  • If a new widget needs both EMF (live) and historical backfill — that's the natural moment to migrate.

Proposed approach

Phase 1 — audit + plan (this issue):

  • List every LogQueryWidget in the dashboard config.
  • For each, note what metric it measures and whether the publisher already emits an equivalent EMF metric.
  • Estimate effort per widget (most should be < 1 hour).
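
The first audit step could start from a plain text scan of the dashboard source. The sketch below assumes the widgets are constructed directly with `new LogQueryWidget(...)` or `new <alias>.LogQueryWidget(...)`; widgets built through helper functions would be missed and still need a manual pass. The sample source is invented for illustration.

```typescript
// Finds every line that constructs a LogQueryWidget in a CDK source string.
// This is a text search, not a parser: helper-built widgets are not detected.
interface WidgetHit {
  line: number; // 1-based line number
  text: string; // trimmed source line
}

function findLogQueryWidgets(source: string): WidgetHit[] {
  return source
    .split(/\r?\n/)
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter((hit) => /\bnew\s+(\w+\.)?LogQueryWidget\s*\(/.test(hit.text));
}

// Hypothetical dashboard source, used only to exercise the scanner.
const sampleSource = [
  "const a = new cloudwatch.GraphWidget({ title: 'Approval latency' });",
  "const b = new cloudwatch.LogQueryWidget({ title: 'Task throughput' });",
  "// LogQueryWidget mentioned in a comment is ignored",
  "const c = new LogQueryWidget({ title: 'Errors by type' });",
].join("\n");

const hits = findLogQueryWidgets(sampleSource);
console.log(hits.map((h) => h.line)); // [ 2, 4 ]
```

In practice the source string would be read from the dashboard construct file, and each hit becomes one row in the audit.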

Phase 2 — migrate one widget as proof:

  • Pick the lowest-blast-radius widget.
  • Land the migration as a small PR.
  • Validate: the new widget shows the same data as the old one (run both side by side for a week before deleting the old).
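
The side-by-side validation can be made mechanical by exporting both series and comparing them bucket by bucket. This is a sketch under assumptions: both series are already bucketed on the same period, the 1% tolerance is an arbitrary default, and the `compareSeries` name and sample values are hypothetical.

```typescript
// Compares two series keyed by bucket start time (epoch ms) and reports
// buckets that are missing on one side or differ beyond a relative tolerance.
type Series = Record<number, number>;

interface Mismatch {
  bucket: number;
  legacy: number | undefined;
  emf: number | undefined;
}

function compareSeries(legacy: Series, emf: Series, relTolerance = 0.01): Mismatch[] {
  // Union of bucket keys from both series, sorted by time.
  const seen: Record<string, true> = {};
  for (const k of Object.keys(legacy).concat(Object.keys(emf))) seen[k] = true;
  const buckets = Object.keys(seen).map(Number).sort((a, b) => a - b);

  const mismatches: Mismatch[] = [];
  for (const bucket of buckets) {
    const l: number | undefined = legacy[bucket];
    const e: number | undefined = emf[bucket];
    if (l === undefined || e === undefined) {
      mismatches.push({ bucket, legacy: l, emf: e });
      continue;
    }
    // Floor the scale at 1 so tiny counts do not produce huge ratios.
    const scale = Math.max(Math.abs(l), Math.abs(e), 1);
    if (Math.abs(l - e) / scale > relTolerance) {
      mismatches.push({ bucket, legacy: l, emf: e });
    }
  }
  return mismatches;
}

// Hypothetical one-minute buckets from the old and new widgets.
const legacySeries: Series = { 0: 10, 60000: 12, 120000: 7 };
const emfSeries: Series = { 0: 10, 60000: 15, 180000: 3 };
const mismatches = compareSeries(legacySeries, emfSeries);
console.log(mismatches.map((m) => m.bucket)); // [ 60000, 120000, 180000 ]
```

An empty result across the comparison window is a reasonable signal that the old widget can be deleted; buckets missing on one side usually point at a time-alignment or sampling difference rather than a wrong metric.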

Phase 3 — finish:

  • Migrate remaining widgets in batches.

Recommend doing phase 1 only as part of this issue; phases 2 + 3 become their own issues once the audit reveals scope.

Acceptance criteria

  • An audit document (or comment on this issue) lists every LogQueryWidget in the dashboard, what it measures, and whether an EMF equivalent exists.
  • For each legacy LogQueryWidget without an EMF equivalent, the audit notes whether one is feasible (most should be).
  • Estimated effort per widget is captured so the next person can prioritize
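
One possible shape for the audit output, sketched as a typed record plus a renderer that produces a table suitable for pasting into an issue comment. The field names, status values, and sample rows are hypothetical; the real audit may need more columns.

```typescript
// One row per legacy LogQueryWidget found during the audit.
type EmfStatus = "exists" | "feasible" | "not feasible";

interface AuditEntry {
  widget: string;       // widget title or identifier
  measures: string;     // what the legacy query measures
  emfStatus: EmfStatus; // whether an EMF equivalent exists or is feasible
  effortHours: number;  // estimated migration effort
}

// Renders the audit as a table, cheapest migrations first.
function renderAudit(entries: AuditEntry[]): string {
  const header = [
    "| Widget | Measures | EMF equivalent | Effort (h) |",
    "| --- | --- | --- | --- |",
  ];
  const rows = entries
    .slice() // do not mutate the caller's array
    .sort((a, b) => a.effortHours - b.effortHours)
    .map((e) => `| ${e.widget} | ${e.measures} | ${e.emfStatus} | ${e.effortHours} |`);
  return header.concat(rows).join("\n");
}

// Hypothetical entries, for illustration only.
const auditTable = renderAudit([
  { widget: "Errors by type", measures: "error count by type", emfStatus: "feasible", effortHours: 2 },
  { widget: "Task throughput", measures: "tasks by outcome over time", emfStatus: "exists", effortHours: 0.5 },
]);
console.log(auditTable);
```

Sorting by effort puts the lowest-cost migrations at the top, which helps the next person prioritize.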

Out of scope

  • Actually migrating widgets (those become per-widget follow-up issues / PRs).
  • Adding new metrics that don't have a legacy LogQueryWidget counterpart.
  • Cross-account dashboard syndication.
