[audit-workflows] Daily Audit – April 8, 2026 #25388
Closed
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #25534.
Automated daily audit of agentic workflow runs over the past 24 hours (2026-04-08).
Summary
Workflow Health Trend
Activity was concentrated in a roughly 30-minute window between 20:31 and 21:03 UTC. A burst of 39 cancellations occurred at about 20:31–20:32 UTC across smoke test and feature workflows (all with 1–9s durations), consistent with a single event triggering many workflows concurrently, which were then superseded or cancelled.
Token & Cost Trend
The token/cost chart confirms no billable usage was recorded in the run summaries for this period. This is likely a telemetry gap rather than zero actual usage.
📎 Charts and full artifacts: workflow run #24158618027
❌ Failure Analysis
1. Copilot CLI Startup Failures (2 runs)
Two separate workflows failed because the Copilot CLI process exited immediately with code 1 (< 1s duration, no stdout/stderr output, no retry triggered):
Triggers: issues, schedule
The Daily Workflow Updater also shows a behavioral regression: its last successful run (§24050857331) completed 8 turns with a write posture, whereas this run produced 0 turns with a read-only posture.
Likely cause: Copilot CLI v1.0.21 failing to initialize (a token/credential or binary startup issue in the sandboxed container). The copilot-driver does not retry when hasOutput=false.
2. Git Checkout Failures — Merged PR Branch (2 runs)
Both Design Decision Gate 🏗️ and Test Quality Sentinel failed to check out the PR branch copilot/rename-upload-safe-output-items for PR #25378 (exit code 128). The branch was deleted after the PR was merged, causing the checkout step to fail.
These two failures share a single root cause (one PR event), so they count as one incident.
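A guard against this failure mode could pick the ref to check out from the PR's state instead of always using the head branch. The following is a minimal Python sketch under stated assumptions, not the workflows' actual logic: the function name choose_checkout_ref is hypothetical, and the dictionary fields (state, merged, merge_commit_sha, head.ref) follow the pull request object returned by the GitHub REST API.

```python
from typing import Optional


def choose_checkout_ref(pr: dict) -> Optional[str]:
    """Pick a ref that should still exist, or None to skip the checkout.

    `pr` is assumed to look like a GitHub REST API pull request object.
    """
    if pr.get("state") == "open":
        # Open PR: the head branch should still exist.
        return pr["head"]["ref"]
    if pr.get("merged") and pr.get("merge_commit_sha"):
        # Merged PR: the head branch may already be deleted,
        # so fall back to the merge commit.
        return pr["merge_commit_sha"]
    # Closed without merging: nothing reliable to check out.
    return None
```

Returning None lets the caller skip the run cleanly rather than fail with exit code 128 when the branch no longer exists.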
3. API Rate Limit — AI Moderator (1 run)
§24157224934 failed during pre_activation when attempting to lock issue #25381. The GitHub API returned a rate limit error for the installation token. This occurred at 20:32 UTC, immediately following the burst of 39 cancellations at 20:31 UTC; the mass trigger event likely depleted the installation rate limit.
39 runs were cancelled at about 20:31–20:32 UTC with sub-10s durations. Affected workflows include smoke tests (Smoke Copilot, Smoke Agent variants, Smoke Gemini, Smoke Codex) and feature workflows (Changeset Generator, Code Refiner, Approach Validator, /cloclo). This is consistent with a commit push or PR merge on main triggering a wave of concurrent workflow runs that were then cancelled due to concurrency limits or superseding commits.
Affected cancelled workflows (39 total)
/cloclo: 2 cancelled
Smoke Agent: public/none: 3 cancelled
Smoke Agent: all/merged: 2 cancelled
Smoke Agent: public/approved: 2 cancelled
Smoke Agent: scoped/approved: 2 cancelled
Smoke Agent: all/none: 1 cancelled
Smoke Call Workflow: 2 cancelled
Smoke Claude: 2 cancelled
Smoke Codex: 1 cancelled
Smoke Copilot ARM64: 1 cancelled
Smoke Create Cross-Repo PR: 2 cancelled
Smoke Gemini: 2 cancelled
Smoke Multi PR: 2 cancelled
Smoke Project: 2 cancelled
Smoke Temporary ID: 2 cancelled
Smoke Update Cross-Repo PR: 2 cancelled
Agent Container Smoke Test: 2 cancelled
Approach Validator: 2 cancelled
Changeset Generator: 2 cancelled
Code Refiner: 2 cancelled
Design Decision Gate 🏗️: 1 cancelled
Recommendations
Design Decision Gate and Test Quality Sentinel: gate on PR state before attempting checkout.
pre_activation issue-locking step in AI Moderator to handle transient rate limit spikes from concurrent workflow bursts.
TokenUsage is 0 in all run summaries. May need to ensure OTEL data is written before artifact upload.
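One common way to absorb transient rate limit spikes like the one that hit the issue-locking step is to retry with exponential backoff and jitter. The sketch below is illustrative only: RateLimitError and with_backoff are hypothetical names, and the delays are placeholder values, not settings taken from the AI Moderator workflow.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the original error.
                raise
            # Double the delay each attempt and add jitter so that
            # concurrently triggered runs do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

The jitter matters in this scenario because the rate limit was hit during a burst of concurrent runs; without it, every run would retry at the same moment and likely hit the limit again.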