
ci: add weekly CI health check with Slack notification #3191

Merged
QuantumExplorer merged 11 commits into v3.1-dev from ci/weekly-health-check-slack
Mar 5, 2026


@lklimek (Contributor) commented Mar 5, 2026

Issue being fixed or feature implemented

Nightly CI jobs run on the main branch, but their failures go unnoticed. This PR adds a weekly check every Monday morning that alerts the team when any workflow is red.

User Story

Imagine you are a developer on the platform team. Every Monday at 08:00 UTC you get a Slack message in #platform-team if any CI workflow on v3.1-dev is failing, so you can fix it before it blocks the whole week.

What was done?

Added .github/workflows/weekly-ci-health.yml:

  • Runs every Monday at 8:00 UTC via cron schedule
  • Also supports workflow_dispatch for manual testing
  • Queries the GitHub API for all active workflows and checks the latest run of each on v3.1-dev
  • Collects every workflow whose latest run has conclusion: failure
  • Posts a Slack message via an incoming webhook, with clickable links to the failed runs
  • Stays silent when everything is green, so there is no noise
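The core decision in the list above (collect failed runs, stay silent when green, otherwise post links) can be sketched in a few lines of Python. This is not the workflow's actual code, which runs as shell steps in weekly-ci-health.yml; the function name build_alert and the run dictionaries with name, conclusion and url keys are hypothetical, chosen to mirror the fields the description mentions. Links use Slack's mrkdwn syntax <url|label>.

```python
from typing import Optional


def build_alert(latest_runs: list, branch: str) -> Optional[str]:
    """Return Slack mrkdwn text listing failed workflows, or None when all are green.

    Hypothetical input: one dict per workflow, e.g.
    {"name": "Build", "conclusion": "failure", "url": "https://..."}.
    """
    failed = [run for run in latest_runs if run.get("conclusion") == "failure"]
    if not failed:
        return None  # nothing to report: no Slack message at all
    lines = [f"*CI health check: {len(failed)} workflow(s) failing on {branch}*"]
    for run in failed:
        # Slack mrkdwn link syntax is <url|label>
        lines.append(f"• <{run['url']}|{run['name']}>")
    return "\n".join(lines)
```

Returning None rather than an empty string keeps the "silent when green" rule explicit: the caller simply skips the webhook call.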

Setup required

  1. Create a Slack Incoming Webhook for #platform-team
  2. Add a repository secret named SLACK_CI_WEBHOOK_URL containing the webhook URL
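A Slack incoming webhook accepts an HTTP POST whose JSON body carries a text field. The standard-library Python sketch below shows that request; it assumes the secret is exposed to the script as an environment variable with the same name, which is a guess about how the workflow wires it up. The helper names are hypothetical.

```python
import json
import os
import urllib.request


def make_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Slack message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(text: str) -> None:
    """Send the message. Assumes SLACK_CI_WEBHOOK_URL is set in the environment."""
    request = make_webhook_request(os.environ["SLACK_CI_WEBHOOK_URL"], text)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Slack replies with the body "ok" on success; 4xx/5xx raise HTTPError.
        response.read()
```

Keeping the request construction separate from the network call makes it possible to check the payload without posting anything to the channel.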

How Has This Been Tested?

  • YAML validated with Python yaml.safe_load()
  • Can be tested manually via gh workflow run "Weekly CI Health Check"

Breaking Changes

None

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation if needed

🤖 Co-authored by Claudius the Magnificent AI Agent

Runs every Monday at 8 UTC. Checks last run of all active workflows
on v3.1-dev and posts to #platform-team Slack channel if any are red.
Includes workflow_dispatch trigger for manual testing.

Requires SLACK_CI_WEBHOOK_URL repo secret.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
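The exact cron line is not quoted here, but "every Monday at 8 UTC" corresponds to the standard expression 0 8 * * 1 (minute 0, hour 8, any day of month, any month, weekday 1 = Monday), which GitHub Actions interprets in UTC by default. The hypothetical helper below computes the next matching time, which is handy for knowing when the first alert should arrive. Scheduled runs can also start later than the cron time when GitHub is under heavy load.

```python
from datetime import datetime, timedelta, timezone


def next_monday_0800_utc(now: datetime) -> datetime:
    """Next instant matching cron '0 8 * * 1', strictly after `now` (an aware datetime)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=8, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday == 0, so this shifts forward to the coming Monday
    candidate += timedelta(days=(0 - candidate.weekday()) % 7)
    if candidate <= now:
        # It is Monday but 08:00 has already passed (or is right now): use next week
        candidate += timedelta(days=7)
    return candidate
```

For example, for a merge on Thursday 5 March 2026 the helper returns Monday 9 March 2026, 08:00 UTC.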
@github-actions github-actions bot added this to the v3.1.0 milestone Mar 5, 2026
coderabbitai bot (Contributor) commented Mar 5, 2026

Warning

Rate limit exceeded

@lklimek has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 18 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 7cac120 and 4251744.

📒 Files selected for processing (1)
  • .github/workflows/weekly-ci-health.yml

lklimek and others added 8 commits March 5, 2026 14:08
Replace gh api calls with gh workflow list / gh run list subcommands.
Auto-detect default branch via gh repo view instead of hardcoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
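The gh subcommands named in this commit can be driven from a script roughly as sketched below. This is a Python illustration, not the workflow's bash; the --json field names (id, name, state for workflows; databaseId, conclusion, url for runs; defaultBranchRef for the repository) follow the gh manual for recent releases, so verify them against the gh version on the runner. The gh parameter is a hypothetical seam that lets the logic be exercised without network access.

```python
import json
import subprocess
from typing import Any, Callable

Gh = Callable[..., Any]


def gh_json(*args: str) -> Any:
    """Run a gh subcommand and parse its --json output."""
    result = subprocess.run(["gh", *args], check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def default_branch(gh: Gh = gh_json) -> str:
    """Detect the default branch instead of hardcoding it."""
    return gh("repo", "view", "--json", "defaultBranchRef")["defaultBranchRef"]["name"]


def latest_runs(branch: str, gh: Gh = gh_json) -> list:
    """Return the most recent run on `branch` for every active workflow."""
    runs = []
    for wf in gh("workflow", "list", "--json", "id,name,state"):
        if wf["state"] != "active":
            continue  # skip disabled workflows
        last = gh(
            "run", "list",
            "--workflow", str(wf["id"]),
            "--branch", branch,
            "--limit", "1",
            "--json", "databaseId,conclusion,url",
        )
        if last:  # a workflow that never ran on this branch yields []
            runs.append({**last[0], "name": wf["name"]})
    return runs
```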
Add top-level permissions block: actions:read + contents:read.
This workflow only queries workflow runs — no write access needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gh CLI needs a .git directory to infer the repository.
Uses sparse-checkout to avoid downloading any actual code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Workflow-level conclusion can be 'failure' even when all jobs
succeeded or were skipped. Now drills into individual jobs and
only reports workflows with actual job failures. Slack message
includes failed job names for context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
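The job-level drill-down can be sketched as follows, assuming each candidate run is inspected with gh run view <run-id> --json jobs, which returns an object whose jobs array carries per-job name and conclusion fields. The helper names are hypothetical; only jobs whose own conclusion is failure count, regardless of the run-level conclusion.

```python
def failed_jobs(run_view: dict) -> list:
    """Names of jobs whose own conclusion is 'failure'."""
    return [
        job["name"]
        for job in run_view.get("jobs", [])
        if job.get("conclusion") == "failure"
    ]


def should_report(run_view: dict) -> bool:
    """Report a workflow only when at least one of its jobs actually failed,
    whatever the workflow-level conclusion says."""
    return bool(failed_jobs(run_view))
```

The returned job names can then be appended to the workflow's Slack line for context.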
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use $'\n' for real newlines in bash variables so jq --arg
properly encodes them as \n in the JSON payload. Fixes literal
\n showing in Slack and broken mrkdwn bold markers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
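The distinction this commit fixes is between a real line feed and the two characters backslash and n. In bash, "a\nb" inside double quotes keeps a literal backslash, while $'a\nb' contains a real newline; jq --arg takes either string as-is and JSON-encodes it. The Python sketch below reproduces both cases with json.dumps standing in for jq, so it illustrates the encoding rule rather than the workflow's actual commands.

```python
import json

# A real line feed between header and bullet (what $'\n' produces in bash).
real = "*Failed workflows:*" + "\n" + "• Build"

# A backslash followed by 'n' (what "\n" inside bash double quotes produces).
literal = "*Failed workflows:*" + "\\" + "n" + "• Build"

real_json = json.dumps({"text": real}, ensure_ascii=False)
literal_json = json.dumps({"text": literal}, ensure_ascii=False)

print(real_json)     # {"text": "*Failed workflows:*\n• Build"}   Slack renders two lines
print(literal_json)  # {"text": "*Failed workflows:*\\n• Build"}  Slack shows \n verbatim
```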
@lklimek lklimek marked this pull request as ready for review March 5, 2026 13:53
@lklimek lklimek requested a review from QuantumExplorer as a code owner March 5, 2026 13:53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lklimek lklimek had a problem deploying to test-suite-approval March 5, 2026 13:54 — with GitHub Actions Error
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lklimek lklimek requested a deployment to test-suite-approval March 5, 2026 13:57 — with GitHub Actions Waiting
@QuantumExplorer QuantumExplorer merged commit a7b0661 into v3.1-dev Mar 5, 2026
20 of 22 checks passed
@QuantumExplorer QuantumExplorer deleted the ci/weekly-health-check-slack branch March 5, 2026 14:05

2 participants