Add flake_cluster: cluster tests that flake together (co-failure Jaccard) #412

Merged · JE-Chen merged 1 commit into dev from feat/flake-cluster-batch · Jun 24, 2026
Conversation

@JE-Chen (Member) commented Jun 24, 2026

Why

Flaky tests are rarely independent: a wobbly shared fixture, a slow dependency or a noisy environment makes a group of tests fail in the same runs (research finds ~75% of flaky tests fall into co-failure clusters). Ranking tests one-by-one by flip rate misses that shared root cause. flake_cluster measures how often each pair of tests fails in the same runs — Jaccard similarity over the set of runs each failed in — and groups tests above a threshold, so you chase one root cause instead of N symptoms.

  • cofailure_pairs — test pairs that fail together above a Jaccard threshold
  • failure_clusters — connected clusters of co-failing tests with a cohesion score (mean pairwise Jaccard)

Third item of the test-robustness lane. Input is a list of runs, where each run is the collection of test names that failed in it.
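To make the input shape and the pair score concrete, here is a minimal sketch of co-failure Jaccard over a list of runs. It is not the code merged in this PR: the names `failing_runs`, `set_jaccard` and `cofailure_pairs_sketch`, the tuple return shape, the sort order and the inclusive `>=` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def failing_runs(runs):
    """Map each test name to the set of run indices in which it failed."""
    index = {}
    for run_id, failed in enumerate(runs):
        for test in failed:
            index.setdefault(test, set()).add(run_id)
    return index


def set_jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cofailure_pairs_sketch(runs, threshold=0.5):
    """Return (test_a, test_b, score) for pairs scoring at or above threshold.

    Hypothetical helper: the inclusive comparison and the ordering
    (highest score first, then by name) are assumptions.
    """
    index = failing_runs(runs)
    pairs = []
    for a, b in combinations(sorted(index), 2):
        score = set_jaccard(index[a], index[b])
        if score >= threshold:
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


# Each run is the set of test names that failed in it; an empty set is a green run.
runs = [
    {"test_login", "test_checkout"},
    {"test_login", "test_checkout"},
    {"test_search"},
    {"test_login"},
    set(),
]
pairs = cofailure_pairs_sketch(runs, threshold=0.5)
```

In this data `test_login` fails in three runs and `test_checkout` in two of those same runs, so the pair shares 2 of 3 failing runs and scores 2/3. `test_search` never fails alongside either, so it appears in no pair.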

Design

  • Pure stdlib: per-test failing-run sets → pairwise set-Jaccard → threshold graph → connected components (single-linkage). The existing text_similarity.jaccard computes n-gram string similarity, which is a different measure, so set-Jaccard stays internal. Every function has cyclomatic complexity under 10 (radon-clean).
  • 5 layers wired: core → facade __all__ → AC_failure_clusters / AC_cofailure_pairs → read-only ac_* MCP tools → Script Builder (Testing). Qt-free verified; cohesion is compared with pytest.approx (no float ==).
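The pipeline in the first bullet can be sketched end to end as below. This is an illustrative reading of the description, not the merged implementation: the function name `failure_clusters_sketch`, the dict keys `tests` and `cohesion`, the inclusive threshold, and the choice to give a singleton cluster a cohesion of 1.0 are all assumptions.

```python
from itertools import combinations


def set_jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def failure_clusters_sketch(runs, threshold=0.5, min_size=2):
    """Group tests into connected components of the co-failure graph."""
    # Step 1: per-test failing-run sets.
    index = {}
    for run_id, failed in enumerate(runs):
        for test in failed:
            index.setdefault(test, set()).add(run_id)

    # Step 2: threshold graph, an edge for each pair at or above threshold.
    graph = {test: set() for test in index}
    for a, b in combinations(sorted(index), 2):
        if set_jaccard(index[a], index[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    # Step 3: connected components by depth-first search (single-linkage).
    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        if len(members) < min_size:
            continue
        members.sort()
        # Cohesion: mean Jaccard over all pairs inside the cluster.
        scores = [set_jaccard(index[a], index[b])
                  for a, b in combinations(members, 2)]
        # Assumption: a singleton has no pairs, so report 1.0.
        cohesion = sum(scores) / len(scores) if scores else 1.0
        clusters.append({"tests": members, "cohesion": cohesion})
    return clusters


runs = [{"a", "b", "c"}, {"a", "b", "c"}, {"d"}, {"a", "b"}]
loose = failure_clusters_sketch(runs, threshold=0.5)
strict = failure_clusters_sketch(runs, threshold=1.0)
with_singletons = failure_clusters_sketch(runs, threshold=0.5, min_size=1)
```

With `threshold=0.5` the tests `a`, `b` and `c` form one cluster with cohesion (1 + 2/3 + 2/3) / 3 = 7/9, and `d` is dropped as a singleton. Raising the threshold to 1.0 leaves only the perfectly co-failing pair `a`, `b`. Single-linkage means a test joins a cluster if it is linked to any one member, which is why cohesion is worth reporting alongside membership.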

Tests

test/unit_test/headless/test_flake_cluster_batch.py — clusters group co-failing tests (cohesion 1.0), singleton excluded by min_size, min_size=1 keeps singletons, high threshold keeps only perfect co-failure, cofailure_pairs scores + sort + count, empty / no-co-failure cases, the executor paths + 5-layer wiring. 9 passed.

…ard)

Flaky tests are rarely independent - a wobbly fixture or noisy
dependency makes a group fail in the same runs (~75% of flaky tests
cluster). Ranking tests one-by-one by flip rate misses that shared root
cause. Measure how often each pair fails in the same runs (Jaccard over
their failing-run sets) and group tests above a threshold into connected
clusters with a cohesion score. Pure stdlib over a list of failed-test
sets.
@codacy-production


Up to standards ✅

🟢 Issues 0 issues

🟢 Metrics 43 complexity · 0 duplication


@JE-Chen JE-Chen merged commit 59767cf into dev Jun 24, 2026
16 checks passed
@JE-Chen JE-Chen deleted the feat/flake-cluster-batch branch June 24, 2026 17:55