Add flake_cluster: cluster tests that flake together (co-failure Jaccard) by JE-Chen · Pull Request #412 · Integration-Automation/AutoControlGUI

JE-Chen · 2026-06-24T17:50:09Z

Why

Flaky tests are rarely independent: a wobbly shared fixture, a slow dependency or a noisy environment makes a group of tests fail in the same runs (research finds ~75% of flaky tests fall into co-failure clusters). Ranking tests one-by-one by flip rate misses that shared root cause. flake_cluster measures how often each pair of tests fails in the same runs — Jaccard similarity over the set of runs each failed in — and groups tests above a threshold, so you chase one root cause instead of N symptoms.

cofailure_pairs — test pairs that fail together above a Jaccard threshold
failure_clusters — connected clusters of co-failing tests with a cohesion score (mean pairwise Jaccard)

Third item of the test-robustness lane. Input is a list of runs, each given as the collection of test names that failed in that run.
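A minimal sketch of the pairwise measure described above, not the code from this PR. The names `failing_runs_by_test`, `set_jaccard` and `sketch_cofailure_pairs`, the default threshold of 0.5, the inclusive `>=` comparison and the tuple return shape are all assumptions made for illustration; the real `cofailure_pairs` signature is not shown in this description.

```python
from itertools import combinations


def failing_runs_by_test(runs):
    """Map each test name to the set of run indices in which it failed."""
    failing = {}
    for run_index, failed_tests in enumerate(runs):
        for test_name in failed_tests:
            failing.setdefault(test_name, set()).add(run_index)
    return failing


def set_jaccard(a, b):
    """|a & b| / |a | b|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sketch_cofailure_pairs(runs, threshold=0.5):
    """Return (test_a, test_b, score) tuples at or above the threshold,
    highest score first (hypothetical return shape)."""
    failing = failing_runs_by_test(runs)
    pairs = []
    for a, b in combinations(sorted(failing), 2):
        score = set_jaccard(failing[a], failing[b])
        if score >= threshold:  # assumption: threshold is inclusive
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


runs = [
    {"test_login", "test_logout"},  # run 0
    {"test_login", "test_logout"},  # run 1
    {"test_upload"},                # run 2
    {"test_login"},                 # run 3
]
pairs = sketch_cofailure_pairs(runs, threshold=0.5)
```

Here `test_login` fails in runs {0, 1, 3} and `test_logout` in runs {0, 1}, so their Jaccard score is 2/3 and they form the only pair above the threshold; `test_upload` never fails alongside anything else.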

Design

Pure stdlib: per-test failing-run sets → pairwise set-Jaccard → threshold graph → connected components (single-linkage). The existing text_similarity.jaccard is n-gram string similarity (different), so set-Jaccard stays internal. Every function under CC 10 (radon-clean).
5 layers wired: core → facade __all__ → AC_failure_clusters / AC_cofailure_pairs → read-only ac_* MCP tools → Script Builder (Testing). Qt-free verified; pytest.approx for cohesion (no float ==).
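The pipeline in the Design notes (failing-run sets, pairwise set-Jaccard, threshold graph, connected components) could be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `sketch_failure_clusters`, its defaults, the dict return shape, the inclusive threshold, computing cohesion over every pair inside a cluster, and giving a singleton a cohesion of 1.0 are all hypothetical choices.

```python
from itertools import combinations


def sketch_failure_clusters(runs, threshold=0.5, min_size=2):
    # Step 1: test name -> set of run indices in which it failed.
    failing = {}
    for run_index, failed_tests in enumerate(runs):
        for name in failed_tests:
            failing.setdefault(name, set()).add(run_index)

    # Step 2 and 3: pairwise set-Jaccard, then an edge for each pair
    # whose score reaches the threshold (assumed inclusive).
    scores = {}
    neighbours = {name: set() for name in failing}
    for a, b in combinations(sorted(failing), 2):
        union = failing[a] | failing[b]  # never empty: each test failed at least once
        score = len(failing[a] & failing[b]) / len(union)
        scores[(a, b)] = score
        if score >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Step 4: connected components of the threshold graph (single-linkage),
    # found with an iterative depth-first search.
    clusters = []
    seen = set()
    for start in sorted(failing):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        if len(members) < min_size:
            continue
        ordered = sorted(members)
        pair_scores = [scores[pair] for pair in combinations(ordered, 2)]
        # Cohesion: mean pairwise Jaccard; 1.0 for a singleton is an assumption.
        cohesion = sum(pair_scores) / len(pair_scores) if pair_scores else 1.0
        clusters.append({"tests": ordered, "cohesion": cohesion})
    return clusters


runs = [
    ["test_a", "test_b", "test_c"],
    ["test_a", "test_b", "test_c"],
    ["test_d"],
]
clusters = sketch_failure_clusters(runs, threshold=0.5, min_size=2)
with_singletons = sketch_failure_clusters(runs, threshold=0.5, min_size=1)
```

Because single-linkage only needs a chain of above-threshold edges, two tests can share a cluster without scoring above the threshold against each other directly; averaging over all pairs in the cluster lets the cohesion score expose that kind of loose grouping.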

Tests

test/unit_test/headless/test_flake_cluster_batch.py — clusters group co-failing tests (cohesion 1.0), singleton excluded by min_size, min_size=1 keeps singletons, high threshold keeps only perfect co-failure, cofailure_pairs scores + sort + count, empty / no-co-failure cases, the executor paths + 5-layer wiring. 9 passed.
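The "no float ==" note matters because cohesion is a mean of fractions. A small stdlib illustration of the idea, using `math.isclose` as a stand-in for `pytest.approx` so that it runs without pytest; the score values are made up for the example:

```python
import math

# Hypothetical pairwise Jaccard scores for a three-test cluster.
pair_scores = [0.1, 0.2, 0.3]
cohesion = sum(pair_scores) / len(pair_scores)

# Exact equality is fragile: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact_sum_matches = (0.1 + 0.2 == 0.3)

# A tolerance-based comparison does not depend on rounding in the last bits.
cohesion_ok = math.isclose(cohesion, 0.2, rel_tol=1e-9)
```

In a pytest test the same check would typically read `assert cohesion == pytest.approx(0.2)`.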

…ard) Flaky tests are rarely independent - a wobbly fixture or noisy dependency makes a group fail in the same runs (~75% of flaky tests cluster). Ranking tests one-by-one by flip rate misses that shared root cause. Measure how often each pair fails in the same runs (Jaccard over their failing-run sets) and group tests above a threshold into connected clusters with a cohesion score. Pure stdlib over a list of failed-test sets.

codacy-production · 2026-06-24T17:52:22Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 43 · duplication 0


sonarqubecloud · 2026-06-24T17:57:22Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


JE-Chen merged commit 59767cf into dev Jun 24, 2026
16 checks passed

JE-Chen deleted the feat/flake-cluster-batch branch June 24, 2026 17:55

JE-Chen mentioned this pull request Jun 24, 2026

Release: test-robustness lane — failure signatures, run diff, flake clustering, step timeline (v191–v194) #414

Merged

JE-Chen merged 1 commit into dev from feat/flake-cluster-batch
