3 changes: 3 additions & 0 deletions docs.json
@@ -248,6 +248,9 @@
"flaky-tests/detection/pass-on-retry-monitor",
"flaky-tests/detection/failure-rate-monitor",
"flaky-tests/detection/failure-count-monitor",
"flaky-tests/detection/skipped-test-monitor",
"flaky-tests/detection/slow-test-monitor",
"flaky-tests/detection/new-test-monitor",
"flaky-tests/detection/flag-as-flaky",
"flaky-tests/detection/the-importance-of-pr-test-results",
"flaky-tests/detection/infrastructure-failure-protection"
3 changes: 3 additions & 0 deletions flaky-tests/detection/index.mdx
@@ -33,6 +33,9 @@
| [**Pass-on-Retry**](./pass-on-retry-monitor) | A test fails then passes on the same commit (retry after failure) | Classify (flaky) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Team and above | Enabled |
| [**Failure Rate**](./failure-rate-monitor) | Failure rate exceeds a configured percentage over a time window | Classify (flaky or broken) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Paid plans | Disabled |
| [**Failure Count**](./failure-count-monitor) | A test accumulates a configured number of failures in a rolling window | Classify (flaky or broken) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Paid plans | Disabled |
| [**Skipped Test**](./skipped-test-monitor) | A test is consistently skipped across runs within a time window | Apply labels | Paid plans | Disabled |
| [**Slow Test**](./slow-test-monitor) | A test's average duration exceeds a configured threshold | Apply labels | Paid plans | Disabled |
| [**New Test**](./new-test-monitor) | A test case is seen for the first time and tracked for a configurable grace period | Apply labels | Paid plans | Disabled |

You can run multiple monitors simultaneously. For example, you might use pass-on-retry to catch classic retry-based flakiness while also running failure rate monitors scoped to different branches. A common pattern is to pair a broken-type failure rate monitor (catching consistently failing tests) with a flaky-type failure rate monitor (catching intermittently failing tests). See [Failure Rate Monitor: Recommended Configurations](./failure-rate-monitor#recommended-configurations) for details.

@@ -81,7 +84,7 @@

While muted, the monitor is excluded from the test's status calculation. If the muted monitor was the only active classifying monitor, the test transitions from flaky to healthy for the duration of the mute. When the mute expires, the monitor is automatically included in the next status evaluation. If it's still active, the test will be flagged again.

You can also unmute a monitor early from the test case view.

Check warning on line 87 in flaky-tests/detection/index.mdx (Mintlify / Mintlify Validation (trunk-4cab4936) - vale-spellcheck): Did you really mean 'unmute'?

{/* SCREENSHOT: Mute button and duration picker on the test case monitor list.
Show the test case detail page with a monitor's mute button visible,
49 changes: 49 additions & 0 deletions flaky-tests/detection/new-test-monitor.mdx
@@ -0,0 +1,49 @@
---
title: "New Test Monitor"
description: "Track recently added tests and apply labels until they have an established history."
---

The new test monitor identifies test cases the first time they are seen in your test uploads and keeps them labeled for a configurable number of days. It is designed for visibility, not classification: the monitor does not mark tests as flaky or broken. Instead, it applies the labels you configure so your team can distinguish brand-new tests from established ones during triage.

## When to Use This Monitor

- **New test tracking:** Apply a `new-test` label automatically so reviewers know a failing test may simply lack history.
- **Coverage audits:** Identify which tests were added in the last sprint without manually diffing test suites.
- **Noise reduction during ramp-up:** Keep new tests from triggering alerts by combining this monitor's label with quarantine or alert filter rules.

## How It Works

When a test upload contains a test case ID that has never appeared before, the monitor records its first-seen timestamp. For the next `newDays` days, that test is considered "active" by this monitor and the configured labels are applied. After `newDays` days have passed since the first observation, the monitor resolves the test and the labels are removed.

The monitor runs every five minutes, so a test seen for the first time is typically labeled within one run interval (about five minutes) of its upload being processed. Each run looks back at most six hours, which keeps every pass bounded.
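
The `newDays` window described above can be sketched as a simple timestamp comparison. This is a minimal illustration, not the monitor's actual implementation: the function name `is_new` is hypothetical, and treating the exact `newDays` boundary as already resolved is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_new(first_seen_at, now, new_days):
    """Return True while the test is still inside its newDays window.

    Assumption: the test counts as resolved at exactly first_seen_at + new_days.
    """
    return now < first_seen_at + timedelta(days=new_days)


first_seen = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Three days after first observation, with newDays = 7: still labeled.
print(is_new(first_seen, first_seen + timedelta(days=3), new_days=7))  # True

# Eight days after first observation: the window has passed.
print(is_new(first_seen, first_seen + timedelta(days=8), new_days=7))  # False
```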

Check warning on line 18 in flaky-tests/detection/new-test-monitor.mdx (Mintlify / Mintlify Validation (trunk-4cab4936) - vale-spellcheck): Did you really mean 'lookback'?

## Configuration

| Setting | Description | Default |
|---|---|---|
| New days | Number of days after first observation before the monitor resolves and labels are removed | Required |
| Action | Apply labels (the only available action — this monitor does not classify) | Apply labels |

### New Days

Set `newDays` to how long you want the "new" label to stay on a test. A value of 7 means any test added in the last week carries the label. A value of 30 gives a full month of ramp-up coverage before the label drops.

### Action

The new test monitor is a performance-type monitor. It applies labels only and does not change a test's health status (flaky or broken). Choose which labels to apply in the monitor configuration. When the monitor resolves, those labels are removed according to the monitor's label removal setting.

## Resolution

The monitor resolves a test automatically once `newDays` days have elapsed since `first_seen_at`. There is no manual resolution step — once the window passes, the label is removed on the next detection cycle.

If a test is deleted and re-uploaded with the same test case ID, the original `first_seen_at` timestamp is used. The monitor does not reset the clock for re-appearing tests.
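
The "clock does not reset" behavior can be pictured as a store that only writes `first_seen_at` once per test case ID. The sketch below assumes a hypothetical in-memory dictionary and a hypothetical `record_observation` helper; how the timestamp is really persisted is not specified here.

```python
from datetime import datetime, timezone

# Hypothetical in-memory store keyed by test case ID.
first_seen_at = {}


def record_observation(test_case_id, seen_at):
    """Keep the earliest timestamp; later observations never overwrite it."""
    return first_seen_at.setdefault(test_case_id, seen_at)


original = datetime(2024, 1, 1, tzinfo=timezone.utc)
reappeared = datetime(2024, 3, 1, tzinfo=timezone.utc)

record_observation("suite::test_login", original)
# The test is deleted, then uploaded again two months later with the same ID.
effective = record_observation("suite::test_login", reappeared)
print(effective == original)  # True
```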

## Choosing Between Monitors

| Goal | Recommended monitor |
|---|---|
| Label brand-new tests for a grace period | New test monitor |
| Detect tests that consistently skip runs | Skipped test monitor |
| Flag tests whose runtime exceeds a threshold | Slow test monitor |
| Detect tests that fail then pass on retry | Pass-on-retry monitor |
| Alert on tests failing at a sustained rate | Failure rate monitor |
54 changes: 54 additions & 0 deletions flaky-tests/detection/skipped-test-monitor.mdx
@@ -0,0 +1,54 @@
---
title: "Skipped Test Monitor"
description: "Detect tests that are consistently being skipped and apply labels to surface them for review."
---

The skipped test monitor tracks test cases that accumulate a configured number of skipped runs within a time window. Rather than classifying those tests as flaky or broken, it applies labels so your team can identify tests that are being silently ignored.

## When to Use This Monitor

- **Surface suppressed tests:** Find tests that someone marked as skip (`.skip`, `xtest`, `xit`) and never re-enabled.
- **Track intentional skips:** Apply a `skipped` label so dashboards reflect tests that are excluded from runs, giving a more accurate picture of suite coverage.
- **Scope to specific branches:** Detect skips on main or release branches where a skipped test represents a gap in coverage rather than a development convenience.

## How It Works

The monitor counts the number of skipped runs for each test case within a configurable time window (in minutes). When a test accumulates at least `minSkippedCount` skipped runs in that window, the monitor activates and applies the configured labels.

Resolution occurs after `resolutionDays` days pass with no new skipped runs recorded for that test on any monitored branch.
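
The activation rule can be sketched as counting skip timestamps inside a trailing window. This is an illustrative sketch only: `should_activate` and its arguments are hypothetical names, and the inclusive window boundaries are an assumption.

```python
from datetime import datetime, timedelta, timezone


def should_activate(skipped_at, now, window_minutes, min_skipped_count):
    """True when at least min_skipped_count skips fall inside the trailing window."""
    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in skipped_at if window_start <= t <= now]
    return len(recent) >= min_skipped_count


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
skips = [
    now - timedelta(minutes=10),
    now - timedelta(minutes=30),
    now - timedelta(hours=3),
]

# Two skips in the last 60 minutes meet a threshold of 2.
print(should_activate(skips, now, window_minutes=60, min_skipped_count=2))  # True

# The third skip is three hours old, so a threshold of 3 is not met in 60 minutes.
print(should_activate(skips, now, window_minutes=60, min_skipped_count=3))  # False
```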

## Configuration

| Setting | Description | Default |
|---|---|---|
| Window | Time window (minutes) over which skipped runs are counted | Required |
| Min skipped count | Number of skipped runs in the window required to activate | Required |
| Resolution days | Days without a new skipped run before the monitor resolves | Required |
| Branch scope | Branch names or glob patterns to monitor | All branches |
| Action | Apply labels (the only available action — this monitor does not classify) | Apply labels |

### Window

The time window controls how far back the monitor looks when counting skipped runs. A shorter window (e.g., 60 minutes) catches tests skipped in a burst around a specific CI run. A longer window (e.g., 2 days, 2880 minutes) catches tests that are habitually skipped across many runs.

### Min Skipped Count

Set this to 1 to flag any test the moment it skips a single run in the window. Set it higher to require repeated skips, filtering out tests that are skipped once for a legitimate reason (such as a flaky environment that resolves itself).

### Resolution Days

After a test stops being skipped, the monitor waits `resolutionDays` days before resolving. This prevents the label from flickering on and off for tests that skip intermittently.
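
The resolution delay can be illustrated with a check against the most recent skipped run. The helper name `should_resolve` is hypothetical, and resolving at exactly `resolutionDays` days is an assumption made for this sketch.

```python
from datetime import datetime, timedelta, timezone


def should_resolve(last_skipped_at, now, resolution_days):
    """True once resolution_days have passed with no newer skipped run."""
    return now - last_skipped_at >= timedelta(days=resolution_days)


last_skip = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Two days of quiet with resolutionDays = 3: the label stays on.
print(should_resolve(last_skip, last_skip + timedelta(days=2), resolution_days=3))  # False

# Four days of quiet: the monitor resolves.
print(should_resolve(last_skip, last_skip + timedelta(days=4), resolution_days=3))  # True
```

A new skip inside the waiting period moves `last_skipped_at` forward, which restarts the wait. That is what keeps an intermittently skipped test labeled instead of toggling.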

### Branch Scope

Use branch patterns to limit detection to branches where a skipped test is significant. For example, monitoring only `main` means tests skipped on feature branches do not trigger the monitor.
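
As a rough illustration of branch scoping, the sketch below matches branch names against glob patterns with Python's `fnmatch` module. The monitor's actual pattern syntax is not specified here, so shell-style globbing is an assumption, and `in_scope` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def in_scope(branch, patterns):
    """Return True if the branch matches any pattern.

    An empty pattern list stands in for the default of monitoring all branches.
    """
    if not patterns:
        return True
    return any(fnmatchcase(branch, pattern) for pattern in patterns)


patterns = ["main", "release/*"]

print(in_scope("main", patterns))            # True
print(in_scope("release/1.2", patterns))     # True
print(in_scope("feature/login", patterns))   # False
print(in_scope("feature/login", []))         # True
```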

## Choosing Between Monitors

| Goal | Recommended monitor |
|---|---|
| Detect tests consistently being skipped | Skipped test monitor |
| Track recently added tests | New test monitor |
| Flag tests whose runtime exceeds a threshold | Slow test monitor |
| Detect tests that fail then pass on retry | Pass-on-retry monitor |
| Alert on tests failing at a sustained rate | Failure rate monitor |
55 changes: 55 additions & 0 deletions flaky-tests/detection/slow-test-monitor.mdx
@@ -0,0 +1,55 @@
---
title: "Slow Test Monitor"
description: "Flag tests whose average runtime exceeds a configured duration threshold."
---

The slow test monitor detects test cases whose measured duration exceeds a threshold you set, evaluated over a configurable time window and sample size. It applies labels to slow tests so your team can identify and prioritize performance improvements without classifying tests as flaky or broken.

## When to Use This Monitor

- **Identify tests slowing down CI:** Surface the specific tests adding the most wall time to your pipeline.
- **Enforce duration budgets:** Label any test that exceeds an acceptable runtime so it gets reviewed before merging.
- **Track regressions:** Catch tests that were fast but became slow after a code change.

## How It Works

The monitor evaluates average test duration across runs in a rolling time window. When a test's average duration exceeds the configured threshold and enough sample runs have been collected, the monitor activates and applies the configured labels.

Resolution happens when the test's measured duration drops back below the threshold over subsequent runs. If `staleAfterMinutes` is set, the monitor also resolves any active test that has had no recent runs on monitored branches — this prevents labels from persisting on tests that have been removed from the suite.
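
The activation condition combines two checks: enough samples, and an average above the threshold. The sketch below assumes the durations passed in have already been filtered to the monitor's time window and branch scope; `is_slow` is a hypothetical name.

```python
from statistics import mean


def is_slow(durations_ms, threshold_ms, sample_size):
    """True when there are enough runs and their average exceeds the threshold."""
    if len(durations_ms) < sample_size:
        return False  # not enough history to draw a conclusion
    return mean(durations_ms) > threshold_ms


# A single slow run does not activate the monitor when five samples are required.
print(is_slow([9000], threshold_ms=5000, sample_size=5))  # False

# Five runs averaging 5960 ms exceed a 5000 ms threshold.
print(is_slow([6000, 5500, 7000, 5200, 6100], threshold_ms=5000, sample_size=5))  # True
```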

## Configuration

| Setting | Description | Default |
|---|---|---|
| Duration threshold | Minimum average test duration (milliseconds) to trigger detection | Required |
| Window | Time window (minutes) over which duration is measured | Required |
| Sample size | Minimum number of runs required before the monitor can activate | Required |
| Stale after | Minutes without any run on monitored branches before an active test resolves (optional) | Disabled |
| Branch scope | Branch names or glob patterns to monitor | All branches |
| Action | Apply labels (the only available action — this monitor does not classify) | Apply labels |

### Duration Threshold

Set the threshold in milliseconds. A value of 5000 flags any test averaging more than 5 seconds. Tune this based on your acceptable CI budget — tighter thresholds surface more tests but may require more review bandwidth.

### Window and Sample Size

The window controls how far back duration samples are collected. Sample size sets the minimum number of runs needed before the monitor will activate. This prevents a single slow run from triggering the monitor on a test with no history. For example, a window of 1440 minutes (one day) and a sample size of 5 means the monitor averages the last day's runs and requires at least five before drawing a conclusion.

### Stale After

When set, any test that has been active (labeled slow) but stops running on monitored branches for `staleAfterMinutes` minutes will be automatically resolved. Use this to clean up labels after a slow test is removed from the suite or renamed.
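
The stale check can be sketched as a comparison against the last run time, skipped entirely when the setting is disabled. Representing the disabled state as `None` and resolving at exactly `staleAfterMinutes` are assumptions of this sketch; `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run_at, now, stale_after_minutes):
    """True when an active test has had no run for stale_after_minutes.

    Assumption: None means the optional setting is disabled.
    """
    if stale_after_minutes is None:
        return False
    return now - last_run_at >= timedelta(minutes=stale_after_minutes)


last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = last_run + timedelta(days=3)

# With a two-day limit (2880 minutes), three days of silence resolves the test.
print(is_stale(last_run, now, stale_after_minutes=2880))  # True

# With the setting disabled, the label persists.
print(is_stale(last_run, now, stale_after_minutes=None))  # False
```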

### Branch Scope

Scope the monitor to branches where test duration matters most, such as `main` or merge queue branches. Tests running on feature branches may have intentionally limited execution or variable infrastructure and may not represent a genuine slowness concern.

## Choosing Between Monitors

| Goal | Recommended monitor |
|---|---|
| Flag tests that are taking too long | Slow test monitor |
| Track recently added tests | New test monitor |
| Detect tests consistently being skipped | Skipped test monitor |
| Detect tests that fail then pass on retry | Pass-on-retry monitor |
| Alert on tests failing at a sustained rate | Failure rate monitor |