diff --git a/docs.json b/docs.json
index f5aa936..d72a1b2 100644
--- a/docs.json
+++ b/docs.json
@@ -248,6 +248,9 @@
 "flaky-tests/detection/pass-on-retry-monitor",
 "flaky-tests/detection/failure-rate-monitor",
 "flaky-tests/detection/failure-count-monitor",
+"flaky-tests/detection/skipped-test-monitor",
+"flaky-tests/detection/slow-test-monitor",
+"flaky-tests/detection/new-test-monitor",
 "flaky-tests/detection/flag-as-flaky",
 "flaky-tests/detection/the-importance-of-pr-test-results",
 "flaky-tests/detection/infrastructure-failure-protection"
diff --git a/flaky-tests/detection/index.mdx b/flaky-tests/detection/index.mdx
index 9bca038..24c8aa3 100644
--- a/flaky-tests/detection/index.mdx
+++ b/flaky-tests/detection/index.mdx
@@ -33,6 +33,9 @@ For example, if you have a broken failure rate monitor and a flaky pass-on-retry
 | [**Pass-on-Retry**](./pass-on-retry-monitor) | A test fails then passes on the same commit (retry after failure) | Classify (flaky) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Team and above | Enabled |
 | [**Failure Rate**](./failure-rate-monitor) | Failure rate exceeds a configured percentage over a time window | Classify (flaky or broken) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Paid plans | Disabled |
 | [**Failure Count**](./failure-count-monitor) | A test accumulates a configured number of failures in a rolling window | Classify (flaky or broken) or [apply labels](../management/test-labels#automatic-labeling-from-monitors) | Paid plans | Disabled |
+| [**Skipped Test**](./skipped-test-monitor) | A test is consistently skipped across runs within a time window | Apply labels | Paid plans | Disabled |
+| [**Slow Test**](./slow-test-monitor) | A test's average duration exceeds a configured threshold | Apply labels | Paid plans | Disabled |
+| [**New Test**](./new-test-monitor) | A test case appears for the first time and stays tracked for a configurable grace period | Apply labels | Paid plans | Disabled |
 
 You can run multiple monitors simultaneously. For example, you might use pass-on-retry to catch classic retry-based flakiness while also running failure rate monitors scoped to different branches. A common pattern is to pair a broken-type failure rate monitor (catching consistently failing tests) with a flaky-type failure rate monitor (catching intermittently failing tests). See [Failure Rate Monitor: Recommended Configurations](./failure-rate-monitor#recommended-configurations) for details.
 
diff --git a/flaky-tests/detection/new-test-monitor.mdx b/flaky-tests/detection/new-test-monitor.mdx
new file mode 100644
index 0000000..88600d5
--- /dev/null
+++ b/flaky-tests/detection/new-test-monitor.mdx
@@ -0,0 +1,49 @@
+---
+title: "New Test Monitor"
+description: "Track recently added tests and apply labels until they have an established history."
+---
+
+The new test monitor identifies test cases the first time they are seen in your test uploads and keeps them labeled for a configurable number of days. It is designed for visibility, not classification: the monitor does not mark tests as flaky or broken. Instead, it applies the labels you configure so your team can distinguish brand-new tests from established ones during triage.
+
+## When to Use This Monitor
+
+- **New test tracking:** Apply a `new-test` label automatically so reviewers know a failing test may simply lack history.
+- **Coverage audits:** Identify which tests were added in the last sprint without manually diffing test suites.
+- **Noise reduction during ramp-up:** Suppress new tests from triggering alerts by combining this monitor's label with quarantine or alert filter rules.
+
+## How It Works
+
+When a test upload contains a test case ID that has never appeared before, the monitor records its first-seen timestamp (`first_seen_at`). For the next `newDays` days, that test is considered "active" by this monitor and the configured labels are applied. After `newDays` days have passed since the first observation, the monitor resolves the test and the labels are removed.
+
+The monitor runs every five minutes, and each run looks back at most six hours so that every detection pass stays bounded. A test seen for the first time is labeled within at most twenty minutes of its upload being processed.
+
+## Configuration
+
+| Setting | Description | Default |
+|---|---|---|
+| New days | Number of days after first observation before the monitor resolves and labels are removed | Required |
+| Action | Apply labels (the only available action; this monitor does not classify) | Apply labels |
+
+### New Days
+
+Set `newDays` to how long you want the "new" label to stay on a test. A value of 7 means any test added in the last week carries the label. A value of 30 gives a full month of ramp-up coverage before the label drops.
+
+### Action
+
+The new test monitor is a performance-type monitor. It applies labels only and does not change a test's health status (flaky or broken). Choose which labels to apply in the monitor configuration. When the monitor resolves, those labels are removed according to the monitor's label removal setting.
+
+## Resolution
+
+The monitor resolves a test automatically once `newDays` days have elapsed since `first_seen_at`. There is no manual resolution step: once the window passes, the label is removed on the next detection cycle.
+
+If a test is deleted and re-uploaded with the same test case ID, the original `first_seen_at` timestamp is used. The monitor does not reset the clock for re-appearing tests.
+
+## Choosing Between Monitors
+
+| Goal | Recommended monitor |
+|---|---|
+| Label brand-new tests for a grace period | New test monitor |
+| Detect tests that are consistently skipped | Skipped test monitor |
+| Flag tests whose runtime exceeds a threshold | Slow test monitor |
+| Detect tests that fail then pass on retry | Pass-on-retry monitor |
+| Alert on tests failing at a sustained rate | Failure rate monitor |
diff --git a/flaky-tests/detection/skipped-test-monitor.mdx b/flaky-tests/detection/skipped-test-monitor.mdx
new file mode 100644
index 0000000..331147a
--- /dev/null
+++ b/flaky-tests/detection/skipped-test-monitor.mdx
@@ -0,0 +1,54 @@
+---
+title: "Skipped Test Monitor"
+description: "Detect tests that are consistently being skipped and apply labels to surface them for review."
+---
+
+The skipped test monitor tracks test cases that accumulate a configured number of skipped runs within a time window. Rather than classifying those tests as flaky or broken, it applies labels so your team can identify tests that are being silently ignored.
+
+## When to Use This Monitor
+
+- **Surface suppressed tests:** Find tests that someone marked as skip (`.skip`, `xtest`, `xit`) and never re-enabled.
+- **Track intentional skips:** Apply a `skipped` label so dashboards reflect tests that are excluded from runs, giving a more accurate picture of suite coverage.
+- **Scope to specific branches:** Detect skips on main or release branches, where a skipped test represents a gap in coverage rather than a development convenience.
+
+## How It Works
+
+The monitor counts the number of skipped runs for each test case within a configurable time window (in minutes). When a test accumulates at least `minSkippedCount` skipped runs in that window, the monitor activates and applies the configured labels.
+
+Resolution occurs after `resolutionDays` days pass with no new skipped runs recorded for that test on any monitored branch.
+
+## Configuration
+
+| Setting | Description | Default |
+|---|---|---|
+| Window | Time window (minutes) over which skipped runs are counted | Required |
+| Min skipped count | Minimum number of skipped runs in the window required to activate | Required |
+| Resolution days | Days without a new skipped run before the monitor resolves | Required |
+| Branch scope | Branch names or glob patterns to monitor | All branches |
+| Action | Apply labels (the only available action; this monitor does not classify) | Apply labels |
+
+### Window
+
+The time window controls how far back the monitor looks when counting skipped runs. A shorter window (e.g., 60 minutes) catches tests skipped in a burst around a specific CI run. A longer window (e.g., 2880 minutes, or 2 days) catches tests that are habitually skipped across many runs.
+
+### Min Skipped Count
+
+Set this to 1 to flag any test the moment it is skipped in a single run within the window. Set it higher to require repeated skips, filtering out tests that are skipped once for a legitimate reason (such as a flaky environment that resolves itself).
+
+### Resolution Days
+
+After a test stops being skipped, the monitor waits `resolutionDays` days before resolving. This prevents the label from flickering on and off for tests that are skipped intermittently.
+
+### Branch Scope
+
+Use branch patterns to limit detection to branches where a skipped test is significant. For example, monitoring only `main` means tests skipped on feature branches do not trigger the monitor.
+
+## Choosing Between Monitors
+
+| Goal | Recommended monitor |
+|---|---|
+| Detect tests consistently being skipped | Skipped test monitor |
+| Track recently added tests | New test monitor |
+| Flag tests whose runtime exceeds a threshold | Slow test monitor |
+| Detect tests that fail then pass on retry | Pass-on-retry monitor |
+| Alert on tests failing at a sustained rate | Failure rate monitor |
diff --git a/flaky-tests/detection/slow-test-monitor.mdx b/flaky-tests/detection/slow-test-monitor.mdx
new file mode 100644
index 0000000..be9f418
--- /dev/null
+++ b/flaky-tests/detection/slow-test-monitor.mdx
@@ -0,0 +1,55 @@
+---
+title: "Slow Test Monitor"
+description: "Flag tests whose average runtime exceeds a configured duration threshold."
+---
+
+The slow test monitor detects test cases whose average duration exceeds a threshold you set, evaluated over a configurable time window and sample size. It applies labels to slow tests so your team can identify and prioritize performance improvements without classifying tests as flaky or broken.
+
+## When to Use This Monitor
+
+- **Identify tests slowing down CI:** Surface the specific tests adding the most wall-clock time to your pipeline.
+- **Enforce duration budgets:** Label any test that exceeds an acceptable runtime so it gets reviewed before merging.
+- **Track regressions:** Catch tests that were fast but became slow after a code change.
+
+## How It Works
+
+The monitor evaluates average test duration across runs in a rolling time window. When a test's average duration exceeds the configured threshold and enough sample runs have been collected, the monitor activates and applies the configured labels.
+
+Resolution happens when the test's average duration drops back below the threshold over subsequent runs. If `staleAfterMinutes` is set, the monitor also resolves any active test that has had no runs on monitored branches for that many minutes; this prevents labels from persisting on tests that have been removed from the suite.
+
+## Configuration
+
+| Setting | Description | Default |
+|---|---|---|
+| Duration threshold | Minimum average test duration (milliseconds) to trigger detection | Required |
+| Window | Time window (minutes) over which duration is measured | Required |
+| Sample size | Minimum number of runs required before the monitor can activate | Required |
+| Stale after | Minutes without any run on monitored branches before an active test resolves (optional) | Disabled |
+| Branch scope | Branch names or glob patterns to monitor | All branches |
+| Action | Apply labels (the only available action; this monitor does not classify) | Apply labels |
+
+### Duration Threshold
+
+Set the threshold in milliseconds. A value of 5000 flags any test averaging more than 5 seconds. Tune this to your acceptable CI budget: tighter thresholds surface more tests but may require more review bandwidth.
+
+### Window and Sample Size
+
+The window controls how far back duration samples are collected. Sample size sets the minimum number of runs needed before the monitor will activate. This prevents a single slow run from triggering the monitor on a test with no history. For example, a window of 1440 minutes (one day) and a sample size of 5 means the monitor averages the last day's runs and requires at least five before drawing a conclusion.
+
+### Stale After
+
+When set, any active test (one currently labeled slow) that stops running on monitored branches for `staleAfterMinutes` minutes is automatically resolved. Use this to clean up labels after a slow test is removed from the suite or renamed.
+
+### Branch Scope
+
+Scope the monitor to branches where test duration matters most, such as `main` or merge queue branches. Tests running on feature branches may have intentionally limited execution or variable infrastructure, so their durations may not reflect genuine slowness.
+
+## Choosing Between Monitors
+
+| Goal | Recommended monitor |
+|---|---|
+| Flag tests that are taking too long | Slow test monitor |
+| Track recently added tests | New test monitor |
+| Detect tests consistently being skipped | Skipped test monitor |
+| Detect tests that fail then pass on retry | Pass-on-retry monitor |
+| Alert on tests failing at a sustained rate | Failure rate monitor |
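The activation and resolution rules described across the three new pages can be sketched in a few lines of Python. This is a minimal, hypothetical model meant only to illustrate how the documented settings (`newDays`, `minSkippedCount`, `resolutionDays`, `staleAfterMinutes`, window, sample size, duration threshold) interact. `RunRecord`, every function name, fnmatch-style branch globbing, and excluding skipped runs from duration averages are assumptions, not the monitors' actual implementation or API. The five-minute run cadence and label removal are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Sequence


@dataclass
class RunRecord:
    """One run of a single test case (hypothetical shape, for illustration)."""
    branch: str
    timestamp: datetime
    skipped: bool = False
    duration_ms: float = 0.0


def branch_in_scope(branch: str, patterns: Sequence[str]) -> bool:
    # An empty pattern list means "all branches", matching the documented default.
    # fnmatch-style glob matching is an assumption about pattern semantics.
    return not patterns or any(fnmatchcase(branch, p) for p in patterns)


def new_test_active(first_seen_at: datetime, now: datetime, new_days: int) -> bool:
    # New test monitor: labeled until new_days days have elapsed since first_seen_at.
    return now - first_seen_at < timedelta(days=new_days)


def skipped_test_triggers(runs: Iterable[RunRecord], now: datetime,
                          window_minutes: int, min_skipped_count: int,
                          branch_patterns: Sequence[str] = ()) -> bool:
    # Skipped test monitor: count skipped runs on monitored branches in [now - window, now].
    start = now - timedelta(minutes=window_minutes)
    skipped = sum(
        1 for r in runs
        if r.skipped and start <= r.timestamp <= now
        and branch_in_scope(r.branch, branch_patterns)
    )
    return skipped >= min_skipped_count


def skipped_test_resolves(runs: Iterable[RunRecord], now: datetime,
                          resolution_days: int,
                          branch_patterns: Sequence[str] = ()) -> bool:
    # Resolve once resolution_days have passed since the most recent skip on a monitored branch.
    last_skip = max(
        (r.timestamp for r in runs
         if r.skipped and branch_in_scope(r.branch, branch_patterns)),
        default=None,
    )
    return last_skip is None or now - last_skip >= timedelta(days=resolution_days)


def slow_test_triggers(runs: Iterable[RunRecord], now: datetime,
                       window_minutes: int, sample_size: int, threshold_ms: float,
                       branch_patterns: Sequence[str] = ()) -> bool:
    # Slow test monitor: average non-skipped durations in the window (skip exclusion is assumed).
    start = now - timedelta(minutes=window_minutes)
    durations = [
        r.duration_ms for r in runs
        if not r.skipped and start <= r.timestamp <= now
        and branch_in_scope(r.branch, branch_patterns)
    ]
    if len(durations) < sample_size:
        return False  # not enough samples to draw a conclusion
    return sum(durations) / len(durations) > threshold_ms


def slow_test_stale(runs: Iterable[RunRecord], now: datetime,
                    stale_after_minutes: Optional[int],
                    branch_patterns: Sequence[str] = ()) -> bool:
    # When staleAfterMinutes is unset, the stale rule never resolves anything.
    if stale_after_minutes is None:
        return False
    last_run = max(
        (r.timestamp for r in runs if branch_in_scope(r.branch, branch_patterns)),
        default=None,
    )
    return last_run is None or now - last_run >= timedelta(minutes=stale_after_minutes)
```

Under these assumptions, an active slow test would resolve either when `slow_test_triggers` returns `False` with enough samples in the window or when `slow_test_stale` returns `True`.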