Introduce robust metrics #379

Open
oleksandr-pavlyk wants to merge 5 commits into NVIDIA:main from oleksandr-pavlyk:introduce-robust-metrics

Conversation

@oleksandr-pavlyk
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk commented May 29, 2026

  1. Add statistics utilities that compute quartiles using the nearest-rank method, along with tests.

    • sort-based computation for shorter datasets ( < 4096 float64_t elements)
    • selection-based computation for larger arrays (to keep overall complexity O(n))
  2. Add quartile information for "nv/cold/cpu/time", "nv/cold/gpu/time", and "nv/cpu_only/time" summaries.
    Tags added are:

    • "*/median" : median
    • "*/q1": first quartile
    • "*/q3": third quartile
    • "*/ir/absolute": absolute interquartile range = $Q_3 - Q_1$
    • "*/ir/relative": relative interquartile range, $(Q_3 - Q_1) / Q_2$.
  3. Make "*/mean" and "*/stdev/relative" hidden, replaced by "*/median" and "*/ir/relative".

Closes #342 .

Technically, due to the change described in item 3, the `"CPU Time"`/`"Noise"` and `"GPU Time"`/`"Noise"` entries in the summary tables printed by NVBench-instrumented benchmarks switch from being based on (`mean`, `standard_dev`) to being based on (`median`, `interquartile_range`).

This change only affects printed summaries, i.e., the --markdown and --csv outputs. The behavior of nvbench_compare does not change, since the JSON data still contains the mean and standard deviation entries, albeit hidden by default.


Update

The change implementing item 3 has been reverted. See comments below.

Quartiles are computed using nearest rank method.

Two implementations are provided:
  1. Sort-based:
     a. sort array
     b. extract values at ranks of interest
  2. Selection based:
     a. Run nth_element to find median on whole range
     b. Run nth_element on left side to find first quartile
     c. Run nth_element on right side to find third quartile

Public API copies input into temporary vector which is mutated as needed.

Public API uses sort-based implementation for small arrays ( <= 4096 elements),
and selection-based implementation for larger arrays.

Sort-based implementation can support computation of arbitrary percentiles,
which could be useful later if more extreme statistics are needed.
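
The two strategies described above can be sketched as follows. This is a minimal illustration, not the code in nvbench/detail/statistics.cuh: the names `quartiles`, `nearest_rank_index`, `quartiles_by_sorting`, and `quartiles_by_selection` are hypothetical, and returning NaN for empty input is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names throughout; this is not the NVBench API.
struct quartiles
{
  double q1;
  double median;
  double q3;
};

// Nearest-rank rule: the p-th percentile of n sorted values is the value at
// 1-based rank ceil(p / 100 * n). Returns the matching 0-based index.
inline std::size_t nearest_rank_index(double p, std::size_t n)
{
  assert(n > 0 && p >= 0.0 && p <= 100.0);
  const auto rank = static_cast<std::size_t>(std::ceil(p * static_cast<double>(n) / 100.0));
  return std::clamp<std::size_t>(rank, 1, n) - 1;
}

inline quartiles nan_quartiles()
{
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return {nan, nan, nan}; // assumed convention for empty input
}

// Sort-based variant, O(n log n). The input is taken by value, so the
// caller's data is never mutated.
inline quartiles quartiles_by_sorting(std::vector<double> v)
{
  if (v.empty()) { return nan_quartiles(); }
  std::sort(v.begin(), v.end());
  const auto n = v.size();
  return {v[nearest_rank_index(25, n)], v[nearest_rank_index(50, n)], v[nearest_rank_index(75, n)]};
}

// Selection-based variant: up to three std::nth_element calls, O(n) on average.
inline quartiles quartiles_by_selection(std::vector<double> v)
{
  if (v.empty()) { return nan_quartiles(); }
  const auto n  = v.size();
  const auto i1 = nearest_rank_index(25, n);
  const auto i2 = nearest_rank_index(50, n);
  const auto i3 = nearest_rank_index(75, n);
  const auto at = [&v](std::size_t i) { return v.begin() + static_cast<std::ptrdiff_t>(i); };

  // Median first: afterwards [0, i2) holds values <= v[i2] and (i2, n) holds values >= v[i2].
  std::nth_element(v.begin(), at(i2), v.end());
  // Q1 lies in the left part and Q3 in the right part; skip a call when ranks coincide.
  if (i1 < i2) { std::nth_element(v.begin(), at(i1), at(i2)); }
  if (i3 > i2) { std::nth_element(at(i2 + 1), at(i3), v.end()); }
  return {v[i1], v[i2], v[i3]};
}
```

For the ten values 1 through 10 in any order, both functions pick ranks 3, 5, and 8, i.e. Q1 = 3, median = 5, Q3 = 8. A dispatcher would call one or the other depending on the input size.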

Add tests covering percentile and quartile edge cases, input iterators,
selection-vs-sorting agreement, empty and singleton inputs, and relative
dispersion validation.
Use the quartile helpers to report robust cold and CPU-only timing summaries:
Q1, median, Q3, interquartile range, and relative interquartile range.
These values stay hidden.

Summary tags are nv/cold/time/gpu/q1, nv/cold/time/gpu/median,
nv/cold/time/gpu/q3, nv/cold/time/gpu/ir/absolute, nv/cold/time/gpu/ir/relative

ir/absolute = q3 - q1, ir/relative = (q3 - q1)/median

Similar tags added for nv/cold/time/cpu and for CPU-only measures.

Validate relative-dispersion calculations before publishing relative noise
summaries so invalid centers or dispersion values do not produce misleading
summary entries.
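
One way to picture the validation step is a helper that returns an empty optional instead of a misleading ratio. This is only a sketch: `relative_iqr` is a hypothetical name, and the exact rejection rules (non-finite inputs, negative range, non-positive median) are assumptions, not the checks used in the PR.

```cpp
#include <cmath>
#include <optional>

// Hypothetical helper: ir/relative = (q3 - q1) / median, or std::nullopt when
// the center or the dispersion is unusable. The rejection rules are assumed.
inline std::optional<double> relative_iqr(double q1, double median, double q3)
{
  const double iqr = q3 - q1;
  // Reject NaN/infinite or negative dispersion.
  if (!std::isfinite(iqr) || iqr < 0.0) { return std::nullopt; }
  // Reject NaN/infinite, zero, or negative centers.
  if (!std::isfinite(median) || !(median > 0.0)) { return std::nullopt; }
  const double ratio = iqr / median;
  // Guard against overflow for extreme inputs.
  if (!std::isfinite(ratio)) { return std::nullopt; }
  return ratio;
}
```

A caller would then emit the relative summary only when the optional holds a value.
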
Only flip visibility for nv/cold/cpu/time, nv/cold/gpu/time,
and nv/cpu_only/time:
  - hide mean
  - hide stdev/relative
  - show median
  - show ir/relative
@oleksandr-pavlyk oleksandr-pavlyk requested a review from fbusato May 29, 2026 20:23
@oleksandr-pavlyk
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 29, 2026

✅ Actions performed

Full review triggered.

@oleksandr-pavlyk
Collaborator Author

This PR supersedes #348 .

@oleksandr-pavlyk oleksandr-pavlyk added the type: enhancement New feature or request. label May 29, 2026
@coderabbitai

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced benchmark timing summaries now include quartile metrics (Q1, median, Q3) and interquartile range measurements (absolute and relative variants) for better distribution visibility.
    • Improved noise analysis with conditional relative dispersion metrics displayed alongside absolute standard deviation.
  • Tests

    • Added comprehensive test coverage for percentile, quartile, and robust noise calculations.

Walkthrough

This PR replaces mean/standard-deviation statistics with robust quartile-based measures across NVBench's CPU and GPU timing reports. New percentile/quartile utilities compute first quartile, median, third quartile, absolute interquartile range, and relative interquartile range (gated on sample count). These replace direct stdev/mean ratios in measurement summaries and timeout-warning thresholds.

Changes

Robust Statistics Implementation and Integration

| Layer / File(s) | Summary |
|---|---|
| Percentile and quartile computation utilities (nvbench/detail/statistics.cuh) | Introduces percentile_rank using nearest-rank method, compute_percentiles_by_sorting and iterator-based variants, quartiles_t struct, and compute_quartiles dispatcher that selects between sorting-based and std::nth_element selection strategies. Adds compute_relative_interquartile_range and compute_robust_noise (gated on min_samples_for_noise_estimate). |
| Tests for percentile, quartile, and robust noise (testing/statistics.cu) | Adds test_percentiles (fixed/empty/out-of-range/iterator inputs), test_quartiles (API consistency, threshold-boundary loop, NaN behavior), and test_relative_interquartile_range (normal cases and edge cases: zero, negative, NaN, infinity, extremes). Helper functions assert_quartiles_equal and assert_quartiles_nan validate quartile outputs. |
| CPU-only measurement with robust statistics (nvbench/detail/measure_cpu_only.cxx) | Computes and conditionally emits relative stdev, quartiles (q1/median/q3), absolute interquartile range, and optional relative interquartile range summaries. Updates timeout warning to check optional cpu_stdev_noise instead of unconditional cpu_noise ratio. |
| Cold GPU/CPU measurement with robust statistics (nvbench/detail/measure_cold.cu) | Emits quartile and interquartile-range summaries for both CPU and GPU times, replacing direct stdev/mean ratios with optional relative-dispersion values. Updates GPU timeout warning to gate on optional cuda_stdev_noise and format percentage from that optional value. |

Assessment against linked issues

Objective Addressed Explanation
Report Q1, median, Q3 with tags nv/cold/time/gpu/q1, nv/cold/time/gpu/median, nv/cold/time/gpu/q3 for cold GPU times [#342]
Report absolute and relative interquartile range with tags nv/cold/time/gpu/ir/absolute and nv/cold/time/gpu/ir/relative [#342]
Apply robust statistics uniformly across CPU-only and cold measurements [#342]
Hide mean and stdev/relative, replace with median and ir/relative in summaries [#342] Summary generation conditionally emits relative stdev and ir/relative only when sample count meets threshold; metadata indicates "Hidden by default" for relative summaries, but unclear if mean/median display behavior matches "replace" intent.

Possibly related PRs

  • NVIDIA/nvbench#374: Updates stdrel_criterion noise tracking and termination to use optional compute_relative_dispersion results instead of direct stdev/mean ratio, mirroring the optional-gating pattern applied here in measurement summaries and timeout checks.

Suggested labels

area: performance

Suggested reviewers

  • alliepiper
  • fbusato


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 56812ebf-b232-466b-aecc-80a18471a348

📥 Commits

Reviewing files that changed from the base of the PR and between 7ba2b79 and 9a0afc3.

📒 Files selected for processing (4)
  • nvbench/detail/measure_cold.cu
  • nvbench/detail/measure_cpu_only.cxx
  • nvbench/detail/statistics.cuh
  • testing/statistics.cu

Comment thread nvbench/detail/measure_cold.cu Outdated
Comment thread nvbench/detail/measure_cpu_only.cxx Outdated
Comment thread nvbench/detail/statistics.cuh
Comment thread testing/statistics.cu Outdated
Comment thread testing/statistics.cu
@oleksandr-pavlyk
Collaborator Author

oleksandr-pavlyk commented May 31, 2026

I separated the commits that add the computation of quartiles and output them to summaries from the commit that hides 'mean' and 'stdev/relative' and displays 'median' and 'ir/relative'.

Technically, this change is not necessary for #313, since nvbench_compare reads summaries irrespective of their "hide" attribute.

Pros of using robust metrics:

  • not sensitive to outliers (hence robust)

Cons:

  • Adding a single non-outlier data point to the dataset may significantly change the value of any of the quartiles.
  • The variance of the median estimator is higher than the variance of the mean estimator.
  • For common parametric distributions, the interquartile range, as a measure of dispersion, is related to the standard deviation by a distribution-dependent scaling factor:

| Distribution | IR/stdev |
|---|---|
| Normal(0, 1) | $\approx 1.35$ |
| Exp(1) | $\approx 1.10$ |
| Gamma(2) | $\approx 1.22$ |
| U(min, max) | $\approx 1.73$ |
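
For reference, the scaling factors in the table can be recovered from the quantile functions of each distribution. The derivation below is a sketch that assumes Gamma(2) means shape 2 with unit scale (the ratio does not depend on the scale); the Gamma quartiles have no closed form and are quoted as numerical values.

```latex
\begin{aligned}
\text{Normal}(\mu, \sigma):\quad
  & Q_3 - Q_1 = 2\, z_{0.75}\, \sigma \approx 2 \times 0.6745\, \sigma \approx 1.349\, \sigma \\
\text{Exp}(1):\quad
  & Q_p = -\ln(1 - p), \quad Q_3 - Q_1 = \ln 4 - \ln\tfrac{4}{3} = \ln 3 \approx 1.099, \quad \sigma = 1 \\
\text{Gamma}(2):\quad
  & Q_1 \approx 0.961, \quad Q_3 \approx 2.693, \quad \sigma = \sqrt{2}, \quad
    \frac{Q_3 - Q_1}{\sigma} \approx \frac{1.731}{1.414} \approx 1.22 \\
U(\min, \max):\quad
  & Q_3 - Q_1 = \frac{\max - \min}{2}, \quad \sigma = \frac{\max - \min}{\sqrt{12}}, \quad
    \frac{Q_3 - Q_1}{\sigma} = \sqrt{3} \approx 1.732
\end{aligned}
```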

Perhaps the Winsorized mean and Winsorized standard deviation should be added to the summaries and displayed instead. These would be the regular mean and standard deviation computed on a copy of the sample dataset in which the top $(100 - p)$ percent of the samples are replaced with the $p$-th percentile of the original dataset.

Additionally, summaries may contain Winsorized values for different values of $p$ (such as 80, 85, 90, 93, 95, 97), with $p=95$ being the displayed value.
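
A minimal sketch of the one-sided Winsorization described above, to make the proposal concrete. The name `winsorized_mean_stdev` is hypothetical, the percentile uses the same nearest-rank rule as the quartiles, and the use of the sample standard deviation (division by n - 1) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct mean_stdev
{
  double mean;
  double stdev;
};

// Hypothetical helper: clamp the top (100 - p) percent of the samples to the
// p-th percentile, then compute the ordinary mean and sample standard deviation.
inline mean_stdev winsorized_mean_stdev(std::vector<double> v, double p)
{
  assert(!v.empty() && p > 0.0 && p <= 100.0);
  const auto n = v.size();
  std::sort(v.begin(), v.end());

  // p-th percentile by nearest rank: 1-based rank ceil(p / 100 * n).
  const auto rank =
    std::clamp<std::size_t>(static_cast<std::size_t>(std::ceil(p * static_cast<double>(n) / 100.0)),
                            1,
                            n);
  const double cap = v[rank - 1];

  // Every sample ranked above the percentile is replaced with the percentile value.
  std::fill(v.begin() + static_cast<std::ptrdiff_t>(rank), v.end(), cap);

  const double mean = std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(n);
  double sum_sq     = 0.0;
  for (const double x : v) { sum_sq += (x - mean) * (x - mean); }
  // Sample standard deviation (n - 1 denominator); zero for a single sample.
  const double stdev = n > 1 ? std::sqrt(sum_sq / static_cast<double>(n - 1)) : 0.0;
  return {mean, stdev};
}
```

For the samples {1, 2, 3, 4, 100} and p = 80, the outlier 100 is replaced with 4, so the mean drops from 22 to 2.8.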

The choice of what to replace displayed values with is to be deferred to a different PR. For this PR, we need to decide whether to keep displaying mean/standard-deviation or replace them with median/interquartile-range.

@oleksandr-pavlyk oleksandr-pavlyk self-assigned this May 31, 2026
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 31, 2026
@oleksandr-pavlyk oleksandr-pavlyk moved this from Todo to In Review in CCCL May 31, 2026
@oleksandr-pavlyk
Collaborator Author

OK, I think the right thing to do is to revert the change implementing item 3 and open it as a separate PR.

This reverts commit 9a0afc3.

Basically, all robust-statistics summary entries are hidden,
and mean + stdev/relative are back to being the default displayed items.

Labels

type: enhancement New feature or request.

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Use robust statistic in NVBench summary

1 participant