Commit 55cd1cd
PERF: Speed up permutation cluster tests ~15x via Numba JIT kernels

The permutation loop in spatio_temporal_cluster_1samp_test is the main
computational bottleneck for source-space cluster analyses. On fsaverage
ico-5 (~20K vertices, 15 timepoints, 307K tests), 2048 permutations
previously took ~14 seconds; this patch brings it to under 1 second on
a 16-core EPYC.

The key changes, roughly in order of impact:
- Fused Numba union-find (_st_fused_ccl) replaces the Python BFS in
_get_clusters_st. Handles both spatial neighbors (CSR adjacency) and
temporal self-connections in a single compiled pass, avoiding the
overhead of the old _get_clusters_spatial + _reassign loop.
- Parallel permutation processing (_perm_batch_fast) fuses threshold
scan + CCL + weighted bincount + argmax into one @jit(parallel=True)
function with prange across permutations. Each perm gets its own
pre-allocated work buffers to avoid data races.
- Batched fused t-test (_batched_fused_ttest) reads X_T once from DRAM
and computes t-statistics for 32 permutations, amortizing memory
traffic ~32x. Uses prange across variables.
- Single-perm fused t-test (_fused_ttest) replaces the numpy sequence
(dot product + 8 elementwise ops) with one prange loop.
- Compact-graph CCL in _get_components: builds a subgraph of only the
supra-threshold vertices before calling connected_components, so CCL
operates on ~1K vertices instead of ~20K.
- Vectorized _pval_from_histogram with np.searchsorted (O(n log n)
instead of O(n * n_perms)).
- Vectorized _get_1samp_orders via bit-shifting instead of per-element
np.fromiter(np.binary_repr(...)).
- Pre-computed CSR arrays from _setup_adjacency, threaded through to
_find_clusters and the permutation loop to avoid redundant
neighbor-list-to-CSR conversion.
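The fused union-find pass from the first bullet can be pictured with a plain-Python sketch. This is not the code from this commit: st_ccl_sketch and its helpers are hypothetical names, nothing here is compiled with Numba, and the row-major (n_times, n_vertices) layout is an assumption. It only illustrates how spatial CSR neighbors and temporal self-connections can be merged in a single sweep.

```python
import numpy as np


def _find(parent, i):
    """Return the root of i, halving the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _union(parent, a, b):
    """Merge the sets containing a and b; the smaller root becomes the parent."""
    ra, rb = _find(parent, a), _find(parent, b)
    if ra < rb:
        parent[rb] = ra
    elif rb < ra:
        parent[ra] = rb


def st_ccl_sketch(mask, indptr, indices):
    """Label connected supra-threshold points on a (n_times, n_vertices) grid.

    mask            : bool array, True where the statistic exceeds threshold
    indptr, indices : CSR arrays of the spatial adjacency (assumed symmetric)
    Returns an int array shaped like mask; -1 marks sub-threshold points.
    """
    n_times, n_vertices = mask.shape
    flat = mask.ravel()
    parent = np.arange(flat.size)
    for t in range(n_times):
        for v in range(n_vertices):
            i = t * n_vertices + v
            if not flat[i]:
                continue
            # Temporal self-connection: same vertex at the previous time point.
            if t > 0 and flat[i - n_vertices]:
                _union(parent, i, i - n_vertices)
            # Spatial neighbors at the same time point, read from CSR.
            for k in range(indptr[v], indptr[v + 1]):
                j = t * n_vertices + indices[k]
                if flat[j]:
                    _union(parent, i, j)
    # Relabel roots to consecutive cluster ids in order of first appearance.
    labels = np.full(flat.size, -1)
    roots = {}
    for i in np.flatnonzero(flat):
        labels[i] = roots.setdefault(_find(parent, i), len(roots))
    return labels.reshape(mask.shape)
```

Because both edge types go through the same parent array, no separate temporal reassignment step is needed after the spatial pass.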
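The bit-shifting sign generation and the batched t-test can be sketched in NumPy as follows. Both function names are hypothetical, the bit-to-sign mapping (bit 1 means flip) is an assumption, and the real kernels are Numba loops rather than a matrix product. The point of the sketch is that the per-variable sum of squares does not change under sign flips, so only the signed sums need recomputing for each permutation in a batch.

```python
import numpy as np


def sign_flips_from_orders(orders, n_samples):
    """Row p holds the +1/-1 signs encoded by the bits of orders[p].

    Assumes n_samples <= 63 so every order fits in an int64.
    """
    orders = np.asarray(orders, dtype=np.int64)
    shifts = np.arange(n_samples - 1, -1, -1, dtype=np.int64)  # MSB first
    bits = (orders[:, np.newaxis] >> shifts) & 1
    return 1 - 2 * bits  # bit 0 -> +1, bit 1 -> -1


def batched_signflip_t(X, signs):
    """One-sample t statistics for a batch of sign-flip permutations.

    X     : (n_samples, n_vars) data
    signs : (n_perms, n_samples) array of +1/-1
    Returns an (n_perms, n_vars) array of t values.
    """
    n = X.shape[0]
    sum_sq = np.einsum("ij,ij->j", X, X)    # invariant under sign flips
    mean = (signs @ X) / n                  # signed means, all perms at once
    var = (sum_sq - n * mean ** 2) / (n - 1)  # unbiased variance
    return mean / np.sqrt(var / n)
```

Reading X once per batch instead of once per permutation is what the ~32x amortization in the bullet above refers to.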
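The np.searchsorted idea behind the _pval_from_histogram change can be illustrated like this. pvals_upper_tail is a hypothetical name, and the upper-tail convention (fraction of null values greater than or equal to the observed one) is an assumption that may not match the actual function's tail handling.

```python
import numpy as np


def pvals_upper_tail(observed, null_dist):
    """Fraction of null values >= each observed value, via one sort."""
    null_sorted = np.sort(np.asarray(null_dist))
    # Index of the first null value that is >= each observed value.
    first_ge = np.searchsorted(null_sorted, observed, side="left")
    return (null_sorted.size - first_ge) / null_sorted.size
```

This returns the same values as np.mean(null_dist >= obs) evaluated per cluster, but sorts the null distribution once and uses a binary search per observed value.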
All fast paths are gated behind has_numba checks and identity checks on
the stat function, so they activate only for the built-in
ttest_1samp_no_p and f_oneway. Non-Numba fallback paths are unchanged.
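A minimal sketch of this kind of gating, under stated assumptions: builtin_ttest stands in for the built-in stat function, and choose_path and HAS_NUMBA are hypothetical names rather than anything from this commit. The check uses identity (is), so a user-supplied wrapper that happens to compute the same statistic still takes the fallback path.

```python
import importlib.util

import numpy as np

# Detect Numba without importing it (assumed equivalent of a has_numba flag).
HAS_NUMBA = importlib.util.find_spec("numba") is not None


def builtin_ttest(X):
    """Stand-in for the built-in one-sample t statistic (hypothetical)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    return X.mean(axis=0) / (X.std(axis=0, ddof=1) / np.sqrt(n))


def choose_path(stat_fun, has_numba=HAS_NUMBA):
    """Return 'fast' only if Numba is present and stat_fun is the built-in."""
    if has_numba and stat_fun is builtin_ttest:
        return "fast"
    return "fallback"
```

Gating on identity rather than on behavior keeps custom stat functions on the unchanged code path, at the cost of not accelerating equivalent user-defined functions.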
Tested on EPYC 9R14 (16 vCPU), fsaverage ico-5, 15 subjects, 2048
permutations: 0.37 ms/perm wall time vs ~5.7 ms/perm baseline.

1 parent 14d0916 commit 55cd1cd
3 files changed
Lines changed: 827 additions & 106 deletions