Commit 55cd1cd

PERF: Speed up permutation cluster tests ~15x via Numba JIT kernels
The permutation loop in spatio_temporal_cluster_1samp_test is the main computational bottleneck for source-space cluster analyses. On fsaverage ico-5 (~20K vertices, 15 timepoints, 307K tests), 2048 permutations previously took ~14 seconds; this patch brings it to under 1 second on a 16-core EPYC.

The key changes, roughly in order of impact:

- Fused Numba union-find (_st_fused_ccl) replaces the Python BFS in _get_clusters_st. Handles both spatial neighbors (CSR adjacency) and temporal self-connections in a single compiled pass, avoiding the overhead of the old _get_clusters_spatial + _reassign loop.
- Parallel permutation processing (_perm_batch_fast) fuses threshold scan + CCL + weighted bincount + argmax into one @jit(parallel=True) function with prange across permutations. Each perm gets its own pre-allocated work buffers to avoid data races.
- Batched fused t-test (_batched_fused_ttest) reads X_T once from DRAM and computes t-statistics for 32 permutations, amortizing memory traffic ~32x. Uses prange across variables.
- Single-perm fused t-test (_fused_ttest) replaces the numpy sequence (dot product + 8 elementwise ops) with one prange loop.
- Compact-graph CCL in _get_components: builds a subgraph of only the supra-threshold vertices before calling connected_components, so CCL operates on ~1K vertices instead of ~20K.
- Vectorized _pval_from_histogram with np.searchsorted (O(n log n) instead of O(n * n_perms)).
- Vectorized _get_1samp_orders via bit-shifting instead of per-element np.fromiter(np.binary_repr(...)).
- Pre-computed CSR arrays from _setup_adjacency, threaded through to _find_clusters and the permutation loop to avoid redundant neighbor-list-to-CSR conversion.

All fast paths are gated behind has_numba checks and identity checks on the stat function (they only activate for the built-in ttest_1samp_no_p and f_oneway). Non-Numba fallback paths are unchanged.
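To make the fused union-find idea concrete, here is a minimal pure-Python/NumPy sketch of labeling supra-threshold points over a spatial CSR adjacency plus temporal self-connections in one pass. It is not the code from this commit: the function name `st_union_find_labels`, the `(n_times, n_vertices)` mask layout, and the labeling order are assumptions made for illustration, and the real kernel is described as Numba-compiled.

```python
import numpy as np


def st_union_find_labels(mask, indptr, indices):
    """Label connected supra-threshold points on a (n_times, n_vertices) grid.

    mask            : boolean array, True where the statistic exceeds threshold
    indptr, indices : CSR arrays of the spatial adjacency (hypothetical inputs)
    Returns an int array shaped like mask; -1 marks sub-threshold points.
    """
    n_times, n_vertices = mask.shape
    flat = mask.ravel()
    parent = np.arange(flat.size)

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra < rb:
            parent[rb] = ra
        elif rb < ra:
            parent[ra] = rb

    for t in range(n_times):
        for v in range(n_vertices):
            node = t * n_vertices + v
            if not flat[node]:
                continue
            # Spatial neighbors at the same time point.
            for k in range(indptr[v], indptr[v + 1]):
                u = indices[k]
                if mask[t, u]:
                    union(node, t * n_vertices + u)
            # Temporal self-connection: same vertex, previous time point.
            if t > 0 and mask[t - 1, v]:
                union(node, node - n_vertices)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = np.full(flat.size, -1, dtype=np.int64)
    roots = {}
    for node in np.flatnonzero(flat):
        labels[node] = roots.setdefault(int(find(node)), len(roots))
    return labels.reshape(mask.shape)
```

Because spatial and temporal edges are merged in the same loop, no separate per-time-slice labeling and reassignment step is needed, which is the overhead the first bullet says the fused kernel avoids.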
Tested on EPYC 9R14 (16 vCPU), fsaverage ico-5, 15 subjects, 2048 permutations: 0.37 ms/perm wall time vs ~5.7 ms/perm baseline.
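The two vectorization items in the commit message (bit-shifted sign orders and np.searchsorted p-values) can be illustrated with a short NumPy sketch. The helper names `sign_flips_from_orders` and `pvals_from_null`, the most-significant-bit-first convention, and the upper-tail definition of the p-value are assumptions for this example, not the signatures or conventions of the patched functions.

```python
import numpy as np


def sign_flips_from_orders(orders, n_samples):
    """Expand integer permutation codes into a (n_perms, n_samples) +1/-1 array."""
    orders = np.asarray(orders, dtype=np.int64)
    shifts = np.arange(n_samples - 1, -1, -1)  # most significant bit first
    bits = (orders[:, np.newaxis] >> shifts) & 1
    return 1 - 2 * bits  # bit 0 -> +1, bit 1 -> -1


def pvals_from_null(observed, h0):
    """Fraction of null statistics >= each observed statistic (upper tail)."""
    h0_sorted = np.sort(h0)
    # Index of the first null value >= observed; all later entries count.
    first_ge = np.searchsorted(h0_sorted, observed, side="left")
    return (h0_sorted.size - first_ge) / h0_sorted.size
```

Sorting the null distribution once and binary-searching each observed value avoids comparing every observed statistic against every permutation value, and the broadcasted shift builds all sign patterns without any per-element string conversion.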
1 parent 14d0916 commit 55cd1cd

3 files changed

Lines changed: 827 additions & 106 deletions

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Speed up :func:`mne.stats.spatio_temporal_cluster_1samp_test` permutation loop by ~15x via Numba JIT kernels for the t-test, threshold extraction, and connected-component labeling steps, by :newcontrib:`Sharif Haason`.

doc/changes/names.inc

Lines changed: 1 addition & 0 deletions
@@ -307,6 +307,7 @@
 .. _Sena Er: https://github.com/sena-neuro
 .. _Senwen Deng: https://snwn.de
 .. _Seyed Yahya Shirazi: https://neuromechanist.github.io
+.. _Sharif Haason: https://github.com/sharifhsn
 .. _Sheraz Khan: https://github.com/SherazKhan
 .. _Shresth Keshari: https://github.com/shresth-keshari
 .. _Shristi Baral: https://github.com/shristibaral
