Separate benchmark into different files #56

Merged: blegat merged 5 commits into main from bl/sep_bench on May 8, 2026

@blegat (Owner) commented May 6, 2026

CPU

Lux

BenchmarkTools.Trial: 59 samples with 1 evaluation per sample.
 Range (min … max):  49.335 ms … 816.945 ms  ┊ GC (min … max):  0.00% … 93.70%
 Time  (median):     56.704 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   85.364 ms ± 138.152 ms  ┊ GC (mean ± σ):  35.31% ± 20.66%

  █▆                                                            
  ██▁▁▁▁▁▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  49.3 ms       Histogram: log(frequency) by time       803 ms <

 Memory estimate: 16.97 MiB, allocs estimate: 61.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 295 samples with 1 evaluation per sample.
 Range (min … max):  10.423 ms … 804.815 ms  ┊ GC (min … max):  0.00% … 98.62%
 Time  (median):     10.766 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   17.858 ms ±  64.978 ms  ┊ GC (mean ± σ):  36.75% ± 13.70%

  █ ▄▁                                                          
  █▅███▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▅
  10.4 ms       Histogram: log(frequency) by time       111 ms <

 Memory estimate: 14.11 MiB, allocs estimate: 23.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 350 samples with 1 evaluation per sample.
 Range (min … max):  10.260 ms … 812.618 ms  ┊ GC (min … max):  0.00% … 98.62%
 Time  (median):     11.012 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   14.295 ms ±  43.182 ms  ┊ GC (mean ± σ):  20.32% ±  9.80%

     ▂██▃ ▁                       ▁                             
  ▇▅▄████▇██▅▅▄▁▁▁▁▁▁▁▁▄▁▆▇▇▁▆▅▄▅▇█▇▇▇▄▄▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▄▄▁▁▁▅ ▆
  10.3 ms       Histogram: log(frequency) by time      18.7 ms <

 Memory estimate: 8.55 MiB, allocs estimate: 15.

PyTorch eager

BenchmarkTools.Trial: 2089 samples with 1 evaluation per sample.
 Range (min … max):  1.667 ms …   6.379 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.291 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.392 ms ± 447.479 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▁▁▂▅▄█▃▃▃▄▁▁▃ ▁                                      
  ▃▃▄▄▅▆▇█████████████████▇█▅▅▅▄▄▄▃▃▃▄▃▃▄▃▃▃▃▃▃▃▄▃▃▃▃▃▂▂▃▂▁▂▂ ▄
  1.67 ms         Histogram: frequency by time        3.87 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

PyTorch compiled

BenchmarkTools.Trial: 1694 samples with 1 evaluation per sample.
 Range (min … max):  2.041 ms …   8.446 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.845 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.951 ms ± 542.243 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▆▄▄█▇▇█▆▇▇▇▇▆▇▃▁▂                                    
  ▂▃▄▄▄▅▆███████████████████▇▆▅▄▅▅▄▄▄▂▅▃▃▃▃▃▃▄▂▃▃▃▃▂▃▂▃▄▃▁▂▃▂ ▅
  2.04 ms         Histogram: frequency by time        4.76 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

ArrayDiff

BenchmarkTools.Trial: 345 samples with 1 evaluation per sample.
 Range (min … max):  14.193 ms …  15.950 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     14.473 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.482 ms ± 138.881 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▁ ▄▆▃▁█▄▇▃▅▃▇▂ ▁                              
  ▃▁▁▁▃▃▃▄▅▃▅▄▆▄▇▇█▇████████████▆█▆▆▃▆▄▄▃▅▃▄▁▁▁▁▃▃▃▁▃▁▁▁▁▁▁▁▁▃ ▄
  14.2 ms         Histogram: frequency by time         14.9 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

GPU

Lux

BenchmarkTools.Trial: 6032 samples with 1 evaluation per sample.
 Range (min … max):  656.703 μs … 32.579 ms  ┊ GC (min … max): 0.00% … 42.86%
 Time  (median):     697.679 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   827.964 μs ±  1.821 ms  ┊ GC (mean ± σ):  6.58% ±  2.92%

     ▂█                                                         
  ▂▂▃██▇▄▄▄▆█▆▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂▁▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▂▂ ▃
  657 μs          Histogram: frequency by time         1.12 ms <

 Memory estimate: 43.38 KiB, allocs estimate: 1197.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  260.088 μs … 25.740 ms  ┊ GC (min … max): 0.00% … 46.47%
 Time  (median):     288.760 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   337.227 μs ±  1.073 ms  ┊ GC (mean ± σ):  7.32% ±  2.27%

           ▅        ▃█▃▄                                        
  ▁▁▁▁▂▃▆███▇▇▄▃▃▃▆██████▄▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  260 μs          Histogram: frequency by time          357 μs <

 Memory estimate: 13.48 KiB, allocs estimate: 421.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  248.923 μs … 31.751 ms  ┊ GC (min … max): 0.00% … 45.02%
 Time  (median):     272.885 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   309.503 μs ±  1.009 ms  ┊ GC (mean ± σ):  5.32% ±  1.62%

           ▁▂▇█▄▄▂        ▅▅▃▅▅                                 
  ▁▁▁▁▂▃▃▄████████▆▅▆▄▄▇▆▇██████▆▆▃▂▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  249 μs          Histogram: frequency by time          322 μs <

 Memory estimate: 12.48 KiB, allocs estimate: 384.

PyTorch eager

BenchmarkTools.Trial: 5651 samples with 1 evaluation per sample.
 Range (min … max):  293.197 μs …   2.407 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     936.678 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   882.135 μs ± 162.453 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                              ▇█▄                
  ▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▂▂▂▁▁▁▂▂▂▃▄▃▂▂▁▂▁▂▂▄▄▄▄▂▃▃▃▄█████▅▃▂▁▁▁▁▂▂▂▃▂▂ ▂
  293 μs           Histogram: frequency by time         1.18 ms <

 Memory estimate: 160 bytes, allocs estimate: 8.

/home/blegat/.julia/dev/ArrayDiff/perf/.CondaPkg/.pixi/envs/default/lib/python3.12/site-packages/torch/_inductor/compile_fx.py:322: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting torch.set_float32_matmul_precision('high') for better performance.
warnings.warn(
W0508 08:52:23.348000 48301 site-packages/torch/_inductor/utils.py:1731] [1/1_1] Not enough SMs to use max_autotune_gemm mode

PyTorch compiled

BenchmarkTools.Trial: 3861 samples with 1 evaluation per sample.
 Range (min … max):  695.198 μs …   2.572 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.329 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.292 ms ± 116.801 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                   ▄▅▅█▅▂▁       
  ▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▂▂▂▂▃▄▄▄▃▃▃▃▂▂▃▃▃▃▄▄▅▆▆▅▅▄▄▄▄▅████████▇▆▅▄▃ ▃
  695 μs           Histogram: frequency by time         1.46 ms <

 Memory estimate: 160 bytes, allocs estimate: 8.

ArrayDiff

BenchmarkTools.Trial: 401 samples with 1 evaluation per sample.
 Range (min … max):  11.651 ms …  15.955 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.145 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.486 ms ± 934.948 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █▇▃▄▅█▁▂ ▁▁                                                 
  ▃▇████████▆██▆▆▅▄▄▃▂▃▂▂▂▁▁▂▁▂▁▁▂▂▂▂▂▁▁▁▁▁▂▃▂▃▃▁▃▂▃▃▃▄▃▃▁▂▃▁▃ ▃
  11.7 ms         Histogram: frequency by time         15.5 ms <

 Memory estimate: 210.83 KiB, allocs estimate: 9918.
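
To compare the trials at a glance, the medians above can be placed side by side. The following is only a sketch in Python: the numbers are copied from the output above and converted to milliseconds, while the helper names `relative_to` and `fastest` and the choice of ArrayDiff as the baseline are assumptions made for illustration, not part of the benchmark scripts.

```python
# Median times in milliseconds, copied from the BenchmarkTools output above.
CPU_MEDIAN_MS = {
    "Lux": 56.704,
    "Hand-CUDA without prealloc": 10.766,
    "Hand-CUDA with prealloc": 11.012,
    "PyTorch eager": 2.291,
    "PyTorch compiled": 2.845,
    "ArrayDiff": 14.473,
}
GPU_MEDIAN_MS = {
    "Lux": 0.697679,
    "Hand-CUDA without prealloc": 0.288760,
    "Hand-CUDA with prealloc": 0.272885,
    "PyTorch eager": 0.936678,
    "PyTorch compiled": 1.329,
    "ArrayDiff": 12.145,
}


def relative_to(medians, baseline="ArrayDiff"):
    """Divide every median by the baseline's median (hypothetical helper).

    A ratio below 1 means the entry is faster than the baseline.
    """
    base = medians[baseline]
    return {name: t / base for name, t in medians.items()}


def fastest(medians):
    """Return the name of the entry with the smallest median time."""
    return min(medians, key=medians.get)


if __name__ == "__main__":
    for device, medians in (("CPU", CPU_MEDIAN_MS), ("GPU", GPU_MEDIAN_MS)):
        print(device)
        ratios = relative_to(medians)
        # Sort from fastest to slowest by ratio to the baseline.
        for name in sorted(ratios, key=ratios.get):
            print(f"  {name:28s} {medians[name]:10.3f} ms  {ratios[name]:7.3f}x ArrayDiff")
```

Medians are used rather than means because several trials show large GC-driven outliers (maxima around 800 ms), which inflate the mean and standard deviation.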

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.20%. Comparing base (5b4d9ab) to head (263c7a6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #56   +/-   ##
=======================================
  Coverage   90.20%   90.20%           
=======================================
  Files          23       23           
  Lines        2848     2848           
=======================================
  Hits         2569     2569           
  Misses        279      279           


@blegat blegat merged commit 4b2585e into main May 8, 2026
5 checks passed