Separate benchmark into different files by blegat · Pull Request #56 · blegat/ArrayDiff.jl

blegat · 2026-05-06T17:53:22Z

CPU

Lux

BenchmarkTools.Trial: 59 samples with 1 evaluation per sample.
 Range (min … max):  49.335 ms … 816.945 ms  ┊ GC (min … max):  0.00% … 93.70%
 Time  (median):     56.704 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   85.364 ms ± 138.152 ms  ┊ GC (mean ± σ):  35.31% ± 20.66%

  █▆                                                            
  ██▁▁▁▁▁▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  49.3 ms       Histogram: log(frequency) by time       803 ms <

 Memory estimate: 16.97 MiB, allocs estimate: 61.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 295 samples with 1 evaluation per sample.
 Range (min … max):  10.423 ms … 804.815 ms  ┊ GC (min … max):  0.00% … 98.62%
 Time  (median):     10.766 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   17.858 ms ±  64.978 ms  ┊ GC (mean ± σ):  36.75% ± 13.70%

  █ ▄▁                                                          
  █▅███▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▅
  10.4 ms       Histogram: log(frequency) by time       111 ms <

 Memory estimate: 14.11 MiB, allocs estimate: 23.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 350 samples with 1 evaluation per sample.
 Range (min … max):  10.260 ms … 812.618 ms  ┊ GC (min … max):  0.00% … 98.62%
 Time  (median):     11.012 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   14.295 ms ±  43.182 ms  ┊ GC (mean ± σ):  20.32% ±  9.80%

     ▂██▃ ▁                       ▁                             
  ▇▅▄████▇██▅▅▄▁▁▁▁▁▁▁▁▄▁▆▇▇▁▆▅▄▅▇█▇▇▇▄▄▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▄▄▁▁▁▅ ▆
  10.3 ms       Histogram: log(frequency) by time      18.7 ms <

 Memory estimate: 8.55 MiB, allocs estimate: 15.

PyTorch eager

BenchmarkTools.Trial: 2089 samples with 1 evaluation per sample.
 Range (min … max):  1.667 ms …   6.379 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.291 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.392 ms ± 447.479 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▁▁▂▅▄█▃▃▃▄▁▁▃ ▁                                      
  ▃▃▄▄▅▆▇█████████████████▇█▅▅▅▄▄▄▃▃▃▄▃▃▄▃▃▃▃▃▃▃▄▃▃▃▃▃▂▂▃▂▁▂▂ ▄
  1.67 ms         Histogram: frequency by time        3.87 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

PyTorch compiled

BenchmarkTools.Trial: 1694 samples with 1 evaluation per sample.
 Range (min … max):  2.041 ms …   8.446 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.845 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.951 ms ± 542.243 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▆▄▄█▇▇█▆▇▇▇▇▆▇▃▁▂                                    
  ▂▃▄▄▄▅▆███████████████████▇▆▅▄▅▅▄▄▄▂▅▃▃▃▃▃▃▄▂▃▃▃▃▂▃▂▃▄▃▁▂▃▂ ▅
  2.04 ms         Histogram: frequency by time        4.76 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

ArrayDiff

BenchmarkTools.Trial: 345 samples with 1 evaluation per sample.
 Range (min … max):  14.193 ms …  15.950 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     14.473 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.482 ms ± 138.881 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▁ ▄▆▃▁█▄▇▃▅▃▇▂ ▁                              
  ▃▁▁▁▃▃▃▄▅▃▅▄▆▄▇▇█▇████████████▆█▆▆▃▆▄▄▃▅▃▄▁▁▁▁▃▃▃▁▃▁▁▁▁▁▁▁▁▃ ▄
  14.2 ms         Histogram: frequency by time         14.9 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

GPU

Lux

BenchmarkTools.Trial: 6032 samples with 1 evaluation per sample.
 Range (min … max):  656.703 μs … 32.579 ms  ┊ GC (min … max): 0.00% … 42.86%
 Time  (median):     697.679 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   827.964 μs ±  1.821 ms  ┊ GC (mean ± σ):  6.58% ±  2.92%

     ▂█                                                         
  ▂▂▃██▇▄▄▄▆█▆▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂▁▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▂▂ ▃
  657 μs          Histogram: frequency by time         1.12 ms <

 Memory estimate: 43.38 KiB, allocs estimate: 1197.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  260.088 μs … 25.740 ms  ┊ GC (min … max): 0.00% … 46.47%
 Time  (median):     288.760 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   337.227 μs ±  1.073 ms  ┊ GC (mean ± σ):  7.32% ±  2.27%

           ▅        ▃█▃▄                                        
  ▁▁▁▁▂▃▆███▇▇▄▃▃▃▆██████▄▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  260 μs          Histogram: frequency by time          357 μs <

 Memory estimate: 13.48 KiB, allocs estimate: 421.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  248.923 μs … 31.751 ms  ┊ GC (min … max): 0.00% … 45.02%
 Time  (median):     272.885 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   309.503 μs ±  1.009 ms  ┊ GC (mean ± σ):  5.32% ±  1.62%

           ▁▂▇█▄▄▂        ▅▅▃▅▅                                 
  ▁▁▁▁▂▃▃▄████████▆▅▆▄▄▇▆▇██████▆▆▃▂▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  249 μs          Histogram: frequency by time          322 μs <

 Memory estimate: 12.48 KiB, allocs estimate: 384.

PyTorch eager

BenchmarkTools.Trial: 5651 samples with 1 evaluation per sample.
 Range (min … max):  293.197 μs …   2.407 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     936.678 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   882.135 μs ± 162.453 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                              ▇█▄                
  ▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▂▂▂▁▁▁▂▂▂▃▄▃▂▂▁▂▁▂▂▄▄▄▄▂▃▃▃▄█████▅▃▂▁▁▁▁▂▂▂▃▂▂ ▂
  293 μs           Histogram: frequency by time         1.18 ms <

 Memory estimate: 160 bytes, allocs estimate: 8.

/home/blegat/.julia/dev/ArrayDiff/perf/.CondaPkg/.pixi/envs/default/lib/python3.12/site-packages/torch/_inductor/compile_fx.py:322: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting torch.set_float32_matmul_precision('high') for better performance.
warnings.warn(
W0508 08:52:23.348000 48301 site-packages/torch/_inductor/utils.py:1731] [1/1_1] Not enough SMs to use max_autotune_gemm mode

PyTorch compiled

BenchmarkTools.Trial: 3861 samples with 1 evaluation per sample.
 Range (min … max):  695.198 μs …   2.572 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.329 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.292 ms ± 116.801 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                   ▄▅▅█▅▂▁       
  ▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▂▂▂▂▃▄▄▄▃▃▃▃▂▂▃▃▃▃▄▄▅▆▆▅▅▄▄▄▄▅████████▇▆▅▄▃ ▃
  695 μs           Histogram: frequency by time         1.46 ms <

 Memory estimate: 160 bytes, allocs estimate: 8.

ArrayDiff

BenchmarkTools.Trial: 401 samples with 1 evaluation per sample.
 Range (min … max):  11.651 ms …  15.955 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.145 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.486 ms ± 934.948 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █▇▃▄▅█▁▂ ▁▁                                                 
  ▃▇████████▆██▆▆▅▄▄▃▂▃▂▂▂▁▁▂▁▂▁▁▂▂▂▂▂▁▁▁▁▁▂▃▂▃▃▁▃▂▃▃▃▄▃▃▁▂▃▁▃ ▃
  11.7 ms         Histogram: frequency by time         15.5 ms <

 Memory estimate: 210.83 KiB, allocs estimate: 9918.
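
For a quick side-by-side reading of the numbers above, the median times can be tabulated and compared. The sketch below is not part of the PR: it copies the medians from the trials above (converted to milliseconds) and reports each backend's slowdown relative to the fastest median on the same device. The names `MEDIANS_MS` and `slowdown_vs_fastest` are illustrative only.

```python
# Median times copied from the BenchmarkTools output above, in milliseconds.
MEDIANS_MS = {
    "CPU": {
        "Lux": 56.704,
        "Hand-CUDA without prealloc": 10.766,
        "Hand-CUDA with prealloc": 11.012,
        "PyTorch eager": 2.291,
        "PyTorch compiled": 2.845,
        "ArrayDiff": 14.473,
    },
    "GPU": {
        "Lux": 0.697679,
        "Hand-CUDA without prealloc": 0.288760,
        "Hand-CUDA with prealloc": 0.272885,
        "PyTorch eager": 0.936678,
        "PyTorch compiled": 1.329,
        "ArrayDiff": 12.145,
    },
}


def slowdown_vs_fastest(medians):
    """Return (fastest_name, {name: median / fastest_median})."""
    fastest = min(medians, key=medians.get)
    base = medians[fastest]
    return fastest, {name: t / base for name, t in medians.items()}


if __name__ == "__main__":
    for device, medians in MEDIANS_MS.items():
        fastest, ratios = slowdown_vs_fastest(medians)
        print(f"{device} (fastest median: {fastest})")
        # List backends from fastest to slowest.
        for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
            print(f"  {name:<28} {medians[name]:>8.3f} ms  {ratio:6.1f}x")
```

Medians are used here rather than means because several trials have a long GC tail (the CPU Lux run, for example, ranges from 49.335 ms to 816.945 ms), which inflates the mean.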

codecov · 2026-05-06T17:58:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.20%. Comparing base (5b4d9ab) to head (263c7a6).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #56   +/-   ##
=======================================
  Coverage   90.20%   90.20%           
=======================================
  Files          23       23           
  Lines        2848     2848           
=======================================
  Hits         2569     2569           
  Misses        279      279

Separate benchmark into different files

556487c

blegat force-pushed the bl/sep_bench branch from c4727a6 to 556487c Compare May 6, 2026 17:55

blegat added 4 commits May 6, 2026 20:01

Fix format

d3bed80

Add setup for arraydiff

9f28d88

Improve bench script

ae20ce9

Fix format

263c7a6

blegat merged commit 4b2585e into main May 8, 2026
5 checks passed

blegat merged 5 commits into main from bl/sep_bench
