
Add tuning search based on CompileIQ #9190

Open
bernhardmgruber wants to merge 1 commit into NVIDIA:main from bernhardmgruber:compile_iq

@bernhardmgruber

@bernhardmgruber bernhardmgruber commented May 29, 2026

This was mostly written by Claude as an attempt to migrate the internal cub_tuning_evo scripts. This PR adds a simplified version that uses a single worker and runs benchmarks on a single GPU.

Running:

mkdir build_tune && cd build_tune
cmake .. --preset cub-tuning
CUDA_VISIBLE_DEVICES=0 ../benchmarks/scripts/search_iq.py -R 'cub.bench.transform.babelstream.*' -a 'T{ct}=F32'
 ctk:  13.3.33
cccl:  v3.5.0.dev-121-g575176ff50
🧬 Generation:  0/50|░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░| [elapsed: 00:00 · eta: ?] Evaluating variant {'alg': 3, 'bif': 4, 'pref': 2, 'tpb': 768, 'unrl': 1, 'vsp2': 6}: 0.7940340093966018
Evaluating variant {'alg': 2, 'bif': 0, 'pref': 3, 'tpb': 128, 'unrl': 2, 'vsp2': 1}: 0.7870774540476999
Evaluating variant {'alg': 0, 'bif': -8, 'pref': 1, 'tpb': 640, 'unrl': 3, 'vsp2': 5}: 0.7340502424429781
Evaluating variant {'alg': 1, 'bif': -16, 'pref': 1, 'tpb': 384, 'unrl': 4, 'vsp2': 3}: 0.6930208162203446
Evaluating variant {'alg': 4, 'bif': 12, 'pref': 3, 'tpb': 1024, 'unrl': 2, 'vsp2': 2}: Build failed
Evaluating variant {'alg': 1, 'bif': -12, 'pref': 2, 'tpb': 896, 'unrl': 4, 'vsp2': 5}: 0.19958390512595525
Evaluating variant {'alg': 3, 'bif': 0, 'pref': 2, 'tpb': 128, 'unrl': 1, 'vsp2': 4}: 0.7767127690888875
Evaluating variant {'alg': 2, 'bif': 8, 'pref': 3, 'tpb': 640, 'unrl': 3, 'vsp2': 6}: 0.7872272803730863
Evaluating variant {'alg': 4, 'bif': -4, 'pref': 1, 'tpb': 768, 'unrl': 3, 'vsp2': 2}: Build failed
Evaluating variant {'alg': 0, 'bif': 16, 'pref': 1, 'tpb': 384, 'unrl': 4, 'vsp2': 1}: 0.7388724658257656
...

It's still a bit confusing: after the run, the database shows different results. It looks like the score reported by analyze.py is simply computed differently from the score passed to CompileIQ.

$ ../benchmarks/scripts/analyze.py --top=100 cccl_meta_bench.db
cub.bench.transform.babelstream[T{ct}=F32]:
                                          variant     score      mins     means      maxs
9    bif_16.alg_2.tpb_768.unrl_3.pref_1.vsp2_6 ()  1.026887  1.000000  1.025528  1.200000
11    bif_8.alg_2.tpb_640.unrl_3.pref_3.vsp2_6 ()  1.026887  1.000000  1.025528  1.200000
6     bif_0.alg_2.tpb_128.unrl_2.pref_3.vsp2_1 ()  1.026641  1.000000  1.025313  1.200000
10    bif_4.alg_3.tpb_768.unrl_1.pref_2.vsp2_6 ()  1.025089  1.000000  1.023932  1.200000
7     bif_0.alg_3.tpb_128.unrl_1.pref_2.vsp2_4 ()  1.013059  0.999988  1.012499  1.200000
5     bif_0.alg_1.tpb_640.unrl_2.pref_2.vsp2_1 ()  1.012436  1.000000  1.011788  1.166667
0                                         base ()  1.000000  1.000000  1.000000  1.000000
1   bif_-12.alg_0.tpb_384.unrl_1.pref_3.vsp2_2 ()  0.962345  0.600000  0.963987  1.012048
8    bif_16.alg_0.tpb_384.unrl_4.pref_1.vsp2_1 ()  0.962344  0.600000  0.963986  1.012048
4    bif_-8.alg_0.tpb_640.unrl_3.pref_1.vsp2_5 ()  0.953269  0.600000  0.955280  1.012048
3   bif_-16.alg_1.tpb_384.unrl_4.pref_1.vsp2_3 ()  0.885473  0.500000  0.859326  1.001472
2   bif_-12.alg_1.tpb_896.unrl_4.pref_2.vsp2_5 ()  0.256298  0.048387  0.249873  0.409063

@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 29, 2026
@bernhardmgruber bernhardmgruber marked this pull request as ready for review May 31, 2026 20:15
@bernhardmgruber bernhardmgruber requested a review from a team as a code owner May 31, 2026 20:15
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 31, 2026
@coderabbitai

coderabbitai Bot commented May 31, 2026


📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CompileIQ-based evolutionary search capability for algorithm optimization alongside brute-force search options
  • Bug Fixes

    • Improved thread-safety handling in benchmark execution for multithreaded contexts
    • Enhanced SQLite storage validation and cross-thread database access support

Walkthrough

Three changes add multithreaded benchmark execution support: ProcessRunner now guards signal handler registration to the main thread, SQLiteStorage validates thread-safety and enables cross-thread connection use, and a new CompileIQSeeker orchestrator selects between brute-force and evolutionary search strategies based on problem size.
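The main-thread guard mentioned in the walkthrough can be illustrated with a minimal sketch. It is not the code from bench.py; the helper name `install_sigint_handler` and the choice of SIGINT are assumptions made for illustration. It relies only on the documented Python behavior that `signal.signal()` raises `ValueError` when called outside the main thread.

```python
import signal
import threading


def install_sigint_handler(handler):
    """Register `handler` for SIGINT only when called from the main thread.

    Returns True if the handler was installed, False if registration was skipped.
    (Hypothetical helper; the real ProcessRunner may register other signals.)
    """
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGINT, handler)
        return True
    return False


def _noop(signum, frame):
    """Placeholder handler used only for this demonstration."""


results = {}
previous = signal.getsignal(signal.SIGINT)

# Called from the thread that runs this script.
results["main"] = install_sigint_handler(_noop)

# Called from a worker thread: the guard skips registration instead of raising.
worker = threading.Thread(
    target=lambda: results.update(worker=install_sigint_handler(_noop))
)
worker.start()
worker.join()

# Restore the original handler if we replaced it.
if results["main"]:
    signal.signal(signal.SIGINT, previous)

print(results)
```

Without the guard, the call in the worker thread would fail with `ValueError`, which is presumably what the change avoids when benchmarks are driven from a search worker.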

Changes

Multithreaded Benchmark Infrastructure and Search

| Layer / File(s) | Summary |
|---|---|
| ProcessRunner signal handler main-thread guard (benchmarks/scripts/cccl/bench/bench.py) | ProcessRunner.__init__ imports threading and wraps the signal.signal() calls so that they execute only on the main thread, preventing invalid signal registration in worker threads. |
| SQLiteStorage thread-safety validation and cross-thread support (benchmarks/scripts/cccl/bench/storage.py) | SQLiteStorage.__init__ validates that the SQLite runtime threadsafety level is serialized mode and sets check_same_thread=False on connection creation to enable safe cross-thread access. |
| CompileIQSeeker benchmark search orchestration (benchmarks/scripts/search_iq.py) | New benchmark driver script that defines search-space and pool-sizing helpers, builds an objective function that evaluates bench.Bench variants and filters failed/infinite results, and introduces a CompileIQSeeker class that selects brute-force or evolutionary search based on the estimated expected-run count. |
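As a rough illustration of the SQLiteStorage change, the sketch below checks the threadsafety level and opens a connection that other threads may use. The function name `open_shared_connection` and its `require_serialized` parameter are hypothetical and not taken from storage.py. Note that `sqlite3.threadsafety` only reflects the underlying SQLite build on Python 3.11 and later; earlier versions hard-code it to 1.

```python
import sqlite3
import threading


def open_shared_connection(path, require_serialized=True):
    """Open a SQLite connection that may be used from several threads.

    Hypothetical helper. DB-API threadsafety level 3 corresponds to SQLite's
    "serialized" mode, in which a connection can be shared between threads.
    """
    if require_serialized and sqlite3.threadsafety != 3:
        raise RuntimeError(
            f"SQLite threadsafety level is {sqlite3.threadsafety}, expected 3 (serialized)"
        )
    # check_same_thread=False disables Python's own same-thread check.
    return sqlite3.connect(path, check_same_thread=False)


# The demo accesses the connection strictly one thread at a time (the worker
# is joined before the main thread reads), so it skips the strict check to
# stay runnable on Python versions older than 3.11.
conn = open_shared_connection(":memory:", require_serialized=False)
conn.execute("CREATE TABLE scores (variant TEXT, score REAL)")


def record_score():
    # Runs in a worker thread and reuses the connection created in the main thread.
    conn.execute("INSERT INTO scores VALUES (?, ?)", ("base", 1.0))
    conn.commit()


worker = threading.Thread(target=record_score)
worker.start()
worker.join()

rows = conn.execute("SELECT variant, score FROM scores").fetchall()
print(rows)
conn.close()
```

With the default `check_same_thread=True`, the insert in the worker thread would raise `sqlite3.ProgrammingError`.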

Suggested reviewers

  • NaderAlAwar
  • pauleonix
  • elstehle



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
benchmarks/scripts/search_iq.py (1)

120-131: ⚡ Quick win

suggestion: Rename parameter or variable to clarify intent.

Line 125 passes num_rt_workloads to a parameter named num_objectives in get_num_expected_runs(). The function signature and iq_search() (line 83) use num_objectives=1, but the calculation here uses num_rt_workloads. Either rename the parameter in get_num_expected_runs() to reflect its actual usage, or clarify the relationship between RT workloads and the expected-runs heuristic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 75cb46dd-cbc5-44dd-9b6d-7dd9bc33fb9b

📥 Commits

Reviewing files that changed from the base of the PR and between ee20627 and 09ec91a.

📒 Files selected for processing (3)
  • benchmarks/scripts/cccl/bench/bench.py
  • benchmarks/scripts/cccl/bench/storage.py
  • benchmarks/scripts/search_iq.py

)

if score == float("inf") or score == float("-inf"):
    print("Infinite store")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

important: Fix typo "Infinite store" → "Infinite score".

Line 60 prints "Infinite store" but should print "Infinite score" to match the condition being checked.
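One possible shape of the corrected check, written as a standalone sketch: the function name `describe_score` is hypothetical, and `math.isinf` is swapped in for the two explicit comparisons against `float("inf")` and `float("-inf")`, which it covers in one call.

```python
import math


def describe_score(score):
    """Return a log message for a benchmark score (hypothetical helper)."""
    # math.isinf is True for both +inf and -inf, matching the original condition.
    if math.isinf(score):
        return "Infinite score"
    return f"score = {score:.4f}"


print(describe_score(float("inf")))
print(describe_score(float("-inf")))
print(describe_score(0.7940340093966018))
```

If NaN scores should be rejected as well, `not math.isfinite(score)` would cover both cases; whether that is desirable here is a judgment call for the author.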

@bernhardmgruber

ok, here is a confusing bit. analyze.py shows:

                            variant     score      mins     means      maxs
0                           base ()  1.000000  1.000000  1.000000  1.000000
1  bif_-12.tpb_256.pref_2.vsp2_2 ()  0.999754  0.984375  0.999780  1.023256
2   bif_-4.tpb_768.pref_2.vsp2_6 ()  0.997953  0.753906  0.997823  1.166667
3    bif_0.tpb_384.pref_1.vsp2_5 ()  0.996448  0.753906  0.996489  1.166667
4   bif_12.tpb_384.pref_3.vsp2_4 ()  0.996448  0.753906  0.996489  1.166667
5   bif_16.tpb_640.pref_3.vsp2_1 ()  0.996416  0.753906  0.996455  1.166667
6    bif_8.tpb_896.pref_1.vsp2_3 ()  0.977935  0.753906  0.978535  1.023256

Yet, the scores reported to CompileIQ are (ordered by variant as the list above):

Evaluating variant {'bif': -12, 'pref': 2, 'tpb': 256, 'vsp2': 2}: 0.7747535944721688
Evaluating variant {'bif': -4, 'pref': 2, 'tpb': 768, 'vsp2': 6}: 0.7833624486057084
Evaluating variant {'bif': 0, 'pref': 1, 'tpb': 384, 'vsp2': 5}: 0.7720734255442793
Evaluating variant {'bif': 12, 'pref': 3, 'tpb': 384, 'vsp2': 4}: 0.7722660680432303
Evaluating variant {'bif': 16, 'pref': 3, 'tpb': 640, 'vsp2': 1}: 0.7722450212177555
Evaluating variant {'bif': 8, 'pref': 1, 'tpb': 896, 'vsp2': 3}: 0.7580094070089615

analyze.py reports the scores in monotonically decreasing order (highest score first). The CompileIQ scores for the same variants, listed in the same order, neither increase nor decrease monotonically, which suggests that the analysis score is not a monotonic function of the CompileIQ score. This is either a bug in the score computation or beyond my understanding of the tuning framework.
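The ordering mismatch can be checked mechanically. The sketch below uses only the numbers quoted above (the base row is left out because no CompileIQ score is listed for it); the helper names are made up for illustration, and counting discordant pairs is just one simple way to quantify the disagreement.

```python
from itertools import combinations

# Scores in the order analyze.py printed them (variants 1 to 6).
analyze_scores = [0.999754, 0.997953, 0.996448, 0.996448, 0.996416, 0.977935]
compileiq_scores = [
    0.7747535944721688,
    0.7833624486057084,
    0.7720734255442793,
    0.7722660680432303,
    0.7722450212177555,
    0.7580094070089615,
]


def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def discordant_pairs(xs, ys):
    """Count pairs of variants that the two scores rank in opposite order.

    Pairs tied in either score are ignored.
    """
    return sum(
        1
        for i, j in combinations(range(len(xs)), 2)
        if (xs[i] - xs[j]) * (ys[i] - ys[j]) < 0
    )


print(is_monotonic(analyze_scores))    # True
print(is_monotonic(compileiq_scores))  # False
print(discordant_pairs(analyze_scores, compileiq_scores))  # 2
```

Two of the untied pairs are ranked in opposite order, so no monotonic mapping from one score to the other can reproduce both lists.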

@gevtushenko as the author of the tuning framework, I kindly ask for an explanation for this observation.
