Control unrolling in cub::DeviceMerge[Sort] via tuning #9181
Control unrolling in cub::DeviceMerge[Sort] via tuning #9181bernhardmgruber wants to merge 5 commits into
cub::DeviceMerge[Sort] via tuning #9181Conversation
c8d3e4f to
65a3115
Compare
| int NumThreads, | ||
| int ItemsPerThread, | ||
| typename SynchronizationPolicy, | ||
| bool _Unroll = true> |
There was a problem hiding this comment.
@miscco can I use this spelling to denote an "internal" parameter on a public API?
| // RAPIDS cuDF needs to avoid unrolling some loops in sort to prevent compile time issues | ||
| #if defined(CCCL_AVOID_SORT_UNROLL) | ||
| bool unroll = false; | ||
| #else // CCCL_AVOID_SORT_UNROLL | ||
| bool unroll = true; | ||
| #endif // CCCL_AVOID_SORT_UNROLL | ||
|
|
There was a problem hiding this comment.
Note: we should deprecate CCCL_AVOID_SORT_UNROLL at some point, but let's keep it for a while to ease the transition for cuDF.
📝 WalkthroughSummary by CodeRabbitRelease Notes
important: WalkthroughThis PR exposes loop-unrolling control as a boolean tuning parameter for DeviceMerge/DeviceMergeSort and threads it through policies, thread/block implementations, agent wiring, removes the old unroll macro, and adds tests for no-unroll mode. ChangesMerge Sort Unrolling Control
Assessment against linked issues
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cub/test/catch2_test_device_merge_sort_env.cu (1)
569-578: ⚡ Quick winsuggestion: switch this test to
C2H_TEST(like the other tuning tests in this section) so it exercises the launch-wrapper/env path consistently; keeping it asTEST_CASEreduces coverage for regressions in tuned dispatch wiring.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 566756bd-cfec-4d90-b2b8-0e7f6cd77579
📒 Files selected for processing (6)
cub/cub/agent/agent_merge_sort.cuhcub/cub/block/block_merge_sort.cuhcub/cub/device/dispatch/tuning/tuning_merge_sort.cuhcub/cub/thread/thread_sort.cuhcub/cub/util_macro.cuhcub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
- cub/cub/util_macro.cuh
This comment has been minimized.
This comment has been minimized.
0fc126b to
cfbe940
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
cub/cub/block/block_merge_sort.cuh (1)
110-111: 💤 Low valuesuggestion: Per coding guidelines, free function calls should be fully qualified from the global namespace. Here and at lines 461, 518, the
detail::calls should use::cub::detail::.cub/cub/agent/agent_merge_sort.cuh (1)
558-566: 💤 Low valuesuggestion: Per coding guidelines, this should be
::cub::detail::serial_merge<policy.unroll>for full qualification.cub/cub/agent/agent_merge.cuh (1)
226-234: 💤 Low valuesuggestion: Per coding guidelines, use
::cub::detail::serial_merge<Unroll>with leading::.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d1776c12-fedb-49e8-ad1b-d6af8d9dd8bf
📒 Files selected for processing (10)
cub/cub/agent/agent_merge.cuhcub/cub/agent/agent_merge_sort.cuhcub/cub/block/block_merge_sort.cuhcub/cub/device/dispatch/dispatch_merge.cuhcub/cub/device/dispatch/tuning/tuning_merge.cuhcub/cub/device/dispatch/tuning/tuning_merge_sort.cuhcub/cub/thread/thread_sort.cuhcub/cub/util_macro.cuhcub/test/catch2_test_device_merge_env.cucub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
- cub/cub/util_macro.cuh
✅ Files skipped from review due to trivial changes (1)
- cub/cub/device/dispatch/dispatch_merge.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
- cub/test/catch2_test_device_merge_sort_env.cu
- cub/cub/thread/thread_sort.cuh
- cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
cub::DeviceMergeSort via tuning cub::DeviceMerge[Sort] via tuning
😬 CI Workflow Results🟥 Finished in 2h 43m: Pass: 97%/285 | Total: 16d 04h | Max: 2h 33m | Hits: 12%/1261524See results here. |
Fixes: #9052