Skip to content

Control unrolling in cub::DeviceMerge[Sort] via tuning #9181

Open
bernhardmgruber wants to merge 5 commits into
NVIDIA:mainfrom
bernhardmgruber:merge_sort_unroll
Open

Control unrolling in cub::DeviceMerge[Sort] via tuning #9181
bernhardmgruber wants to merge 5 commits into
NVIDIA:mainfrom
bernhardmgruber:merge_sort_unroll

Conversation

@bernhardmgruber
Copy link
Copy Markdown
Contributor

Fixes: #9052

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner May 29, 2026 09:58
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 29, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 29, 2026
int NumThreads,
int ItemsPerThread,
typename SynchronizationPolicy,
bool _Unroll = true>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco can I use this spelling to denote an "internal" parameter on a public API?

Comment on lines +98 to +104
// RAPIDS cuDF needs to avoid unrolling some loops in sort to prevent compile time issues
#if defined(CCCL_AVOID_SORT_UNROLL)
bool unroll = false;
#else // CCCL_AVOID_SORT_UNROLL
bool unroll = true;
#endif // CCCL_AVOID_SORT_UNROLL

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we should deprecate CCCL_AVOID_SORT_UNROLL at some point, but let's keep it for a while to ease the transition for cuDF.

Comment thread cub/cub/thread/thread_sort.cuh
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configurable loop unrolling control for merge sort and merge operations, enabling fine-grained performance tuning.
  • Tests

    • Added comprehensive test coverage validating merge and merge sort functionality with loop unrolling disabled.

important:

Walkthrough

This PR exposes loop-unrolling control as a boolean tuning parameter for DeviceMerge/DeviceMergeSort and threads it through policies, thread/block implementations, agent wiring, removes the old unroll macro, and adds tests for no-unroll mode.

Changes

Merge Sort Unrolling Control

Layer / File(s) Summary
Tuning policy unroll member
cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh, cub/cub/device/dispatch/tuning/tuning_merge.cuh
merge_sort_policy and merge_policy gain unroll members with defaults; equality and stream output operators updated.
Thread-level odd-even sort helper
cub/cub/thread/thread_sort.cuh
Internal detail::stable_odd_even_sort<Unroll> template extracts stable odd-even sorting; public StableOddEvenSort delegates to it.
Block merge sort unroll parameter and helpers
cub/cub/block/block_merge_sort.cuh
BlockMergeSortStrategy and BlockMergeSort gain _Unroll template parameter; added detail::serial_merge<Unroll>; loop pragmas and internal calls (stable sort, merge, last-tile fill) use _Unroll.
Agent dispatch wiring
cub/cub/agent/agent_merge.cuh, cub/cub/agent/agent_merge_sort.cuh
Agent templates gain Unroll parameter and agent/block instantiations / serial-merge calls thread policy.unroll into detail::serial_merge<...>.
Remove deprecated macro
cub/cub/util_macro.cuh
Removed _CCCL_SORT_MAYBE_UNROLL() macro.
Dispatch formatting changes
cub/cub/device/dispatch/dispatch_merge.cuh
Template alias line-wrapping/indentation reformatted; no semantic changes.
Test no-unroll tuning
cub/test/catch2_test_device_merge_env.cu, cub/test/catch2_test_device_merge_sort_env.cu
Added no_unroll_tuning callable and device tests validating DeviceMerge and DeviceMergeSort behave correctly with unrolling disabled.

Assessment against linked issues

Objective Addressed Explanation
Add unroll flag to merge sort tuning policy [#9052]
Replace CCCL_AVOID_SORT_UNROLL with tuning control [#9052]
Provide standard API interface for unroll control [#9052]

Suggested reviewers

  • srinivasyadav18
  • oleksandr-pavlyk
  • elstehle

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
cub/test/catch2_test_device_merge_sort_env.cu (1)

569-578: ⚡ Quick win

suggestion: switch this test to C2H_TEST (like the other tuning tests in this section) so it exercises the launch-wrapper/env path consistently; keeping it as TEST_CASE reduces coverage for regressions in tuned dispatch wiring.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 566756bd-cfec-4d90-b2b8-0e7f6cd77579

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1fb3b and b35115f.

📒 Files selected for processing (6)
  • cub/cub/agent/agent_merge_sort.cuh
  • cub/cub/block/block_merge_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/util_macro.cuh
  • cub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
  • cub/cub/util_macro.cuh

Comment thread cub/cub/block/block_merge_sort.cuh Outdated
Comment thread cub/cub/block/block_merge_sort.cuh
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
cub/cub/block/block_merge_sort.cuh (1)

110-111: 💤 Low value

suggestion: Per coding guidelines, free function calls should be fully qualified from the global namespace. Here and at lines 461, 518, the detail:: calls should use ::cub::detail::.

cub/cub/agent/agent_merge_sort.cuh (1)

558-566: 💤 Low value

suggestion: Per coding guidelines, this should be ::cub::detail::serial_merge<policy.unroll> for full qualification.

cub/cub/agent/agent_merge.cuh (1)

226-234: 💤 Low value

suggestion: Per coding guidelines, use ::cub::detail::serial_merge<Unroll> with leading ::.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d1776c12-fedb-49e8-ad1b-d6af8d9dd8bf

📥 Commits

Reviewing files that changed from the base of the PR and between 65a3115 and cfbe940.

📒 Files selected for processing (10)
  • cub/cub/agent/agent_merge.cuh
  • cub/cub/agent/agent_merge_sort.cuh
  • cub/cub/block/block_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/util_macro.cuh
  • cub/test/catch2_test_device_merge_env.cu
  • cub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
  • cub/cub/util_macro.cuh
✅ Files skipped from review due to trivial changes (1)
  • cub/cub/device/dispatch/dispatch_merge.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
  • cub/test/catch2_test_device_merge_sort_env.cu
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh

@bernhardmgruber bernhardmgruber changed the title Control unrolling in cub::DeviceMergeSort via tuning Control unrolling in cub::DeviceMerge[Sort] via tuning May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

😬 CI Workflow Results

🟥 Finished in 2h 43m: Pass: 97%/285 | Total: 16d 04h | Max: 2h 33m | Hits: 12%/1261524

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Expose unrolling control in cub::DeviceMergeSort to tuning

1 participant