[cudax] Make lane_mask part of the mapping result#9140

Merged
davebayer merged 1 commit into
NVIDIA:mainfrom
davebayer:groups_lane_mask_inside_mapping_result
Jun 1, 2026

@davebayer

Storing the lane mask only inside the lane_synchronizer instance had two problems:

  1. If threads were grouped and synchronized using barrier_synchronizer, there was no way to create a sub-group that uses lane_synchronizer, because the lane mask could not be retrieved.
  2. There was no good way to pass the lane mask computed inside the mapping to the lane_synchronizer instance, so the same operations had to be duplicated.

I solved both problems by moving the lane mask into the mapping result. Now the mapping result alone tells each thread which threads of its group are in the same warp. I believe this will also be useful for other collective operations, even when using barrier_synchronizer.
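
To make the idea concrete, here is a minimal host-side C++ sketch of a mapping result that carries its own lane mask. Every name in it (`MappingResult`, `group_lanes_by`) is hypothetical and only models the concept; it is not the cudax API, and the mask is a plain 32-bit integer rather than `::cuda::device::lane_mask`. The sketch assumes a 32-lane warp split into groups of `n` consecutive lanes, with `n` dividing 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a mapping result; not the cudax type.
struct MappingResult
{
  unsigned group_rank;     // index of the group this lane belongs to
  unsigned rank;           // rank of this lane inside its group
  unsigned count;          // number of lanes in the group
  std::uint32_t lane_mask; // bit i set <=> lane i of this warp is in the same group
};

// Split a 32-lane warp into groups of n consecutive lanes.
// Assumption for this sketch: n is non-zero and divides 32.
inline MappingResult group_lanes_by(unsigned lane, unsigned n)
{
  assert(lane < 32 && n != 0 && 32 % n == 0);
  const unsigned group_rank = lane / n;
  // n low bits set; n == 32 is handled separately to avoid shifting by 32.
  const std::uint32_t low_bits = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
  // Move the n-bit window to the position of this lane's group.
  return MappingResult{group_rank, lane % n, n, low_bits << (group_rank * n)};
}
```

Because the mask travels with the mapping result, a sub-group in device code could hand it to `__syncwarp(mask)` without asking the parent's synchronizer for it, which is the situation described in problem 1.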

@davebayer davebayer requested a review from a team as a code owner May 27, 2026 16:12
@davebayer davebayer requested a review from caugonnet May 27, 2026 16:12
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 27, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 27, 2026
@coderabbitai

coderabbitai Bot commented May 27, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 46ee23cf-63c1-46f3-ad04-6caba047c1cd

📥 Commits

Reviewing files that changed from the base of the PR and between f4da133 and 6b39d07.

📒 Files selected for processing (13)
  • cudax/include/cuda/experimental/__group/concepts.cuh
  • cudax/include/cuda/experimental/__group/group.cuh
  • cudax/include/cuda/experimental/__group/mapping/group_as.cuh
  • cudax/include/cuda/experimental/__group/mapping/group_by.cuh
  • cudax/include/cuda/experimental/__group/mapping/mapping_result.cuh
  • cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh
  • cudax/include/cuda/experimental/__group/this_group.cuh
  • cudax/test/common/group_testing.cuh
  • cudax/test/group/mapping/composite_mapping.cu
  • cudax/test/group/mapping/group_as.cu
  • cudax/test/group/mapping/group_by.cu
  • cudax/test/group/mapping/identity_mapping.cu
  • cudax/test/group/synchronizer/lane_synchronizer.cu
💤 Files with no reviewable changes (1)
  • cudax/test/group/synchronizer/lane_synchronizer.cu
🚧 Files skipped from review as they are similar to previous changes (9)
  • cudax/include/cuda/experimental/__group/mapping/group_as.cuh
  • cudax/test/group/mapping/group_by.cu
  • cudax/test/common/group_testing.cuh
  • cudax/include/cuda/experimental/__group/concepts.cuh
  • cudax/include/cuda/experimental/__group/this_group.cuh
  • cudax/include/cuda/experimental/__group/mapping/group_by.cuh
  • cudax/test/group/mapping/group_as.cu
  • cudax/include/cuda/experimental/__group/mapping/mapping_result.cuh
  • cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added lane-mask tracking and propagation across group mappings and group-local queries for more precise lane-aware behavior.
  • Bug Fixes

    • Fixed invalid-case return behavior in synchronizer creation.
    • Added runtime checks asserting lane-mask and population-count consistency for mapping results.
  • Tests

    • Expanded tests to validate lane-mask semantics and updated expectations accordingly.
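
The lane-mask and population-count consistency checks mentioned above can be pictured with a small host-side C++ model. This is a sketch under assumptions: `popcount32` and `is_consistent` are hypothetical helpers, the mask is a plain 32-bit integer, and the exact conditions asserted in the PR are not shown in this conversation.

```cpp
#include <cassert>
#include <cstdint>

// Count set bits portably (std::popcount would require C++20).
inline unsigned popcount32(std::uint32_t x)
{
  unsigned c = 0;
  for (; x != 0; x &= x - 1) // clears the lowest set bit each iteration
  {
    ++c;
  }
  return c;
}

// Hypothetical consistency check for a group that lives inside one warp:
//  - the calling lane must be a member of its own mask, and
//  - the number of set bits must equal the group's thread count.
inline bool is_consistent(std::uint32_t lane_mask, unsigned lane, unsigned count)
{
  assert(lane < 32);
  const bool contains_self = ((lane_mask >> lane) & 1u) != 0;
  return contains_self && popcount32(lane_mask) == count;
}
```

For example, a mask of `0xFF00` is consistent for lane 13 in a group of 8 threads, but not for lane 3 (not a member) or for a claimed count of 4.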


Walkthrough

This PR threads a ::cuda::device::lane_mask through group mapping results: the concept requires lane_mask(), __mapping_result stores and exposes a lane mask, mapping algorithms compute updated masks, synchronizers read masks at sync time, and tests validate mask propagation.

Changes

Lane Mask Support in Group Mappings

Layer / File(s) and Summary

  • Concept contract and core data structure
    Files: cudax/include/cuda/experimental/__group/concepts.cuh, cudax/include/cuda/experimental/__group/mapping/mapping_result.cuh
    Summary: Extends __group_mapping_result concept to require lane_mask(); adds stored ::cuda::device::lane_mask __lane_mask_ and lane_mask() accessor to __mapping_result; updates invalid() factories to initialize lane mask to ::cuda::device::lane_mask::none(); adds __make_lane_mask_for_n() helper.

  • Group hierarchy and initial mapping
    Files: cudax/include/cuda/experimental/__group/this_group.cuh, cudax/include/cuda/experimental/__group/group.cuh
    Summary: Parameterizes __this_mapping_result by _Level and implements level-aware lane_mask() (this_lane() for thread level, all() otherwise); __get_initial_mapping_result forwards parent lane_mask(); __do_mapping adds assertions on lane_mask membership and popcount; fixes early-return type in __make_synchronizer_instance.

  • Group mapping operations
    Files: cudax/include/cuda/experimental/__group/mapping/group_as.cuh, cudax/include/cuda/experimental/__group/mapping/group_by.cuh
    Summary: group_as and group_by now compute intermediates (__group_rank, __n, __rank) and derive __lane_mask via __make_lane_mask_for_n() from previous mapping, returning _MappingResult that includes the computed lane mask for exhaustive and non-exhaustive paths.

  • Synchronizer refactoring
    Files: cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh
    Summary: Removes stored lane-mask in __synchronizer_instance; do_sync/do_sync_aligned call ::__syncwarp() with __mapping_result.lane_mask().value(); make_instance generalized to accept any mapping type and asserts popcount equals mapping count; invalid() returns an empty instance.

  • Test infrastructure and lane mask validation
    Files: cudax/test/common/group_testing.cuh, cudax/test/group/mapping/*.cu, cudax/test/group/synchronizer/lane_synchronizer.cu
    Summary: Adds <cuda/warp> includes; implements ThreadsInWarpMappingResult::lane_mask() returning lane_mask::all(); updates mapping tests to compute expected lane masks and assert result.lane_mask() equality; removes old internal-mask checks in synchronizer test.
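
Two of the changes summarized above, the concept requiring a `lane_mask()` accessor and the level-aware initial mask, can be illustrated with a host-side C++17 model. This is only a sketch: `has_lane_mask`, `Level`, `ThisMappingResult`, and `OldStyleResult` are hypothetical stand-ins, the detection trait substitutes for the real concept, and a 32-bit integer substitutes for `::cuda::device::lane_mask`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Detects whether T exposes a lane_mask() accessor (C++17 detection idiom).
// Stand-in for the concept requirement described above.
template <class T, class = void>
struct has_lane_mask : std::false_type
{};

template <class T>
struct has_lane_mask<T, std::void_t<decltype(std::declval<const T&>().lane_mask())>> : std::true_type
{};

// Hypothetical hierarchy levels for this sketch.
enum class Level
{
  thread,
  warp
};

// Hypothetical level-aware initial mapping result:
// a thread-level group contains only the calling lane, any other level
// starts from all 32 lanes of the warp.
template <Level L>
struct ThisMappingResult
{
  unsigned lane; // lane index of the calling thread, expected in [0, 32)

  std::uint32_t lane_mask() const
  {
    return (L == Level::thread) ? (1u << lane) : 0xFFFFFFFFu;
  }
};

// A result type without the accessor, which the trait rejects.
struct OldStyleResult
{
  unsigned rank;
  unsigned count;
};

static_assert(has_lane_mask<ThisMappingResult<Level::warp>>::value, "accessor present");
static_assert(!has_lane_mask<OldStyleResult>::value, "accessor missing");
```

Requiring the accessor at the type level means any synchronizer can read the mask from whatever mapping result it is given, instead of storing its own copy.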

Suggested labels

libcu++

Suggested reviewers

  • wmaxey
  • miscco



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 359ff862-0f7a-40b2-8f54-2077d3eadc7b

📥 Commits

Reviewing files that changed from the base of the PR and between 3d451a3 and 6a7d605.

📒 Files selected for processing (13)
  • cudax/include/cuda/experimental/__group/concepts.cuh
  • cudax/include/cuda/experimental/__group/group.cuh
  • cudax/include/cuda/experimental/__group/mapping/group_as.cuh
  • cudax/include/cuda/experimental/__group/mapping/group_by.cuh
  • cudax/include/cuda/experimental/__group/mapping/mapping_result.cuh
  • cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh
  • cudax/include/cuda/experimental/__group/this_group.cuh
  • cudax/test/common/group_testing.cuh
  • cudax/test/group/mapping/composite_mapping.cu
  • cudax/test/group/mapping/group_as.cu
  • cudax/test/group/mapping/group_by.cu
  • cudax/test/group/mapping/identity_mapping.cu
  • cudax/test/group/synchronizer/lane_synchronizer.cu
💤 Files with no reviewable changes (1)
  • cudax/test/group/synchronizer/lane_synchronizer.cu

Comment thread cudax/include/cuda/experimental/__group/group.cuh
Comment thread cudax/test/group/mapping/group_by.cu Outdated

@davebayer davebayer changed the title [cudax] Make lane_mask be part of the mapping result [cudax] Make lane_mask part of the mapping result May 27, 2026

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh (2)

28-28: 💤 Low value

suggestion: Drop the unused <cuda/std/__fwd/span.h> include (only occurrence at line 28) and consider removing __is_supported_count (defined at lines 44-47, with no call sites in this file).


33-33: 💤 Low value

suggestion: Drop the #include <cuda/experimental/__group/mapping/group_by.cuh> in lane_synchronizer.cuh (line 33) if it isn’t needed for anything besides _MappingResult’s __group_mapping_result concept check. group_by doesn’t appear elsewhere in the header, and __group_mapping_result is defined in __group/concepts.cuh without referencing group_by.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b9b3ceef-4108-4562-a135-604dc84ff881

📥 Commits

Reviewing files that changed from the base of the PR and between 6a7d605 and f4da133.

📒 Files selected for processing (2)
  • cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh
  • cudax/test/group/mapping/group_by.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cudax/test/group/mapping/group_by.cu


@davebayer davebayer requested review from pciolkosz and removed request for caugonnet May 27, 2026 18:21

@davebayer davebayer closed this May 31, 2026
@davebayer davebayer force-pushed the groups_lane_mask_inside_mapping_result branch from f4da133 to 3408197 Compare May 31, 2026 19:14
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL May 31, 2026
@davebayer davebayer reopened this Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

🥳 CI Workflow Results

🟩 Finished in 33m 02s: Pass: 100%/55 | Total: 16h 15m | Max: 32m 58s | Hits: 31%/98019

See results here.

@davebayer davebayer merged commit 8f8fb49 into NVIDIA:main Jun 1, 2026
76 of 77 checks passed
@davebayer davebayer deleted the groups_lane_mask_inside_mapping_result branch June 1, 2026 07:37