[cudax] Make lane_mask part of the mapping result#9140
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (13)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (9)
📝 WalkthroughSummary by CodeRabbit
important: WalkthroughThis PR threads a ::cuda::device::lane_mask through group mapping results: the concept requires lane_mask(), __mapping_result stores and exposes a lane mask, mapping algorithms compute updated masks, synchronizers read masks at sync time, and tests validate mask propagation. ChangesLane Mask Support in Group Mappings
Suggested labels
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 359ff862-0f7a-40b2-8f54-2077d3eadc7b
📒 Files selected for processing (13)
cudax/include/cuda/experimental/__group/concepts.cuhcudax/include/cuda/experimental/__group/group.cuhcudax/include/cuda/experimental/__group/mapping/group_as.cuhcudax/include/cuda/experimental/__group/mapping/group_by.cuhcudax/include/cuda/experimental/__group/mapping/mapping_result.cuhcudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuhcudax/include/cuda/experimental/__group/this_group.cuhcudax/test/common/group_testing.cuhcudax/test/group/mapping/composite_mapping.cucudax/test/group/mapping/group_as.cucudax/test/group/mapping/group_by.cucudax/test/group/mapping/identity_mapping.cucudax/test/group/synchronizer/lane_synchronizer.cu
💤 Files with no reviewable changes (1)
- cudax/test/group/synchronizer/lane_synchronizer.cu
This comment has been minimized.
This comment has been minimized.
lane_mask be part of the mapping resultlane_mask part of the mapping result
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuh (2)
28-28: 💤 Low valuesuggestion: Drop the unused
<cuda/std/__fwd/span.h>include (only occurrence at line 28) and consider removing__is_supported_count(defined at lines 44-47, with no call sites in this file).
33-33: 💤 Low valuesuggestion: Drop the
#include <cuda/experimental/__group/mapping/group_by.cuh>inlane_synchronizer.cuh(line 33) if it isn’t needed for anything besides_MappingResult’s__group_mapping_resultconcept check.group_bydoesn’t appear elsewhere in the header, and__group_mapping_resultis defined in__group/concepts.cuhwithout referencinggroup_by.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b9b3ceef-4108-4562-a135-604dc84ff881
📒 Files selected for processing (2)
cudax/include/cuda/experimental/__group/synchronizer/lane_synchronizer.cuhcudax/test/group/mapping/group_by.cu
🚧 Files skipped from review as they are similar to previous changes (1)
- cudax/test/group/mapping/group_by.cu
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
f4da133 to
3408197
Compare
🥳 CI Workflow Results🟩 Finished in 33m 02s: Pass: 100%/55 | Total: 16h 15m | Max: 32m 58s | Hits: 31%/98019See results here. |
Storing lane mask only inside
lane_synchronizerinstance had several problems:barrier_synchronizer, there was no way to create a sub-group that would uselane_synchronizer, because we couldn't retrieve the lane mask.lane_synchronzierinstance, which made us duplicate the operations.I solved both problems by moving the lane mask into the mapping result. Now, just from the mapping result, the threads are aware which threads from the same group are inside the same warp. I believe it will be useful for some other collective operations even when using
barrier_synchronizer