cudax::copy(mdspan) Optimize shared memory cases #9137
📝 Walkthrough (summary by CodeRabbit)
Refactors tensor coordinate iteration, adds source/destination-aware shared-memory tiling with optional XOR swizzle and permuted load/store paths, adjusts mdspan dispatch and an optimized-kernel early return, generalizes benchmarks to templated index/data types with size_t offsets, and expands shared-memory transpose tests.
Changes
Shared-memory tiling and tensor copy optimization
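The XOR swizzle mentioned in the walkthrough can be illustrated on the host, without any CUDA dependency. The following is a minimal sketch, not the PR's implementation: it assumes a hypothetical 32x32 tile of 4-byte elements and 32 shared-memory banks, and the names `kTile`, `kBanks`, `swizzled_index`, and `max_bank_conflict_for_column` are invented for this example. It counts how many lanes of a warp would land in the same bank when each lane reads a different row of one logical column, which is the access pattern of a transposed read.

```cpp
#include <algorithm>
#include <array>

// Hypothetical tile geometry (assumption, not taken from the PR):
// a 32x32 tile of 4-byte elements and 32 shared-memory banks.
constexpr int kTile  = 32;
constexpr int kBanks = 32;

// XOR swizzle: logical element (row, col) is stored at physical column (col ^ row).
// For a fixed row this is a permutation of the columns, so no element is lost.
constexpr int swizzled_index(int row, int col)
{
  return row * kTile + (col ^ row);
}

// Simulates one warp where lane i reads logical element (i, col), and returns
// the largest number of lanes that map to the same bank.
template <bool Swizzle>
int max_bank_conflict_for_column(int col)
{
  std::array<int, kBanks> hits{};
  int worst = 0;
  for (int row = 0; row < kTile; ++row)
  {
    const int idx  = Swizzle ? swizzled_index(row, col) : row * kTile + col;
    const int bank = idx % kBanks; // one 4-byte element per bank slot
    worst          = std::max(worst, ++hits[bank]);
  }
  return worst;
}
```

Under these assumptions, the unswizzled layout puts every lane in the same bank for a column read, while the swizzled layout spreads the 32 lanes across 32 distinct banks. A common alternative is to pad each tile row by one element, which trades extra shared memory for simpler indexing.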
Actionable comments posted: 1
📒 Files selected for processing (9)
- cudax/benchmarks/CMakeLists.txt
- cudax/benchmarks/bench/copy/copy_bench.cu
- cudax/include/cuda/experimental/__copy/copy_optimized.cuh
- cudax/include/cuda/experimental/__copy/copy_shared_memory.cuh
- cudax/include/cuda/experimental/__copy/copy_shared_memory_utils.cuh
- cudax/include/cuda/experimental/__copy/mdspan_d2d.cuh
- cudax/include/cuda/experimental/__copy/tensor_copy_utils.cuh
- cudax/include/cuda/experimental/__copy/tensor_iterator.cuh
- cudax/include/cuda/experimental/__copy_bytes/tensor_query.cuh
💤 Files with no reviewable changes (1)
- cudax/include/cuda/experimental/__copy/tensor_copy_utils.cuh
🥳 CI Workflow Results
🟩 Finished in 30m 59s: Pass: 100%/55 | Total: 9h 28m | Max: 30m 59s | Hits: 83%/38622
Description
The PR optimizes cudax::copy(mdspan) for the following cases: fast_mod_div code to use fewer registers.
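For readers unfamiliar with the idea behind fast division and modulo by a runtime-invariant divisor, here is a minimal host-side sketch. It is not the PR's code and makes no claim about register usage: the struct name `FastDivMod` and its members are hypothetical, and the scheme shown is the classic multiply-and-shift method for unsigned 32-bit values with a non-zero divisor. The multiplier and shift are computed once, after which each division costs one widening multiply, one add, and one shift.

```cpp
#include <cstdint>

// Hypothetical helper (not the library's API): precomputes a multiplier and a
// shift so that n / d and n % d need no hardware division afterwards.
struct FastDivMod
{
  std::uint32_t divisor;
  std::uint32_t multiplier;
  std::uint32_t shift;

  // Precondition (assumption of this sketch): d != 0.
  explicit FastDivMod(std::uint32_t d)
      : divisor(d)
      , multiplier(0)
      , shift(0)
  {
    // shift = ceil(log2(d))
    while (shift < 32 && (std::uint64_t{1} << shift) < d)
    {
      ++shift;
    }
    const std::uint64_t pow2 = std::uint64_t{1} << shift;
    // multiplier = floor(2^32 * (2^shift - d) / d) + 1, which fits in 32 bits
    multiplier = static_cast<std::uint32_t>(((pow2 - d) << 32) / d + 1);
  }

  // Quotient: high 32 bits of multiplier * n, plus n, shifted right.
  // The sum is done in 64 bits because it may need 33 bits.
  std::uint32_t div(std::uint32_t n) const
  {
    const std::uint64_t hi = (std::uint64_t{multiplier} * n) >> 32;
    return static_cast<std::uint32_t>((hi + n) >> shift);
  }

  // Remainder derived from the quotient with one multiply and one subtract.
  std::uint32_t mod(std::uint32_t n) const
  {
    return n - div(n) * divisor;
  }
};
```

In a tensor copy, this kind of helper is typically built once per extent on the host and then used in the kernel to turn a linear index into multi-dimensional coordinates.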