cuFFTDx 2d FFT explanation of logical remapping for shared memory based global memory to register memory IO #307

@Coll1ns-cult

Hi,

I am trying to understand the thread remapping logic in the shared-memory variants of load_strided / store_strided in block_io_generic_strided.hpp, located in the mathdx\25.12\example\cufftdx\05_fft_Xd folder.

My current understanding of the natural thread assignment (used in the non-shared-memory variants and during FFT computation for 1D FFT in introduction_example.cu) is:

  • threadIdx.x → thread's position within its FFT
  • threadIdx.y → local_fft_id
  • blockIdx.x → block's batch offset

However, in the shared memory variants, the code performs the following remapping before doing the global memory I/O:

const unsigned int tid          = threadIdx.x + FFT::working_group::block_dim().x * threadIdx.y;
const unsigned int tidx         = tid / FFT::working_group::block_dim().y;
const unsigned int tidy         = tid % FFT::working_group::block_dim().y;
unsigned int       smem_index   = tidx + tidy * FFT::working_group::block_dim().x;
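To make the arithmetic above easier to follow, here is a minimal host-side sketch that mirrors the four lines in plain C++. It does not call cuFFTDx: `bdx` and `bdy` are stand-ins for FFT::working_group::block_dim().x / .y, `tx` and `ty` are stand-ins for threadIdx.x / threadIdx.y, and the `Remap` struct and function names are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type: the four values computed by the snippet above.
struct Remap {
    unsigned tid;         // linear thread id within the block
    unsigned tidx;        // remapped "x" identity
    unsigned tidy;        // remapped "y" identity
    unsigned smem_index;  // shared memory slot derived from (tidx, tidy)
};

// Host-side mirror of the remapping. bdx / bdy stand in for
// FFT::working_group::block_dim().x / .y (assumed values, not queried
// from cuFFTDx); tx / ty stand in for threadIdx.x / threadIdx.y.
inline Remap remap(unsigned tx, unsigned ty, unsigned bdx, unsigned bdy) {
    assert(bdx > 0 && bdy > 0 && tx < bdx && ty < bdy);
    Remap r;
    r.tid        = tx + bdx * ty;        // x varies fastest in the linear id
    r.tidx       = r.tid / bdy;          // note: divides by the y extent
    r.tidy       = r.tid % bdy;          // so tidy varies fastest in tid
    r.smem_index = r.tidx + r.tidy * bdx;
    return r;
}

// Prints one row per thread of a bdx-by-bdy block.
inline void print_remap_table(unsigned bdx, unsigned bdy) {
    std::printf("tx ty | tid tidx tidy smem_index\n");
    for (unsigned ty = 0; ty < bdy; ++ty) {
        for (unsigned tx = 0; tx < bdx; ++tx) {
            const Remap r = remap(tx, ty, bdx, bdy);
            std::printf("%2u %2u | %3u %4u %4u %10u\n",
                        tx, ty, r.tid, r.tidx, r.tidy, r.smem_index);
        }
    }
}
```

Calling `print_remap_table(4, 2)` with an assumed 4 x 2 block shows, for example, that the thread with threadIdx.x = 1, threadIdx.y = 0 gets tidx = 0, tidy = 1 and smem_index = 4, so consecutive linear thread ids step through tidy first rather than through tidx.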

I tried to work through it with the help of AI, but I have yet to arrive at a clear explanation of this logical mapping.
What I still don't fully understand is:

  1. What is the concrete motivation for performing this logical mapping at all? In other words, what problem does it solve compared to just using threadIdx.x / threadIdx.y directly for global memory indexing as in the non-smem variants?
  2. After the transpose, the smem index is computed as tidx + tidy * block_dim().x. Then, after __syncthreads(), the read phase uses threadIdx.x + threadIdx.y * block_dim().x, which is the same formula. Could you walk through concretely what data each thread writes vs. reads, and why indexing the same smem layout with those two different identities (tidx/tidy vs threadIdx.x/threadIdx.y) performs the intended transpose?
  3. Is my understanding of blockIdx.y correct — is it unused / always 0 in this kernel launch configuration, or does it carry meaning?
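To make question 2 concrete, the sketch below tabulates, for each shared memory slot, which thread writes it (via tidx + tidy * block_dim().x) and which thread reads it (via threadIdx.x + threadIdx.y * block_dim().x). It is plain host-side C++ built only from the two formulas quoted above; `bdx` and `bdy` are assumed block dimensions, and `writers_by_slot` / `print_exchange` are hypothetical names. It says nothing about which global memory element each thread loads, since that depends on the rest of block_io_generic_strided.hpp.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// writer[s] = linear id (threadIdx.x + bdx * threadIdx.y) of the thread
// that stores into shared memory slot s during the write phase.
// bdx / bdy are assumed stand-ins for FFT::working_group::block_dim().x / .y.
inline std::vector<unsigned> writers_by_slot(unsigned bdx, unsigned bdy) {
    assert(bdx > 0 && bdy > 0);
    const unsigned n = bdx * bdy;
    std::vector<unsigned> writer(n, n);  // n marks "not written yet"
    for (unsigned tid = 0; tid < n; ++tid) {
        const unsigned tidx = tid / bdy;
        const unsigned tidy = tid % bdy;
        const unsigned slot = tidx + tidy * bdx;  // write-phase index
        assert(writer[slot] == n);                // no two threads share a slot
        writer[slot] = tid;
    }
    return writer;
}

// Prints, per slot, the (x, y) of the writing thread and of the reading thread.
inline void print_exchange(unsigned bdx, unsigned bdy) {
    const std::vector<unsigned> writer = writers_by_slot(bdx, bdy);
    for (unsigned slot = 0; slot < bdx * bdy; ++slot) {
        // Read-phase index is threadIdx.x + threadIdx.y * bdx, so the
        // reader of a slot is the thread whose linear id equals the slot.
        const unsigned rx = slot % bdx, ry = slot / bdx;
        const unsigned wx = writer[slot] % bdx, wy = writer[slot] / bdx;
        std::printf("slot %u: written by (x=%u, y=%u), read by (x=%u, y=%u)\n",
                    slot, wx, wy, rx, ry);
    }
}
```

For an assumed 4 x 2 block, slot 4 is written by the thread with threadIdx.x = 1, threadIdx.y = 0 and read by the thread with threadIdx.x = 0, threadIdx.y = 1. Every slot is written exactly once, so the exchange is a permutation of values among the threads of the block.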

Thanks in advance for your time!
