Pass input_output_alias to TritonAutotunedKernelCall #2814
tdophung wants to merge 2 commits into NVIDIA:main
Signed-off-by: JAX Toolbox <jax@nvidia.com>
Greptile Summary
This PR removes the workaround that was passing an empty
Key changes:
Minor note: Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant Caller as triton_call_lowering
participant Alias as alias computation (new)
participant ATCK as TritonAutotunedKernelCall (C++)
participant CUDA as CUDA runtime
Caller->>Alias: iterate input_output_aliases.items()
Alias-->>Caller: (input_idx, num_inputs+output_idx, size_bytes) tuples
Caller->>ATCK: TritonAutotunedKernelCall(name, kernel_calls, aliases_with_sizes)
loop for each autotuning config
ATCK->>CUDA: cudaMemcpy — save aliased input buffer (save phase)
ATCK->>CUDA: launch kernel config N
ATCK->>CUDA: record timing
ATCK->>CUDA: cudaMemcpy — restore original input buffer (restore phase)
end
ATCK-->>Caller: best config selected, correct input state preserved
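The flow in the diagram can be sketched in plain Python. This is a toy host-side model, not the code from this PR: the helper names `aliases_with_sizes` and `autotune`, the `in_shapes`/`in_dtypes` parameters, and the use of NumPy arrays in place of device buffers are all assumptions made for illustration. The real save and restore steps run in C++ with CUDA memory copies.

```python
import math
import time

import numpy as np


def aliases_with_sizes(input_output_aliases, num_inputs, in_shapes, in_dtypes):
    """Build (input_idx, num_inputs + output_idx, size_bytes) tuples.

    Hypothetical helper: the size is taken from the aliased input here,
    on the assumption that input and output share one buffer of equal size.
    """
    result = []
    for input_idx, output_idx in input_output_aliases.items():
        itemsize = np.dtype(in_dtypes[input_idx]).itemsize
        size_bytes = math.prod(in_shapes[input_idx]) * itemsize
        # Outputs are numbered after all inputs in the flat buffer list.
        result.append((input_idx, num_inputs + output_idx, size_bytes))
    return tuple(result)


def autotune(configs, buffers, aliases):
    """Toy model of the loop: save, launch, time, restore, for each config."""
    best_idx, best_time = None, float("inf")
    for idx, kernel in enumerate(configs):
        # Save phase: copy every aliased input before the kernel overwrites it.
        saved = {input_idx: buffers[input_idx].copy() for input_idx, _, _ in aliases}
        start = time.perf_counter()
        kernel(buffers)  # stands in for launching one kernel config
        elapsed = time.perf_counter() - start
        # Restore phase: put the original input contents back in place.
        for input_idx, original in saved.items():
            buffers[input_idx][...] = original
        if elapsed < best_time:
            best_idx, best_time = idx, elapsed
    return best_idx
```

Without the alias tuples the save and restore phases have nothing to copy, so each trial config would run on input already overwritten by the previous one. That is the situation an empty alias list leaves the autotuner in.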
Description
https://nvbugspro.nvidia.com/bug/5810384
This PR removes the workaround (WAR) that was put in place for this bug.
It should also serve as part 2 of the workaround for the intermittent sort_chunks_by_index bug seen earlier in #2730.