Pull requests: NVIDIA/TransformerEngine
[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1
2.15.0
#2917
opened Apr 22, 2026 by
cyanguwa
Collaborator
8 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916
opened Apr 22, 2026 by
sudhakarsingh27
Collaborator
Draft
1 of 3 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2911
opened Apr 21, 2026 by
NoonePauseferg
[PyTorch] Fix FA4 selection when FA3 is unavailable.
2.15.0
community-contribution
#2909
opened Apr 21, 2026 by
bbuschkaemper
Contributor
8 of 13 tasks
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute
community-contribution
#2907
opened Apr 21, 2026 by
jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906
opened Apr 21, 2026 by
yaox12
Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900
opened Apr 18, 2026 by
timmoon10
Collaborator
9 of 13 tasks
add support for enabling cuda graph under thd format in megatron.
#2898
opened Apr 17, 2026 by
HaochenYuan
13 tasks
Improve the dimension checks for the FP8 recipes
#2894
opened Apr 16, 2026 by
ptrendx
Member
13 tasks
[Debug] Add AutoswitchGEmm for Debug Precision Tool
#2883
opened Apr 15, 2026 by
shangxiaokang
Draft
3 of 13 tasks
[PyTorch] Split TE ops op_forward into op_forward and setup_context
#2877
opened Apr 14, 2026 by
pggPL
Collaborator
5 of 7 tasks
[DONOT MERGE] Wgrad cute dsl v2
#2872
opened Apr 13, 2026 by
vthumbe1503
Collaborator
Draft
13 tasks
[JAX] Add debug validation mode for runtime group size alignment
#2867
opened Apr 11, 2026 by
jberchtold-nvidia
Collaborator
Draft
13 tasks
Optimizations for MXFP8/NVFP4 dequantize kernels
#2865
opened Apr 10, 2026 by
YigongQin
8 of 13 tasks
Adds GEMM Profiling Guide to TE
#2863
opened Apr 9, 2026 by
jomitchellnv
Contributor
7 tasks
Add cpplint and ruff linter to pre-commit and fix lint violations
#2853
opened Apr 8, 2026 by
pstjohn
Contributor
Bump transformers from 4.55.0 to 5.0.0rc3 in /docs/examples/te_gemma
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#2851
opened Apr 8, 2026 by
dependabot
Bot
Bump transformers from 4.57.0 to 5.0.0rc3 in /docs/examples/te_llama
dependencies
python
#2850
opened Apr 8, 2026 by
dependabot
Bot