Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 (label: 2.15.0)
#2917 opened Apr 22, 2026 by cyanguwa Collaborator
8 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
Variable Grouped Swizzle
#2914 opened Apr 22, 2026 by int-smart Contributor
8 of 13 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing (label: community-contribution)
#2911 opened Apr 21, 2026 by NoonePauseferg
[PyTorch] Fix FA4 selection when FA3 is unavailable. (labels: 2.15.0, community-contribution)
#2909 opened Apr 21, 2026 by bbuschkaemper Contributor
8 of 13 tasks
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute (label: community-contribution)
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906 opened Apr 21, 2026 by yaox12 Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900 opened Apr 18, 2026 by timmoon10 Collaborator
9 of 13 tasks
Improve the dimension checks for the FP8 recipes
#2894 opened Apr 16, 2026 by ptrendx Member
13 tasks
Add optimised top-k kernel AIR.
#2890 opened Apr 16, 2026 by dcampora
8 of 13 tasks
Add AI written qwen3_moe example
#2887 opened Apr 15, 2026 by skyw
4 of 13 tasks
[Debug] Add AutoswitchGEmm for Debug Precision Tool
#2883 opened Apr 15, 2026 by shangxiaokang Draft
3 of 13 tasks
SMEM offset caching RHT
#2882 opened Apr 15, 2026 by sraman-rgb
13 tasks
[PyTorch] Split TE ops op_forward into op_forward and setup_context
#2877 opened Apr 14, 2026 by pggPL Collaborator
5 of 7 tasks
[DONOT MERGE] Wgrad cute dsl v2
#2872 opened Apr 13, 2026 by vthumbe1503 Collaborator Draft
13 tasks
Optimizations for MXFP8/NVFP4 dequantize kernels
#2865 opened Apr 10, 2026 by YigongQin
8 of 13 tasks
Adds GEMM Profiling Guide to TE
#2863 opened Apr 9, 2026 by jomitchellnv Contributor
7 tasks
[DO NOT MERGE] Test CI
#2862 opened Apr 9, 2026 by cyanguwa Collaborator Draft
13 tasks
Add cpplint and ruff linter to pre-commit and fix lint violations
#2853 opened Apr 8, 2026 by pstjohn Contributor
Bump transformers from 4.55.0 to 5.0.0rc3 in /docs/examples/te_gemma (labels: dependencies, python)
#2851 opened Apr 8, 2026 by dependabot Bot
Bump transformers from 4.57.0 to 5.0.0rc3 in /docs/examples/te_llama (labels: dependencies, python)
#2850 opened Apr 8, 2026 by dependabot Bot