Replace pybind11 extension with PyTorch stable ABI
Implement a stable ABI layer that replaces the pybind11-based C++ extension
with torch::Library-registered operations using torch::stable::Tensor. This
allows the PyTorch extension to be built once and work across multiple
Python/PyTorch versions without recompilation.
Key changes:
- Add csrc/extensions/ with stable ABI C++ implementations for all TE ops
  (activation, attention, cast, gemm, normalization, etc.)
- Add _stable_torch_module.py as the Python-side module replacing pybind11
- Add _stable_ops.py and _tex.py shims for backward compatibility
- Add tensor extraction and stable quantization utilities
- Add quantize_bidirectional for fused rowwise+columnwise quantization
- Update build system to compile the stable extension separately
- Add .gitignore for build-time artifact directories
- Fix MXFP8 scale swizzle, columnwise data, and on-the-fly creation
- Fix NVFP4 bidirectional quantization for correct columnwise data
- Fix FP8 CurrentScaling stale amax/scale between quantization runs
- Fix distributed amax all-reduce for NVFP4 and FP8 current scaling
- Clean up pylint issues in new files
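
For illustration, a backward-compatibility shim of the kind listed above could follow a simple forwarding pattern. The sketch below is an assumption about the general approach, not the contents of _tex.py: LegacyShim, stable_ops, scale, and legacy_scale are hypothetical names, and a plain Python namespace stands in for the registered ops.

```python
import types


def _scale(values, factor):
    # Placeholder op; stands in for a registered stable-ABI operation.
    return [v * factor for v in values]


# Hypothetical namespace of new-style ops (an assumption for this sketch;
# the real ops would be reached through the registered library instead).
stable_ops = types.SimpleNamespace(scale=_scale)


class LegacyShim:
    """Forwards old-style attribute lookups to the new op namespace."""

    def __init__(self, backend, renames=None):
        self._backend = backend
        self._renames = renames or {}

    def __getattr__(self, name):
        # Called only when normal lookup fails, so _backend and _renames
        # (set in __init__) never reach this method.
        target = self._renames.get(name, name)
        try:
            return getattr(self._backend, target)
        except AttributeError:
            raise AttributeError(
                f"legacy op {name!r} has no stable replacement"
            ) from None


# Old call sites keep importing `tex` and calling the names they always used.
tex = LegacyShim(stable_ops, renames={"legacy_scale": "scale"})
```

With this pattern, a caller that still uses an old name keeps working: tex.legacy_scale([1, 2], 3) resolves to the new scale op, while an unknown name raises AttributeError instead of failing silently.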
Signed-off-by: Peter St. John <pstjohn@nvidia.com>