[Feature][OP] Add V100 (SM70) GPU Support #6306
Open
mattheliu wants to merge 12 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
Support FP16 inference on V100 by adding SM70 compilation flags, disabling BF16/FP8 quantization, and falling back gracefully for ops that require SM80+.
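The fallback behavior summarized above can be sketched as capability checks keyed on the compute capability. This is a minimal illustration based on the PR description, not FastDeploy's actual code: `get_sm_version()` is stubbed, and the function names and exact thresholds (BF16 needs SM80+, FP8 needs SM89+) are stated assumptions.

```python
# Illustrative sketch of the SM70 fallback rules described in this PR.
# get_sm_version() is stubbed; in FastDeploy it would query the CUDA
# device. The thresholds follow the PR description (BF16 needs SM80+,
# FP8 needs SM89+); the function names here are assumptions.

def get_sm_version() -> int:
    """Compute capability as an integer, e.g. 70 for V100 (stub)."""
    return 70

def resolve_dtype(requested: str, sm: int) -> str:
    """Downgrade BF16 to FP16 on GPUs without native BF16 (pre-SM80)."""
    if requested == "bfloat16" and sm < 80:
        return "float16"
    return requested

# FP8-based quantization methods fall back to weight-only INT variants.
QUANT_FALLBACK = {"block_wise_fp8": "wint8", "w4afp8": "wint4"}

def resolve_quant_method(method: str, sm: int) -> str:
    """Replace FP8 quantization with an INT fallback below SM89."""
    if sm < 89:
        return QUANT_FALLBACK.get(method, method)
    return method
```

On a V100 (`sm=70`) this yields `resolve_dtype("bfloat16", 70) == "float16"` and `resolve_quant_method("block_wise_fp8", 70) == "wint8"`.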
Force-pushed: aa40210 → f3216c0; 4a1f4d4 → 3b39080
Resolve merge conflicts:
- custom_ops/gpu_ops/cpp_extensions.cc: keep both MaskedPerTokenQuant and FusedMaskSwigluFP8Quant
- fused_moe_deepgemm_backend.py: keep _fp8_quant_blockwise_compat for V100 compatibility
- block_wise_fp8.py: keep SM70/V100 compatibility checks for FP8 support

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
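The `_fp8_quant_blockwise_compat` wrapper kept in this merge is not shown in the conversation; the following is a hypothetical sketch of what such a compatibility shim could look like. Only the dispatch rule (FP8 kernels require SM89+) comes from this PR; the signature and names are assumptions.

```python
# Hypothetical sketch of an FP8 compatibility shim like the
# _fp8_quant_blockwise_compat mentioned above. The dispatch rule
# (FP8 kernels only on SM89+) follows this PR; everything else is
# illustrative, not FastDeploy's actual implementation.

from typing import Any, Callable

def fp8_quant_blockwise_compat(
    x: Any,
    sm_version: int,
    fp8_kernel: Callable[[Any], Any],
    fallback: Callable[[Any], Any],
) -> Any:
    """Run the FP8 block-wise quant kernel where supported,
    otherwise take a fallback path on older GPUs such as V100."""
    if sm_version >= 89:  # FP8 tensor-core support starts at SM89
        return fp8_kernel(x)
    return fallback(x)
```

On SM70 the fallback path is taken unconditionally, which is what makes the surrounding code safe to run on a V100.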
Force-pushed: 5cb9172 → b9dcf58
The per_token_quant_fp8.cu file was removed and its functionality was moved to quantization/common.cu. Remove the stale reference from setup_ops.py to fix the CI build failure.

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
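Keeping setup_ops.py's source list consistent with the files that exist and with the target architecture is the recurring theme in these build fixes. A rough sketch of arch-dependent source and flag selection follows; the `-gencode` syntax is standard nvcc usage, while the file names and the `SM80_ONLY_SOURCES` contents are illustrative only, not the real list.

```python
# Sketch of how setup_ops.py might split sources and nvcc flags by
# architecture. The -gencode syntax is standard nvcc usage; the file
# names and SM80_ONLY_SOURCES contents are illustrative only.

SM80_ONLY_SOURCES = [
    "gpu_ops/append_attention.cu",  # example entry, not the real list
]

def nvcc_gencode_flags(archs):
    """Emit -gencode flags for each requested compute capability."""
    flags = []
    for cc in archs:
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    return flags

def select_sources(all_sources, min_arch):
    """Drop SM80+-only kernels when the build targets SM70."""
    if min_arch >= 80:
        return list(all_sources)
    return [s for s in all_sources if s not in SM80_ONLY_SOURCES]
```

A stale entry in this list produces exactly the linker failures seen in this PR: the symbol is declared and registered, but no object file provides it.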
…gluFP8Quant
- Remove the PerTokenQuantPadding and MaskedPerTokenQuant declarations, which have no implementation
- Restore the fused_mask_swiglu_fp8_quant pybind registration (it was incorrectly changed to masked_per_token_quant)
- Fix the parameter name from recv_expert_count to token_nums_per_expert to match upstream

This fixes the CI build error: undefined symbol _Z19MaskedPerTokenQuantRN6paddle6TensorES1_ib

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
The FusedMaskSwigluFP8Quant function is declared in cpp_extensions.cc, but its implementation file was missing from the build sources list, causing an undefined-symbol error during linking.

This fixes the CI build error: undefined symbol _Z23FusedMaskSwigluFP8QuantRN6paddle6TensorES1_ib

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
The SetStop function is declared in metax_ops/cpp_extensions.cc, but its implementation file (gpu_ops/set_stop.cu) was missing from the MetaX build sources list, causing an undefined-symbol error.

This fixes the MetaX CI build error: undefined symbol _Z7SetStopRN6paddle6TensorEb

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
mattheliu (Collaborator, Author):
/Re-run failed jobs
Codecov Report
❌ Patch coverage is … Additional details and impacted files:

@@          Coverage Diff           @@
##          develop    #6306   +/-  ##
==========================================
  Coverage        ?   67.53%
==========================================
  Files           ?      389
  Lines           ?    52098
  Branches        ?     8116
==========================================
  Hits            ?    35184
  Misses          ?    14344
  Partials        ?     2570

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
The MetaX platform needs the gelu_tanh op for model inference. It was accidentally removed in the SM70 compatibility changes.

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
Resolved conflicts:
- fused_moe_deepgemm_backend.py: keep the upstream FD_USE_PHI_FP8_QUANT logic; use _fp8_quant_blockwise_compat for phi-branch compatibility
- block_wise_fp8.py: same approach, upstream logic with the compat wrapper

Co-Authored-By: Claude (Claude Opus 4.5) <[email protected]>
Motivation
Add support for the NVIDIA V100 GPU (SM70 architecture) to FastDeploy so that development and testing can be done on older GPUs. Because the V100 does not support features such as BF16 and FP8, both the build system and the runtime logic need to be adapted.
Modifications

Build system
- setup_ops.py: support compiling for SM70+; separate SM70-specific code from SM80+-only code
- cpp_extensions.cc: add the ENABLE_APPEND_ATTENTION and ENABLE_BF16 macros to control conditional compilation

CUDA kernels
- gelu_tanh.cu: fix the tanh.approx.f32 PTX instruction, which does not compile on SM70
- moe_wna16_marlin_*.cu/h: fix compilation compatibility of the Marlin GEMM templates on SM70

Python runtime layer
- fastdeploy/platforms/cuda.py: add get_sm_version() and the capability checks supports_bf16(), supports_fp8(), supports_async_copy(), and supports_marlin()
- fastdeploy/config.py: automatic BF16 → FP16 dtype downgrade
- fastdeploy/model_executor/layers/moe/moe.py
- fastdeploy/model_executor/layers/quantization/__init__.py: quantization fallback (block_wise_fp8 → wint8, w4afp8 → wint4)
- fastdeploy/model_executor/layers/quantization/mix_quant.py
- fastdeploy/model_executor/layers/quantization/weight_only.py
- fastdeploy/model_executor/layers/quantization/block_wise_fp8.py
- attention/ops/*.py: wrap SM80+-only ops in try-except guards

Tests
- tests/layers/test_attention_layer.py: add an FP8 SM89+ skip decorator
- tests/layers/test_fusedmoe.py: add an FP8 SM89+ skip decorator
- tests/quantization/test_w4afp8.py: add an FP8 SM89+ skip decorator
- tests/layers/test_ffn.py: automatically select dtype and quantization config based on the SM version

SM70 fallback strategy overview
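The FP8 skip decorator added to the tests above might look like the following sketch. `pytest.mark.skipif` is real pytest API and the skip reason string matches this PR's test output, while `get_sm_version()` is stubbed for illustration.

```python
# Sketch of the FP8 SM89+ skip decorator described above; the skip
# reason string matches this PR's test output. get_sm_version() is a
# stub standing in for FastDeploy's real capability query.

import pytest

def get_sm_version() -> int:
    return 70  # pretend we are running on a V100

requires_sm89_fp8 = pytest.mark.skipif(
    get_sm_version() < 89,
    reason="FP8 ops require SM89+",
)

@requires_sm89_fp8
def test_fp8_path():
    # Would exercise an FP8 kernel; skipped on pre-SM89 GPUs.
    pass
```

With this stub, pytest reports the test as SKIPPED (FP8 ops require SM89+), matching the V100 run described below.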
Usage or Command
Accuracy Tests
Test results on V100 (SM70):
All FP8-related tests are correctly skipped on V100 (reported as SKIPPED (FP8 ops require SM89+)); all non-FP8 functionality passes.
Checklist
- Run pre-commit before commit.
- For a release branch, make sure the PR has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.