support qkdim!=vdim#8023
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #8023 +/- ##
==========================================
Coverage ? 67.67%
==========================================
Files ? 471
Lines ? 66360
Branches ? 10217
==========================================
Hits ? 44912
Misses ? 18576
Partials ? 2872
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
CI报告基于以下代码生成(30分钟更新一次): 1 Required任务 : 8/10 通过
2 失败详情🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage — PR问题(置信度: 高)分析器: 通用分析(fallback)
关键日志:
修复建议:
关联变更: 🔴 Extracted partial CE model tasks to run in CI. / run_ce_cases — 未知(置信度: 低)分析省略/待后续深挖。 当前仅掌握该任务以 exit code 123 失败,尚未读取该 job 的详细日志,无法判断是 PR 问题、环境问题还是不稳定问题。 |
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-06-11 02:07:41 Asia/Shanghai
📋 Review 摘要
PR 概述:为 attention/QKV 路径补充 qkdim != vdim 支持。
变更范围:AppendAttention backend、QKV/QKVG linear loader、Triton RoPE、线性层单测。
影响面 Tag:[OP] [KVCache] [Loader]
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/attention/append_attn_backend.py:324 |
v_head_dim != head_dim 会无条件执行 q/k RMSNorm,未启用 qk norm 的模型会把 None 传进 Triton kernel |
| 🟡 建议 | tests/model_executor/test_linear.py:291 |
新增测试仍使用 v_head_dim == head_dim,没有覆盖本 PR 的核心维度不等分支 |
📝 PR 规范检查
标题缺少官方 Tag,且 PR 描述的 Motivation / Modifications / Usage or Command / Accuracy Tests 仍未填写。
标题建议(可直接复制):
[OP] Support qkdim != vdim in attention and QKV loading
PR 描述建议(点击展开,可直接复制)
## Motivation
Support models whose Q/K head dimension differs from V head dimension (`qkdim != vdim`) in FastDeploy attention and QKV projection paths.
## Modifications
- Add `v_head_dim` to attention backend constructors and use it for value cache shape.
- Pass `model_config.v_head_dim` from GPU model runner to the selected attention backend.
- Update QKV/QKVG parallel linear output sizing and V shard placement to use `v_head_dim`.
## Usage or Command
N/A
## Accuracy Tests
N/A. Current PR does not include accuracy or regression test results.
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.|
|
||
| if getattr(layer, "only_do_attn", False): | ||
| if self.external_norm_rope: | ||
| qk_rmsnorm_fused( |
There was a problem hiding this comment.
这里把 external_norm_rope 只和 v_head_dim != head_dim 绑定后,会对所有 qk/v 维度不同的层执行 qk_rmsnorm_fused。但 Attention 默认 use_qk_norm=False,只有开启时才会创建 q_norm_weight / k_norm_weight;现有多处 Attention(...) 构造没有传 use_qk_norm。这些模型一旦配置 v_head_dim != head_dim,这里会把 None 作为 Triton 指针传入,qk_rmsnorm_fused_kernel 随后 tl.load(q_weight_ptr + ...) 会直接失败。建议把“需要外部 rope/write_cache”和“需要 q/k norm”拆开:只有权重存在或 layer.use_qk_norm 为真时才跑 fused norm;没有 q/k norm 的模型仍应只做 RoPE/write_cache。
| kv_num_heads_per_rank=1, | ||
| num_kv_head_replicas=2, | ||
| head_dim=2, | ||
| v_head_dim=2, |
There was a problem hiding this comment.
这个新增用例仍然设置 v_head_dim=2 且 head_dim=2,因此不会覆盖本 PR 最关键的 qkdim != vdim 分支:V shard size、param offset、shared KV slice 等仍按旧路径通过。建议至少加入 v_head_dim != head_dim(例如 head_dim=2, v_head_dim=3)的 fused/split load 断言,验证 V 段和后续 offset/gate 不重叠。
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.