Fixes & Simplifications for MCore KVCache QAT/QAD; Unittests; Distributed Sync of KVCache Quantizer params #727
Conversation
Commits:
- updated/cleaned up tests
- KV Cache clean ups; added MCore hybrid tests
- Added amax sync for KVCache Quantization
- minor

Signed-off-by: realAsma <[email protected]>
Force-pushed 249227d to 89f507a
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #727      +/-  ##
==========================================
- Coverage   74.69%   74.69%   -0.01%
==========================================
  Files         192      192
  Lines       18946    18953       +7
==========================================
+ Hits        14152    14156       +4
- Misses       4794     4797       +3
Force-pushed 7c6ea5a to 724451e
jenchen13
left a comment
LGTM, thanks so much!
kinjalpatel27
left a comment
LGTM! Thank you
What does this PR do?
Type of change: Fix MCore KV Cache Quantization: Amax Device Placement Bug; Code clean up; Distributed Sync of KVCache Quantizer params; unittest expansion to hybrid models
Overview: Fixes bugs preventing MCore KV Cache quantization from working during checkpoint restore.
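The distributed amax sync mentioned in the title follows the standard pattern for max-calibrated quantizers: each rank holds a locally calibrated amax, and ranks reconcile by taking the elementwise maximum (in PyTorch this would be a `torch.distributed.all_reduce` with `ReduceOp.MAX` over the KV-cache quantizers' amax buffers). A minimal, framework-free simulation of that reduction — an illustrative sketch, not the actual ModelOpt implementation:

```python
# Simulate amax synchronization across data-parallel ranks: every rank
# ends up with the global elementwise maximum, which is exactly what an
# all_reduce(op=MAX) over the amax buffers achieves. Hypothetical sketch.

def sync_amax(per_rank_amax):
    """Return the synchronized amax list that every rank would hold."""
    n = len(per_rank_amax[0])
    return [max(rank[i] for rank in per_rank_amax) for i in range(n)]

# Three ranks calibrated slightly different amax values for two quantizers.
ranks = [[0.9, 1.1], [1.0, 1.0], [0.8, 1.2]]
assert sync_amax(ranks) == [1.0, 1.2]
```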
Bug Chain
Bug 1:

```python
is_enabled = self.weight_quantizer.is_enabled if hasattr(self, "weight_quantizer") else False
```

No `weight_quantizer` for KV-cache-only quant → `is_enabled=False` → metadata not saved → `modelopt_post_restore()` never called. (Thanks to @jenchen13)
Bug 2: After fixing Bug 1, `_amax` is restored on CPU (via `_reset_pytorch_state_from_metadata`). The fallback `_calibrate_quantizers()` is never called because `_amax` exists.
Bug 3: Even if called, `_calibrate_quantizers()` fails — `core_attention` has no parameters → can't determine device/dtype.
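The gating failure in Bug 1 can be illustrated with a minimal, framework-free sketch (class and attribute names are simplified stand-ins, not the actual ModelOpt code):

```python
# Sketch of Bug 1: metadata saving is gated on a weight quantizer that
# KV-cache-only quantized modules do not have. Illustrative names only.

class KVCacheOnlyModule:
    """Module with KV-cache quantizers but no weight_quantizer attribute."""

    def __init__(self):
        self.k_bmm_quantizer = object()  # stand-in for a TensorQuantizer
        self.saved_metadata = None

    def get_extra_state(self):
        # Buggy check: no weight_quantizer -> is_enabled == False,
        # so quantizer metadata is silently dropped from the checkpoint.
        is_enabled = (
            self.weight_quantizer.is_enabled
            if hasattr(self, "weight_quantizer")
            else False
        )
        if is_enabled:
            self.saved_metadata = {"k_bmm_quantizer": "quantizer state"}
        return self.saved_metadata

module = KVCacheOnlyModule()
assert module.get_extra_state() is None  # metadata lost -> restore never runs
```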
The Fix
- Removed the `is_enabled` check entirely — disabled modules may still need metadata restore. Explicitly skip `output_layer` from extra state callbacks (never quantized).
- To get `dtype`/`device` on `core_attention`, infer them from the parent Attention module; `modelopt_post_restore()` calls `self.to(device, dtype)`.
- Removed dead `_calibrate_quantizers()` code (will bring back similar logic for KV cache affine quantization).

Previous Unit Test Was Wrong
`model_test` was `mtq.quantize()`'d, not `mto.restore()`'d — it never tested the actual restore path.
Additional Fixes
`flash_decode` auto-disabled
Code Cleanup
Removed ~100 lines of dead code.
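The corrected restore path described above can be sketched under the same simplified assumptions — metadata is saved unconditionally, and after restore the module adopts the parent Attention module's device/dtype instead of re-calibrating. All names here are illustrative stand-ins, not the real ModelOpt API:

```python
# Sketch of the fixed flow (illustrative, framework-free):
# 1) extra-state metadata is returned unconditionally, so restore runs;
# 2) since core_attention itself has no parameters, post-restore takes the
#    device from the parent Attention module and moves the restored _amax
#    there, mirroring the real self.to(device, dtype) call.

class FakeQuantizer:
    def __init__(self):
        self._amax = ("cpu", 0.5)  # (device, value): restored on CPU first

class CoreAttention:
    def __init__(self, parent_device="cuda:0"):
        self.k_bmm_quantizer = FakeQuantizer()
        self.parent_device = parent_device  # inferred from parent Attention

    def get_extra_state(self):
        # Fix for Bug 1: no is_enabled gate -- disabled modules may still
        # need their metadata restored later.
        return {"k_bmm_quantizer": {"_amax": 0.5}}

    def to(self, device):
        dev, val = self.k_bmm_quantizer._amax
        self.k_bmm_quantizer._amax = (device, val)
        return self

    def modelopt_post_restore(self):
        # Fix for Bugs 2/3: move restored state to the parent's device
        # instead of falling back to _calibrate_quantizers().
        self.to(self.parent_device)

attn = CoreAttention()
attn.modelopt_post_restore()
assert attn.k_bmm_quantizer._amax[0] == "cuda:0"
```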
Testing