Conversation

Contributor

@realAsma realAsma commented Dec 24, 2025

What does this PR do?

Type of change: Bug fix (MCore KV cache quantization amax device placement); code cleanup; distributed sync of KV cache quantizer params; unit test expansion to hybrid models

Overview: Fixes bugs preventing MCore KV Cache quantization from working during checkpoint restore.

Bug Chain

Bug 1: `is_enabled = self.weight_quantizer.is_enabled if hasattr(self, "weight_quantizer") else False`

With KV-cache-only quantization there is no weight_quantizer, so is_enabled=False, the metadata is not saved, and modelopt_post_restore() is never called. (Thanks to @jenchen13!)
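A minimal sketch of Bug 1's failure mode. All names here (FakeQuantizer, QuantModule, the should_save_metadata_* helpers) are illustrative stand-ins, not the actual ModelOpt internals:

```python
# Illustrative stand-ins; not the real ModelOpt classes.
class FakeQuantizer:
    def __init__(self, enabled):
        self.is_enabled = enabled

class QuantModule:
    """Module that may carry a weight_quantizer and/or a KV cache quantizer."""
    def __init__(self, weight_quantizer=None, kv_quantizer=None):
        if weight_quantizer is not None:
            self.weight_quantizer = weight_quantizer
        self.kv_quantizer = kv_quantizer

def should_save_metadata_buggy(module):
    # Old check: KV-cache-only modules have no weight_quantizer,
    # so this returns False and their metadata is silently dropped.
    return module.weight_quantizer.is_enabled if hasattr(module, "weight_quantizer") else False

def should_save_metadata_fixed(module):
    # Behavior after the fix: always save metadata, since even
    # disabled modules may need it restored later.
    return True

kv_only = QuantModule(kv_quantizer=FakeQuantizer(enabled=True))
assert should_save_metadata_buggy(kv_only) is False  # metadata lost -> restore bug
assert should_save_metadata_fixed(kv_only) is True
```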

Bug 2: After fixing Bug 1, _amax is restored on the CPU (via _reset_pytorch_state_from_metadata). The fallback _calibrate_quantizers() is never called because _amax already exists.

Bug 3: Even if it were called, _calibrate_quantizers() would fail — core_attention has no parameters, so the device/dtype cannot be determined from it.
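Bug 3 can be illustrated with a plain-Python sketch (all names are illustrative; the real code operates on torch modules): a parameter-less module like core_attention cannot supply its own device/dtype, so the fix inherits them from the parent Attention module, which does own parameters.

```python
# Illustrative stand-in for a torch module; not the real MCore classes.
class Module:
    def __init__(self, params=None, device=None, dtype=None):
        self.params = params or []  # list of (device, dtype) pairs
        self.device = device
        self.dtype = dtype

def infer_device_dtype(module):
    """Old approach: read device/dtype from the module's own parameters.
    Fails for parameter-less modules like core_attention (Bug 3)."""
    if not module.params:
        raise RuntimeError("cannot determine device/dtype: module has no parameters")
    return module.params[0]

def fixed_infer(core_attention, parent_attention):
    """Approach per the fix: inherit device/dtype from the parent
    Attention module, which does own parameters (e.g. the projections)."""
    core_attention.device, core_attention.dtype = parent_attention.params[0]
    return core_attention.device, core_attention.dtype

parent = Module(params=[("cuda:0", "bf16")])
core = Module()  # parameter-less, like core_attention
try:
    infer_device_dtype(core)
except RuntimeError:
    pass  # old path fails here
assert fixed_infer(core, parent) == ("cuda:0", "bf16")
```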

The Fix

  1. Remove the is_enabled check entirely — disabled modules may still need their metadata restored. Explicitly skip output_layer in the extra-state callbacks (it is never quantized)
  2. Set dtype/device on core_attention from the parent Attention module; modelopt_post_restore() calls self.to(device, dtype)
  3. Remove dead _calibrate_quantizers() code (similar logic will return for KV cache affine quantization)

Previous Unit Test Was Wrong

model_test was mtq.quantize()'d rather than mto.restore()'d, so the actual restore path was never exercised.
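A toy sketch of the corrected test workflow: the real unit test goes through mtq.quantize and mto.restore, while here plain dicts stand in for the model and its ModelOpt state (all helper names are hypothetical).

```python
# Plain-dict stand-ins for a model and its ModelOpt state; illustrative only.
def quantize(model):
    model = dict(model)
    model["quantizer_amax"] = 0.5  # pretend calibration result
    return model

def save_modelopt_state(model):
    return {k: v for k, v in model.items() if k.startswith("quantizer_")}

def restore(fresh_model, state):
    restored = dict(fresh_model)
    restored.update(state)  # the path the old test never exercised
    return restored

base = {"weights": [1.0, 2.0]}
quantized = quantize(base)
state = save_modelopt_state(quantized)

# Old (wrong) test: quantized a fresh model directly and compared.
# New test: build a fresh model and go through the restore path.
model_test = restore({"weights": [1.0, 2.0]}, state)
assert model_test["quantizer_amax"] == 0.5
```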

Additional Fixes

  • Amax is now synced across DP/TP ranks for KV cache quantizers
  • flash_decode is auto-disabled
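The amax sync can be sketched as a MAX reduction across ranks. In the real code this would be something along the lines of torch.distributed.all_reduce with ReduceOp.MAX; the simulation below replaces ranks with a plain list, and all names are illustrative.

```python
# Simulated DP/TP ranks as a list of per-rank amax values; illustrative only.
def sync_amax_across_ranks(per_rank_amax):
    """MAX-reduce so every rank ends up with the global maximum amax.
    Using max (not mean) keeps the quantization range wide enough to
    cover the activations seen on every rank."""
    global_amax = max(per_rank_amax)
    return [global_amax] * len(per_rank_amax)

amaxes = [0.7, 1.3, 0.9, 1.1]  # e.g. four data-parallel ranks
assert sync_amax_across_ranks(amaxes) == [1.3, 1.3, 1.3, 1.3]
```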

Code Cleanup

Removed ~100 lines of dead code.

Testing

  1. MCore KV Cache QAD with Nano V3 + Context Parallel works
  2. Unit tests: hybrid models, KV+GEMM configs, correct restore workflow, backward pass validation

Before your PR is "Ready for review"

  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: Yes

@realAsma realAsma requested a review from a team as a code owner December 24, 2025 18:33
@realAsma realAsma requested a review from jingyu-ml December 24, 2025 18:33
@copy-pr-bot

copy-pr-bot bot commented Dec 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@realAsma realAsma changed the title Fixes & Simplifications for MCore KVCache QAT/QAD [Draft] Fixes & Simplifications for MCore KVCache QAT/QAD Dec 24, 2025
updated/cleaned up tests

KV Cache clean ups; added MCore hybrid tests

Added amax sync for KVCache Quantization

minor

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/MCore_KVCache_fix branch from 249227d to 89f507a Compare January 6, 2026 16:56
@realAsma realAsma changed the title [Draft] Fixes & Simplifications for MCore KVCache QAT/QAD Fixes & Simplifications for MCore KVCache QAT/QAD; Unittests; Distributed Sync of KVCache Quantizer params Jan 6, 2026
@codecov

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.69%. Comparing base (3350b0a) to head (abc6e3a).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
- Coverage   74.69%   74.69%   -0.01%     
==========================================
  Files         192      192              
  Lines       18946    18953       +7     
==========================================
+ Hits        14152    14156       +4     
- Misses       4794     4797       +3     

☔ View full report in Codecov by Sentry.

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/MCore_KVCache_fix branch from 7c6ea5a to 724451e Compare January 6, 2026 21:12
Contributor

@jenchen13 jenchen13 left a comment


LGTM, thanks so much!

Contributor

@kinjalpatel27 kinjalpatel27 left a comment


LGTM! Thank you

@realAsma realAsma merged commit 81c509c into main Jan 7, 2026
35 checks passed
@realAsma realAsma deleted the asma/MCore_KVCache_fix branch January 7, 2026 00:08
