Fixes & Simplifications for MCore KVCache QAT/QAD; Unittests; Distributed Sync of KVCache Quantizer params #727
Conversation
Commits:
- updated/cleaned up tests
- KV Cache clean ups; added MCore hybrid tests
- Added amax sync for KVCache Quantization
- minor

Signed-off-by: realAsma <[email protected]>
Force-pushed 249227d to 89f507a
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #727      +/-  ##
==========================================
- Coverage   74.69%   74.69%   -0.01%
==========================================
  Files         192      192
  Lines       18946    18953       +7
==========================================
+ Hits        14152    14156       +4
- Misses       4794     4797       +3
Force-pushed 7c6ea5a to 724451e
jenchen13
left a comment
LGTM, thanks so much!
kinjalpatel27
left a comment
LGTM! Thank you
What does this PR do?
Type of change: Fix MCore KV Cache Quantization: Amax Device Placement Bug; Code clean up; Distributed Sync of KVCache Quantizer params; unittest expansion to hybrid models
Overview: Fixes bugs preventing MCore KV Cache quantization from working during checkpoint restore.
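The distributed amax sync mentioned in the title follows the standard pattern for max-calibrated quantizers: each rank holds a locally calibrated amax, and ranks reconcile by taking the elementwise maximum (in PyTorch this would be a `torch.distributed.all_reduce` with `ReduceOp.MAX` over the KV-cache quantizers' amax buffers). A minimal, framework-free simulation of that reduction — an illustrative sketch, not the actual ModelOpt implementation:

```python
# Simulate amax synchronization across data-parallel ranks: every rank
# ends up with the global elementwise maximum, which is exactly what an
# all_reduce(op=MAX) over the amax buffers achieves. Hypothetical sketch.

def sync_amax(per_rank_amax):
    """Return the synchronized amax list that every rank would hold."""
    n = len(per_rank_amax[0])
    return [max(rank[i] for rank in per_rank_amax) for i in range(n)]

# Three ranks calibrated slightly different amax values for two quantizers.
ranks = [[0.9, 1.1], [1.0, 1.0], [0.8, 1.2]]
assert sync_amax(ranks) == [1.0, 1.2]
```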
Bug Chain
Bug 1:

```python
is_enabled = self.weight_quantizer.is_enabled if hasattr(self, "weight_quantizer") else False
```

No `weight_quantizer` for KV-cache-only quant → `is_enabled=False` → metadata not saved → `modelopt_post_restore()` never called. (Thanks to @jenchen13)
Bug 2: After fixing Bug 1, `_amax` is restored on CPU (via `_reset_pytorch_state_from_metadata`). The fallback `_calibrate_quantizers()` is never called because `_amax` exists.
Bug 3: Even if called, `_calibrate_quantizers()` fails — `core_attention` has no parameters → can't determine device/dtype.
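The gating failure in Bug 1 can be illustrated with a minimal, framework-free sketch (class and attribute names are simplified stand-ins, not the actual ModelOpt code):

```python
# Sketch of Bug 1: metadata saving is gated on a weight quantizer that
# KV-cache-only quantized modules do not have. Illustrative names only.

class KVCacheOnlyModule:
    """Module with KV-cache quantizers but no weight_quantizer attribute."""

    def __init__(self):
        self.k_bmm_quantizer = object()  # stand-in for a TensorQuantizer
        self.saved_metadata = None

    def get_extra_state(self):
        # Buggy check: no weight_quantizer -> is_enabled == False,
        # so quantizer metadata is silently dropped from the checkpoint.
        is_enabled = (
            self.weight_quantizer.is_enabled
            if hasattr(self, "weight_quantizer")
            else False
        )
        if is_enabled:
            self.saved_metadata = {"k_bmm_quantizer": "quantizer state"}
        return self.saved_metadata

module = KVCacheOnlyModule()
assert module.get_extra_state() is None  # metadata lost -> restore never runs
```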
The Fix
- Removed the `is_enabled` check entirely — disabled modules may still need metadata restore. Explicitly skip `output_layer` from extra state callbacks (never quantized).
- To get `dtype`/`device` on `core_attention`, infer them from the parent Attention module; `modelopt_post_restore()` calls `self.to(device, dtype)`.
- Removed dead `_calibrate_quantizers()` code (will bring back similar logic for KV cache affine quantization).

Previous Unit Test Was Wrong
`model_test` was `mtq.quantize()`'d, not `mto.restore()`'d — it never tested the actual restore path.
Additional Fixes
`flash_decode` auto-disabled
Code Cleanup
Removed ~100 lines of dead code.
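The corrected restore path described above can be sketched under the same simplified assumptions — metadata is saved unconditionally, and after restore the module adopts the parent Attention module's device/dtype instead of re-calibrating. All names here are illustrative stand-ins, not the real ModelOpt API:

```python
# Sketch of the fixed flow (illustrative, framework-free):
# 1) extra-state metadata is returned unconditionally, so restore runs;
# 2) since core_attention itself has no parameters, post-restore takes the
#    device from the parent Attention module and moves the restored _amax
#    there, mirroring the real self.to(device, dtype) call.

class FakeQuantizer:
    def __init__(self):
        self._amax = ("cpu", 0.5)  # (device, value): restored on CPU first

class CoreAttention:
    def __init__(self, parent_device="cuda:0"):
        self.k_bmm_quantizer = FakeQuantizer()
        self.parent_device = parent_device  # inferred from parent Attention

    def get_extra_state(self):
        # Fix for Bug 1: no is_enabled gate -- disabled modules may still
        # need their metadata restored later.
        return {"k_bmm_quantizer": {"_amax": 0.5}}

    def to(self, device):
        dev, val = self.k_bmm_quantizer._amax
        self.k_bmm_quantizer._amax = (device, val)
        return self

    def modelopt_post_restore(self):
        # Fix for Bugs 2/3: move restored state to the parent's device
        # instead of falling back to _calibrate_quantizers().
        self.to(self.parent_device)

attn = CoreAttention()
attn.modelopt_post_restore()
assert attn.k_bmm_quantizer._amax[0] == "cuda:0"
```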
Testing