Commit 21a4010
Add Quantizers for Qwen3VLMoeTextDecoderLayer (#666)
## What does this PR do?
**Type of change:** ? new feature
**Overview:** ? huggingface transformers library implements Qwen3VL Moe
layer as a monolithic module, instead of assembling it using Linear
layers, which cannot be recognized by modelopt's quantizer now. This PR
introduces a conversion from hf's qwen3vl_moe MoE layers to qewn3_moe
MoE layers which consist of a set of Linear layers.
## Testing
Tested with
```python
python hf_ptq.py --pyt_ckpt_path=Qwen/Qwen3-VL-30B-A3B-Instruct --qformat=nvfp4 --dataset wikipedia
```
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added quantization support for Qwen3VL models with sparse
mixture-of-experts (MoE) architecture, enabling efficient model
compression for this model type.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Qidong Su <qidongs@nvidia.com>
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Shengliang Xu <106840466+shengliangxu@users.noreply.github.com>
Co-authored-by: Zhiyu <bestczy317@gmail.com>1 parent b0e7d9f commit 21a4010
1 file changed
Lines changed: 98 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
571 | 571 | | |
572 | 572 | | |
573 | 573 | | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
574 | 654 | | |
575 | 655 | | |
576 | 656 | | |
| |||
660 | 740 | | |
661 | 741 | | |
662 | 742 | | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
663 | 761 | | |
664 | 762 | | |
665 | 763 | | |
| |||
0 commit comments