Add demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/pruning/ with README and notebooks #1320
achidiac-nv wants to merge 15 commits into main from
Conversation
…xamples/ with README and notebooks
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the following commands to manage reviews, or the checkboxes below for quick actions.
📝 Walkthrough
Adds a new pruning/distillation tutorial set: a comprehensive README, an advanced experiments doc, and multiple Jupyter notebooks that provide end-to-end, runnable pipelines and examples for Minitron (homogeneous pruning) and Puzzletron (heterogeneous NAS + MIP pruning) on Qwen3-8B, including dataset preprocessing, pruning, distillation, evaluation, and benchmarking steps.
Changes: Pruning & Distillation Tutorial
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (4)
examples/pruning_demo/README.md (3)
683-685: Add trailing newline at end of file. Per markdown conventions (MD047), files should end with a single newline character.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 683 - 685, The README.md file is missing a trailing newline; add a single newline character at the end of the file (after the final line containing the note about Minitron models and baseline) so the file ends with exactly one newline to satisfy MD047.
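The MD047 fix is a one-line shell append in practice, but a small helper makes the "exactly one trailing newline" intent explicit. A minimal sketch; the function name is ours, not from the PR:

```python
from pathlib import Path

def ensure_trailing_newline(path: str) -> None:
    # Strip any run of trailing newlines, then write back exactly one,
    # which is what markdownlint rule MD047 expects.
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(text.rstrip("\n") + "\n", encoding="utf-8")
```

Running it on a file that already ends correctly is a no-op, so it is safe to apply across a docs tree.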
320-357: Add language identifier to fenced code block. The architecture details code block lacks a language specification, which affects syntax highlighting and accessibility.
📝 Suggested fix
-```
+```text
 block_0: attention kv_heads_8 ffn intermediate_12288
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 320 - 357, The fenced code block that starts with the line "block_0: attention kv_heads_8 ffn intermediate_12288" should include a language identifier to enable proper highlighting; update the opening triple-backtick for that block (the block showing block_0...block_35) to use a language token such as ```text (or ```plain) so the README's architecture details code block is marked correctly.
5-21: Fix markdown formatting issues in Table of Contents. Lines 5 and 21 have spaces inside link text, which violates markdown best practices:
📝 Suggested fix
-1.[ Introduction](`#1-introduction`)
+1. [Introduction](`#1-introduction`)
-10.[ References](`#10-references`)
+10. [References](`#10-references`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 5 - 21, Fix the markdown link text spacing in the Table of Contents by removing the extra spaces inside the square brackets for the affected entries: change "1.[ Introduction](`#1-introduction`)" to "1. [Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10. [References](`#10-references`)" so the link text has no leading/trailing spaces and spacing after the list number is consistent; verify similar entries follow the same "N. [Text](`#anchor`)" pattern.
examples/pruning_demo/00_prerequisites.ipynb (1)
37-41: Hardcoded Python version in path may break on different container versions. The path /opt/venv/lib/python3.12/site-packages/modelopt assumes Python 3.12. If the container or environment uses a different Python version, this will silently fail to replace the modelopt installation.
🔧 Suggested improvement using dynamic Python version
-!rm -rf /opt/venv/lib/python3.12/site-packages/modelopt
-!cp -r /workspace/Model-Optimizer/modelopt /opt/venv/lib/python3.12/site-packages/modelopt
+import sys
+site_packages = f"/opt/venv/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages"
+!rm -rf {site_packages}/modelopt
+!cp -r /workspace/Model-Optimizer/modelopt {site_packages}/modelopt
 !mkdir -p /workspace/datasets
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/00_prerequisites.ipynb` around lines 37 - 41, The notebook currently hardcodes /opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python versions; replace the three shell commands with dynamic site-packages detection (e.g. set PY_SITE=$(python -c 'import sysconfig; print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf "$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/ and keep the mkdir -p /workspace/datasets line—update the cell that contains the rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12 hardcoded path.
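The sysconfig approach from the prompt generalizes to a small helper that resolves the target location for any interpreter version. A sketch under our own naming (modelopt_target_dir is not from the notebooks):

```python
import sysconfig
from pathlib import Path

def modelopt_target_dir() -> Path:
    # "purelib" is the site-packages (or dist-packages) directory of the
    # *current* interpreter, so nothing hardcodes python3.12.
    return Path(sysconfig.get_paths()["purelib"]) / "modelopt"
```

The notebook cell would then rm/cp against `modelopt_target_dir()` instead of a literal path.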
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning_demo/scenario1_minitron.ipynb`:
- Around line 193-195: The cell calls subprocess.run("pkill -f tensorboard") but
does not import subprocess in that cell; add an import for subprocess (e.g.,
import subprocess) before the subprocess.run call or use from subprocess import
run and call run(...) so subprocess.run is defined (refer to the subprocess.run
invocation to locate the call and add the import in the same notebook cell).
In `@examples/pruning_demo/scenario1_puzzletron.ipynb`:
- Around line 231-233: The cell calls subprocess.run(["pkill", "-f",
"tensorboard"]) but does not import subprocess locally, which can raise
NameError if cells are run out of order; add an explicit import subprocess at
the top of the same notebook cell (or the cell immediately above) where
subprocess.run is used so the call in that cell always has the subprocess symbol
available.
- Around line 135-148: Remove the monkey-patch that prepends __version__ =
"0.4.8" into the installed lm_eval/__init__.py (the sed command that writes that
line); instead rely on the existing version-check/warning logic in
examples/llm_eval/lm_eval_hf.py (lines handling version mismatch) or document a
specific lm_eval prerequisite, and if the dtype issue remains, fix the exported
config file in the workspace (the sed that replaces "torch.bfloat16" ->
"bfloat16" in the solution config) or correct the Puzzletron export upstream
rather than editing site-packages.
In `@examples/pruning_demo/scenario2_minitron.ipynb`:
- Around line 167-169: The cell uses subprocess.run(["pkill", "-f",
"tensorboard"]) but doesn't import subprocess locally; add an explicit import
subprocess at the top of this cell (or merge the TensorBoard start/stop logic
into one cell) so subprocess.run is always defined even if cells are executed
out of order; ensure the import appears before the subprocess.run call to avoid
NameError.
In `@examples/pruning_demo/scenario2_puzzletron.ipynb`:
- Around line 364-366: The cell calls subprocess.run([ "pkill", "-f",
"tensorboard" ]) but never imports the subprocess module; add an import
subprocess statement (e.g., at the top of this cell or the notebook) so
subprocess.run is defined, and apply the same fix to all scenario notebooks that
use subprocess.run to keep them consistent.
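The four subprocess findings above share one root cause: a cell uses subprocess.run without importing subprocess in that cell. A hedged sketch of the cell-local pattern (the helper name run_in_cell is ours, not from the notebooks):

```python
# Each notebook cell that shells out should carry its own import, so cells
# still work when executed out of order.
import subprocess

def run_in_cell(cmd: list[str]) -> int:
    # check=False tolerates a non-zero exit, e.g. pkill returning 1 when
    # no process matched the pattern.
    return subprocess.run(cmd, check=False).returncode

# Mirroring the notebooks' TensorBoard cleanup:
# run_in_cell(["pkill", "-f", "tensorboard"])
```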
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0c466b7c-b784-496d-802c-dfe00cad04ea
⛔ Files ignored due to path filters (6)
examples/pruning_demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
examples/pruning_demo/distillation_curves.png is excluded by !**/*.png
examples/pruning_demo/distillation_loss_7B.png is excluded by !**/*.png
examples/pruning_demo/memory_sweep.png is excluded by !**/*.png
examples/pruning_demo/memory_sweep_combined.png is excluded by !**/*.png
examples/pruning_demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
examples/pruning_demo/00_prerequisites.ipynb
examples/pruning_demo/README.md
examples/pruning_demo/advanced_compression_experiments.md
examples/pruning_demo/scenario1_minitron.ipynb
examples/pruning_demo/scenario1_puzzletron.ipynb
examples/pruning_demo/scenario2_minitron.ipynb
examples/pruning_demo/scenario2_puzzletron.ipynb
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1320 +/- ##
==========================================
+ Coverage 76.73% 77.37% +0.64%
==========================================
Files 476 476
Lines 51306 51306
==========================================
+ Hits 39369 39698 +329
+ Misses 11937 11608 -329
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
…lint issues
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/pruning/demo/scenario1_puzzletron.ipynb (1)
136-140: ⚠️ Potential issue | 🟠 Major: Remove site-packages monkey patch for lm_eval (Line 139). Editing /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py is fragile and environment-specific; examples/llm_eval/lm_eval_hf.py already warns when the version is not 0.4.8 instead of requiring file mutation.
🔧 Suggested fix
 !sed -i 's/"torch\.bfloat16"/"bfloat16"/g' \
   /workspace/puzzle_dir/mip/puzzle_solutions/target_memory_130000MiB-num_params_7G/solutions--checkpoints/solution_0/config.json
-!sed -i '1s/^/__version__ = "0.4.8"\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py
 !cd /workspace/Model-Optimizer && \
   python examples/llm_eval/lm_eval_hf.py \

#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
p = "examples/pruning/demo/scenario1_puzzletron.ipynb"
nb = json.load(open(p))
for i, c in enumerate(nb["cells"]):
    if c.get("cell_type") == "code":
        s = "".join(c.get("source", []))
        if "lm_eval/__init__.py" in s and "__version__" in s:
            print(f"{p} -> cell {i} contains lm_eval monkey patch:")
            print(s)
PY
echo
echo "Version-check behavior in lm_eval_hf.py:"
rg -n 'if not lm_eval\.__version__\.startswith\("0\.4\.8"\)|warnings\.warn' examples/llm_eval/lm_eval_hf.py -C 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/scenario1_puzzletron.ipynb` around lines 136 - 140, Remove the fragile site-packages monkey-patch that inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i '1s/^/__version__ = \"0.4.8\"\\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that command (and any related sed edits for lm_eval), relying on the existing version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
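Once the monkey patch is gone, the version guard is ordinary code. A minimal sketch (the function name is ours; the actual check in lm_eval_hf.py may be phrased differently):

```python
def lm_eval_version_ok(version: str, expected_prefix: str = "0.4.8") -> bool:
    # True when the installed lm_eval version matches what the evaluation
    # script was written against; on False, callers should warn (as
    # lm_eval_hf.py does) rather than edit site-packages.
    return str(version).startswith(expected_prefix)
```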
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
170-173: Avoid token-in-CLI examples for authentication (Line 171). Using --token <your_token> in docs encourages secrets ending up in shell history. Prefer interactive login or env-var based usage.
🔧 Suggested doc update
-hf auth login --token <your_token>
+hf auth login
+# or:
+# export HF_TOKEN=...
+# hf auth login --token "$HF_TOKEN"
As per coding guidelines, "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 170 - 173, The README currently shows a CLI example using the explicit flag "hf auth login --token <your_token>", which risks exposing secrets; update the example to use an interactive login or environment-variable approach instead (e.g., instruct users to run "hf auth login" interactively or to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace the inline token usage in the example and add a short note advising storing tokens in environment variables or a .env/config file excluded from VCS per the project's secret-handling guidelines.
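On the Python side, the same advice means reading the token from the environment rather than an inline CLI flag. A sketch; the helper name is ours, and HF_TOKEN / HUGGING_FACE_HUB_TOKEN are the variable names commonly used with huggingface_hub:

```python
import os
from typing import Optional

def hf_token_from_env() -> Optional[str]:
    # Fetch the Hugging Face token from the environment instead of passing
    # it inline on the command line, where it would land in shell history.
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
```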
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/README.md`:
- Line 5: The README has remaining markdownlint issues: fix the malformed
list/link "1.[ Introduction](`#1-introduction`)" by normalizing it to a proper
list or heading syntax, add or remove blank lines around headings and lists to
satisfy MD022 (ensure headings are surrounded by blank lines), collapse or
remove extra blank lines to address MD039, and ensure the file ends with a
single newline to resolve MD047; run markdownlint or your project's pre-commit
linter after updating README.md to confirm all warnings are cleared.
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 269-273: Remove the sed command that mutates
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py (the "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/' ..." cell) and any other notebook cell that
patches lm_eval; instead pin the package to lm-eval==0.4.8 in your environment
(requirements, pip install, or container image) so the version check in
examples/llm_eval/lm_eval_hf.py (the warning at line 47) can operate as
intended. Ensure no in-place edits to lm_eval/__init__.py remain in the
notebook.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 67767531-5494-4153-9d9d-d17c53732c87
⛔ Files ignored due to path filters (6)
examples/pruning/demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
examples/pruning/demo/distillation_curves.png is excluded by !**/*.png
examples/pruning/demo/distillation_loss_7B.png is excluded by !**/*.png
examples/pruning/demo/memory_sweep.png is excluded by !**/*.png
examples/pruning/demo/memory_sweep_combined.png is excluded by !**/*.png
examples/pruning/demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
examples/pruning/demo/00_prerequisites.ipynb
examples/pruning/demo/README.md
examples/pruning/demo/advanced_compression_experiments.md
examples/pruning/demo/scenario1_minitron.ipynb
examples/pruning/demo/scenario1_puzzletron.ipynb
examples/pruning/demo/scenario2_minitron.ipynb
examples/pruning/demo/scenario2_puzzletron.ipynb
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/pruning/demo/README.md (1)
5-21: ⚠️ Potential issue | 🟡 Minor: Fix TOC list formatting for consistent Markdown rendering. Line 5 and Line 21 are missing a space after the numeric list marker (1. / 10.), which breaks standard ordered-list formatting.
✏️ Suggested fix
-1.[Introduction](`#1-introduction`)
+1. [Introduction](`#1-introduction`)
...
-10.[References](`#10-references`)
+10. [References](`#10-references`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 5 - 21, The table-of-contents ordered list items "1.[Introduction]" and "10.[References]" are missing a space after the numeric marker which breaks Markdown rendering; edit the TOC entries (look for the strings "1.[Introduction]" and "10.[References]") to insert a space after the period (e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered list entries in that block to ensure all ordered markers follow the same "N. Item" spacing for consistent Markdown formatting.
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
171-174: Avoid inline token patterns in auth instructions. Using hf auth login --token <your_token> encourages token-in-command usage (shell history/log risk). Prefer interactive login or env-var-based token usage in docs.
🔐 Suggested tweak
-hf auth login --token <your_token>
+hf auth login
+# or (non-interactive):
+# hf auth login --token "$HF_TOKEN"
As per coding guidelines: "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 171 - 174, Replace the inline-token pattern shown in the README snippet ("hf auth login --token <your_token>") with a secure alternative: remove examples that put tokens directly in commands and instead instruct users to either use interactive login (e.g., run the CLI without a token to be prompted) or set their token via an environment variable (e.g., export HF_TOKEN=...) or a credentials file, then run the download command ("hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B") normally; update the README guidance accordingly so tokens are never shown inline or hardcoded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 436-441: The cell currently pipes the sweep run through `tee` into
`grep "Puzzletron Progress"`, coupling cell exit status to the grep match;
instead run the sweep command and capture full output to
/workspace/puzzletron_sweep.log via `tee` without piping into `grep`, then
perform any `grep "Puzzletron Progress"` as a separate non-blocking step (or as
a follow-up cell) so the exit status reflects the actual run of
examples/puzzletron/main.py (the --mip-only invocation using the
qwen3_8b_pruneffn_memory config) and not whether the log contained the string.
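The decoupling described above, run first and write the full log, then search it as a separate non-blocking step, can be sketched in Python (helper names are ours; in the notebook the same split is two shell commands):

```python
import subprocess
from pathlib import Path

def run_and_log(cmd: list[str], log_path: str) -> int:
    # Capture all output to the log file; the return value is the command's
    # own exit code, never the result of a later log search.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    Path(log_path).write_text(proc.stdout + proc.stderr)
    return proc.returncode

def log_contains(log_path: str, needle: str) -> bool:
    # Separate, non-blocking check for a marker like "Puzzletron Progress".
    return needle in Path(log_path).read_text()
```

With `cmd | tee log | grep pattern`, the cell's exit status is grep's; splitting the steps restores the real exit status of the run.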
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b892f30b-4096-418c-ae84-79b6420e9dc3
📒 Files selected for processing (7)
examples/pruning/demo/00_prerequisites.ipynbexamples/pruning/demo/README.mdexamples/pruning/demo/advanced_compression_experiments.mdexamples/pruning/demo/scenario1_minitron.ipynbexamples/pruning/demo/scenario1_puzzletron.ipynbexamples/pruning/demo/scenario2_minitron.ipynbexamples/pruning/demo/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (1)
- examples/pruning/demo/00_prerequisites.ipynb
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 4
🧹 Nitpick comments (1)
examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb (1)
45-58: ⚡ Quick win: Avoid mutating checked-in YAML configs in-place during notebook runs. Lines 45–58 edit files under /opt/Model-Optimizer/..., which can leave the mounted repo dirty and make later runs depend on prior notebook state. Copy config(s) to /workspace and edit/run from the copied paths instead.
♻️ Suggested adjustment
-!sed -i 's|input_hf_model_path: .*|input_hf_model_path: /workspace/models/Qwen3-8B|' \
-  /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
+!cp /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml \
+  /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml
+!cp /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml \
+  /workspace/qwen3_8b.scenario1.yaml
+!sed -i 's|input_hf_model_path: .*|input_hf_model_path: /workspace/models/Qwen3-8B|' \
+  /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml
Then point --config to /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb` around lines 45 - 58, The notebook currently mutates checked-in YAMLs using sed -i on /opt/Model-Optimizer/examples/puzzletron/configs/... which can dirty the mounted repo; instead copy the needed configs to /workspace (e.g., copy qwen3_8b_pruneffn_memory.yaml and qwen3_8b.yaml to /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml), run your sed replacements against those copied files (avoid sed -i on the repo paths), and point any invocation that uses --config to the new /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml so all edits are local to the workspace and do not modify the mounted repository.
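The copy-then-edit flow can also be sketched in Python (the function name is ours; the notebook uses cp and sed):

```python
import re
import shutil
from pathlib import Path

def edit_config_copy(src: str, dst: str, model_path: str) -> Path:
    # Copy the checked-in YAML into the workspace and rewrite
    # input_hf_model_path only in the copy, so the mounted repo stays clean.
    dst_p = Path(dst)
    shutil.copy(src, dst_p)
    text = dst_p.read_text()
    text = re.sub(r"input_hf_model_path: .*",
                  f"input_hf_model_path: {model_path}", text)
    dst_p.write_text(text)
    return dst_p
```

Because every edit lands in /workspace, repeated notebook runs start from the same pristine repo state.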
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/pruning/minitron_vs_puzzletron/advanced_compression_experiments.md`:
- Around line 152-163: The MMLU scores in the markdown table currently use
percent-style numbers (e.g., 78.6) while earlier sections use decimals (e.g.,
0.7493); update the table header "MMLU" to "MMLU (%)" or convert all table
entries (Model rows such as "Nemotron-Nano-12B-v2", "Minitron 10B", "Puzzletron
10B", "Minitron 34k", "Puzzletron 34k") to decimal format to match Section 4.1,
and ensure the bolded values follow the same unit convention for consistency
across the document.
In `@examples/pruning/minitron_vs_puzzletron/README.md`:
- Around line 86-87: The README currently contradicts itself about Minitron's
memory-budget support: reconcile the two statements by either removing or
clarifying the claim in the opening description (the paragraph mentioning
"Minitron applies homogeneous pruning" and "direct memory-budget targeting is
now supported") so it matches the later note at lines ~412–413; if memory-budget
targeting is supported, update the later text in Sections 4/5 and the decision
framework to explain its scope and limitations (e.g., whether it maps memory
target to parameter count or uses an internal memory-aware heuristic); if it is
not supported, remove the earlier memory-budget phrase and ensure all references
to "memory-budget targeting" are eliminated or marked as TODO. Reference:
Minitron, memory-budget targeting, Sections 4/5, and the opening description
paragraph to ensure consistency.
- Line 1: The README title string incorrectly says "Reduce Your LLM Size and
Efficiency" which implies lowering efficiency; update the title text to convey
size reduction while improving efficiency — e.g., change the title to "Reduce
Your LLM Size and Improve Efficiency with NVIDIA Model-Optimizer: A Pruning &
Distillation Guide" so the intent is clear; locate and edit the title line in
README.md (the first line containing the current heading) and replace the phrase
"and Efficiency" with "and Improve Efficiency" or similar positive wording.
In `@examples/pruning/minitron_vs_puzzletron/scenario2_puzzletron.ipynb`:
- Around line 418-423: The cell currently pipes torchrun output through tee into
grep which makes the notebook cell fail if the string "Puzzletron Progress" is
not printed; change the pipeline so torchrun output is only written with tee to
/workspace/puzzletron_sweep.log (remove the trailing | grep "Puzzletron
Progress") and then add a separate, optional follow-up inspection step that runs
grep against /workspace/puzzletron_sweep.log to check for "Puzzletron Progress";
locate the cell that invokes torchrun --nproc_per_node 1
examples/puzzletron/main.py ... --mip-only and modify that line to stop piping
into grep and instead write only with tee, then add a separate grep command to
inspect the log.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f58449e7-aaea-40a4-8f48-76ee32828a1f
⛔ Files ignored due to path filters (6)
examples/pruning/minitron_vs_puzzletron/figures/all_curves_throughput_vs_latency.png is excluded by !**/*.png
examples/pruning/minitron_vs_puzzletron/figures/distillation_curves.png is excluded by !**/*.png
examples/pruning/minitron_vs_puzzletron/figures/distillation_loss_7B.png is excluded by !**/*.png
examples/pruning/minitron_vs_puzzletron/figures/memory_sweep.png is excluded by !**/*.png
examples/pruning/minitron_vs_puzzletron/figures/memory_sweep_combined.png is excluded by !**/*.png
examples/pruning/minitron_vs_puzzletron/figures/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
examples/pruning/minitron_vs_puzzletron/00_prerequisites.ipynb
examples/pruning/minitron_vs_puzzletron/README.md
examples/pruning/minitron_vs_puzzletron/advanced_compression_experiments.md
examples/pruning/minitron_vs_puzzletron/scenario1_minitron.ipynb
examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb
examples/pruning/minitron_vs_puzzletron/scenario2_minitron.ipynb
examples/pruning/minitron_vs_puzzletron/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (2)
- examples/pruning/minitron_vs_puzzletron/00_prerequisites.ipynb
- examples/pruning/minitron_vs_puzzletron/scenario1_minitron.ipynb
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/pruning/minitron_vs_puzzletron/README.md`:
- Line 34: The sentence describing MMLU uses redundant phrasing ("a 4-choice
multiple choice problem"); update the wording around "MMLU (Massive Multitask
Language Understanding)" to something tighter and consistent such as "a 4-choice
multiple-choice question" (or "4-choice multiple-choice problem") so the phrase
is not duplicated—locate the MMLU sentence and replace the redundant fragment
accordingly.
- Around line 125-128: The README currently suggests using "chmod -R 777
${MODELOPT_DIR}", which makes the repo world-writable; instead instruct to fix
ownership or use least-privilege perms: update the instruction to set ownership
to the container user (e.g., mention running chown -R ${HOST_UID}:${HOST_GID}
${MODELOPT_DIR} or use chmod -R 770/755 as appropriate) and add a short note
explaining the root cause (container UID/GID mismatch) and when/how to adjust
HOST_UID/HOST_GID before mounting so the container can write without 777.
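The least-privilege fix this comment asks for can be sketched as follows. `MODELOPT_DIR` here is a throwaway temp directory standing in for the mounted repository, and `sudo` is elided because the demo already owns the directory; in a real container UID/GID mismatch you would run the `chown` with `sudo` on the host:

```shell
# Least-privilege sketch: fix ownership/permissions instead of `chmod -R 777`.
# MODELOPT_DIR is a throwaway directory standing in for the mounted repository.
MODELOPT_DIR=$(mktemp -d)
touch "${MODELOPT_DIR}/example.txt"

# Align ownership with the user the container runs as (here: ourselves).
chown -R "$(id -u):$(id -g)" "${MODELOPT_DIR}"

# u/g get read+write (capital X adds execute only on directories);
# o-rwx removes world access entirely.
chmod -R u+rwX,g+rwX,o-rwx "${MODELOPT_DIR}"

stat -c '%a' "${MODELOPT_DIR}/example.txt"   # regular file ends up as 660
```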
- Around line 180-182: Replace the unsafe inline token usage shown as "hf auth
login --token <your_token>" with a safer alternative: show the interactive login
command "hf auth login" (no token on the CLI) and/or demonstrate using an
environment variable (e.g., export HF_TOKEN and then call hf auth login --token
"$HF_TOKEN" from a script or CI) so credentials are not exposed in shell history
or process listings; update the README examples around the hf auth/login command
and keep the hf download example unchanged.
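The environment-variable pattern suggested above can be sketched like this; the token value is a stand-in for the demo, and the actual `hf auth login` call is commented out because it requires real credentials:

```shell
# Keep the token out of argv and shell history: read it from the environment.
# HF_TOKEN's value below is a stand-in for the demo, not a real credential;
# in CI it would come from a secret manager instead.
export HF_TOKEN="hf_example_not_a_real_token"

# Guard: fail fast if the secret was not provided (useful in scripts/CI).
if [ -z "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is not set" >&2
  exit 1
fi

# In a real session you would now run (commented out so the demo is self-contained):
#   hf auth login --token "$HF_TOKEN"
echo "token is set (length ${#HF_TOKEN})"
```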
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 12b821ba-b6af-4f18-8c48-7ba6b4787ba0
📒 Files selected for processing (1)

- `examples/pruning/minitron_vs_puzzletron/README.md`
> To validate that Minitron is the right choice for this scenario, we also ran Puzzletron at the same ~7B parameter target. Puzzletron produces a 36-layer heterogeneous model with variable FFN widths per layer (some as low as 2560) and selective attention removal.
>
> ▶ See notebook [`scenario1_puzzletron.ipynb`](scenario1_puzzletron.ipynb) to reproduce this run.
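To make the homogeneous-vs-heterogeneous distinction in the quoted passage concrete, here is an illustrative sketch; the dict layout and the width/attention patterns are invented, and only the 36-layer count and the 2560 minimum width come from the text above:

```python
# Illustrative sketch, not Puzzletron's actual output format. A Minitron-style
# pruned model keeps one FFN width everywhere; a Puzzletron-style model varies
# width per layer and drops attention in some layers. Patterns are synthetic.
NUM_LAYERS = 36

minitron_cfg = {
    "num_layers": NUM_LAYERS,
    "ffn_hidden_size": [9728] * NUM_LAYERS,   # homogeneous: same width in every layer
    "attention_enabled": [True] * NUM_LAYERS,
}

puzzletron_cfg = {
    "num_layers": NUM_LAYERS,
    # heterogeneous: per-layer widths, some as low as 2560 (synthetic pattern)
    "ffn_hidden_size": [2560 if i % 6 == 0 else 9728 for i in range(NUM_LAYERS)],
    # selective attention removal (synthetic pattern)
    "attention_enabled": [i % 9 != 8 for i in range(NUM_LAYERS)],
}

print(min(puzzletron_cfg["ffn_hidden_size"]))                    # narrowest FFN layer
print(sum(not a for a in puzzletron_cfg["attention_enabled"]))   # layers without attention
```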
The ▶ icon in markdown generally indicates a collapsible section that you click to expand for more details, so it may be confusing here. Can we use some other icon?
What does this PR do?
Type of change: new documentation/example (tutorial + notebooks)
Adds an end-to-end pruning & distillation guide under `examples/pruning_demo/`, walking users through structural compression of Qwen3-8B with NVIDIA Model-Optimizer. The example compares two methods, Minitron and Puzzletron, side by side on two concrete scenarios.
Both scenarios are followed by knowledge distillation and evaluated on MMLU (end-to-end in the notebooks) plus HellaSwag and GSM8K (reported in the guide).
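The knowledge-distillation step mentioned above can be sketched, framework-free, as the student matching the teacher's temperature-softened token distribution (generic logit KD for illustration, not the exact ModelOpt implementation):

```python
import math

# Generic logit-distillation sketch: KL divergence between the teacher's and
# student's temperature-softened distributions over the vocabulary.

def softmax(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    p = softmax(teacher_logits, temperature)   # teacher distribution (target)
    q = softmax(student_logits, temperature)   # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2               # T^2 keeps gradient scale comparable

# Toy logits over a 4-token "vocabulary"
teacher = [2.0, 1.0, 0.5, -1.0]
student_far = [0.0, 0.0, 0.0, 0.0]
student_close = [1.9, 1.1, 0.4, -0.9]

print(kd_loss(teacher, teacher))               # exact match → 0
print(kd_loss(student_close, teacher) < kd_loss(student_far, teacher))  # → True
```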
Contents:
- `README.md` — full guide (setup, two scenarios, head-to-head analysis, inference benchmarks with vLLM + AIPerf, decision rules, limitations, open questions).
- `00_prerequisites.ipynb` — data prep (WikiText-103 → Megatron binary) and teacher baseline evaluation.
- `scenario1_minitron.ipynb` / `scenario1_puzzletron.ipynb` — 7B-param target.
- `scenario2_minitron.ipynb` / `scenario2_puzzletron.ipynb` — 78k-MiB target, including a Puzzletron memory-sweep bonus section.
- `advanced_compression_experiments.md` — extended results (larger distillation budgets with Nemotron-Post-Training-Dataset-v2, BLD, chained Minitron→Puzzletron, Mamba-Transformer hybrid).
- Figures (`summary_chart.png`, `distillation_curves.png`, `memory_sweep_combined.png`, `all_curves_throughput_vs_latency.png`, ...).
Follow the setup instructions in `README.md`, then run the notebooks in order: `00_prerequisites.ipynb` first, followed by the scenario 1 and scenario 2 notebooks.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- `CONTRIBUTING.md`: N/A — no new runtime dependencies; the notebooks use `lm-eval==0.4.8` and the existing ModelOpt/NeMo stack. The vLLM serving appendix references an open PR ([Model] Add AnyModel: generic support for NAS-optimized heterogeneous architectures vllm-project/vllm#36512) for Puzzletron AnyModel support, clearly flagged as pre-release.

Additional Information
Summary by CodeRabbit