
Add demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/pruning/ with README and notebooks#1320

Open
achidiac-nv wants to merge 15 commits into main from achidiac/pruning_demo

Conversation


@achidiac-nv achidiac-nv commented Apr 22, 2026

What does this PR do?

Type of change: new documentation/example (tutorial + notebooks)

Adds an end-to-end pruning & distillation guide under examples/pruning_demo/, walking users through structural compression of Qwen3-8B with NVIDIA Model-Optimizer.

The example compares two methods side-by-side on two concrete scenarios:

  • Scenario 1 — Moderate compression (7B parameter target): homogeneous pruning with Minitron vs. heterogeneous NAS-based pruning with Puzzletron.
  • Scenario 2 — Aggressive compression (78,000 MiB memory budget): same comparison under a hard memory constraint.

Both scenarios are followed by knowledge distillation and evaluated on MMLU (end-to-end in the notebooks) plus HellaSwag and GSM8K (reported in the guide).
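For orientation, the two compression targets can be related with a back-of-the-envelope weight-memory calculation (illustrative only: parameter counts are approximate, and the 78,000 MiB budget in Scenario 2 presumably covers total inference memory, not just bf16 weights):

```python
def weight_memory_mib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in MiB, assuming bf16 storage (2 bytes/param)."""
    return num_params * bytes_per_param / 2**20

# Rough weight-only footprints (parameter counts are approximate):
teacher_mib = weight_memory_mib(8.2e9)  # Qwen3-8B teacher, ~15,640 MiB
target_mib = weight_memory_mib(7.0e9)   # Scenario 1 pruned target, ~13,351 MiB
```

The gap between these weight-only numbers and the 78,000 MiB Scenario 2 budget suggests the budget also accounts for runtime memory (activations, KV cache), though the exact accounting is Puzzletron's.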

Contents:

  • README.md — full guide (setup, two scenarios, head-to-head analysis, inference benchmarks with vLLM + AIPerf, decision rules, limitations, open questions).
  • 00_prerequisites.ipynb — data prep (WikiText-103 → Megatron binary) and teacher baseline evaluation.
  • scenario1_minitron.ipynb / scenario1_puzzletron.ipynb — 7B-param target.
  • scenario2_minitron.ipynb / scenario2_puzzletron.ipynb — 78k-MiB target, including a Puzzletron memory-sweep bonus section.
  • advanced_compression_experiments.md — extended results (larger distillation budgets with Nemotron-Post-Training-Dataset-v2, BLD, chained Minitron→Puzzletron, Mamba-Transformer hybrid).
  • Companion plots (summary_chart.png, distillation_curves.png, memory_sweep_combined.png, all_curves_throughput_vs_latency.png, ...).

Usage

Follow the setup instructions in README.md, then run, in order:

  1. 00_prerequisites.ipynb — prepare data + baseline eval (~15 min).
  2. One (or more) of the scenario notebooks:
  • scenario1_minitron.ipynb (~1h45)
  • scenario1_puzzletron.ipynb (~6h first run)
  • scenario2_minitron.ipynb (~45 min)
  • scenario2_puzzletron.ipynb (~6h15 first run)

Testing

  • All four scenario notebooks were executed and tested end-to-end on 2x H200 GPUs
  • Inference benchmarks were captured on 1x H200 NVL with vLLM (AnyModel backend for Puzzletron checkpoints) and AIPerf
  • No library code is modified, so no unit tests are affected

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ (documentation/examples-only addition)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A — no new runtime dependencies; the notebooks use lm-eval==0.4.8 and the existing ModelOpt/NeMo stack. The vLLM serving appendix references an open PR ([Model] Add AnyModel: generic support for NAS-optimized heterogeneous architectures vllm-project/vllm#36512) for Puzzletron AnyModel support, clearly flagged as pre-release.
  • Did you write any new necessary tests?: N/A — tutorial / documentation example.
  • Did you update Changelog?: N/A

Additional Information

  • Base model: https://huggingface.co/Qwen/Qwen3-8B
  • Calibration dataset: nvidia/Nemotron-Post-Training-Dataset-v2
  • Distillation dataset: WikiText-103
  • Complements the existing examples/puzzletron/ and examples/megatron_bridge/ READMEs with a scenario-driven narrative and a direct Minitron↔Puzzletron comparison.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guides comparing Minitron (homogeneous pruning) and Puzzletron (heterogeneous NAS + MIP) for LLM compression.
    • Added step-by-step Jupyter notebooks demonstrating two end-to-end scenarios (prune → distill → evaluate) with expected MMLU baselines and memory budgeting.
    • Added advanced experiments doc with extended results, chaining strategies, benchmarking (including inference/vLLM notes), tips, limitations, and appendices.

…xamples/ with README and notebooks

Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
@achidiac-nv achidiac-nv requested a review from a team as a code owner April 22, 2026 15:30
@achidiac-nv achidiac-nv requested a review from realAsma April 22, 2026 15:30
@achidiac-nv achidiac-nv added the documentation Improvements or additions to documentation label Apr 22, 2026

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a new pruning/distillation tutorial set: a comprehensive README, an advanced experiments doc, and multiple Jupyter notebooks that provide end-to-end, runnable pipelines and examples for Minitron (homogeneous pruning) and Puzzletron (heterogeneous NAS + MIP pruning) on Qwen3-8B, including dataset preprocessing, pruning, distillation, evaluation, and benchmarking steps.

Changes

Pruning & Distillation Tutorial

  • Prerequisites / Data Prep (examples/pruning/minitron_vs_puzzletron/00_prerequisites.ipynb): adds dataset preprocessing (tokenize WikiText-103 with megatron_preprocess_data) and an MMLU baseline eval of the uncompressed Qwen3-8B teacher (expected MMLU 0.7493).
  • Main Guide / Overview (examples/pruning/minitron_vs_puzzletron/README.md): new end-to-end guide covering the pipeline, prerequisites, hardware/container setup, a conceptual comparison of Minitron vs. Puzzletron, scenario walkthroughs, consolidated results, distillation notes, vLLM inference benchmarking, limitations, and an appendix with vLLM serving steps.
  • Scenario 1 — Minitron, Prune → Distill (examples/pruning/minitron_vs_puzzletron/scenario1_minitron.ipynb): documents Minitron pruning to ~7B, verifies pruned artifacts, runs a pre-distillation MMLU eval, launches TensorBoard, runs distill.py (training config/export), and evaluates the distilled model on MMLU.
  • Scenario 1 — Puzzletron, NAS → Distill (examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb): prepares the Nemotron calibration dataset, patches the Puzzletron YAML for a Qwen3-8B → 7B search, runs the NAS search, evaluates the pruned model, launches TensorBoard, runs distillation, and evaluates the final distilled checkpoint.
  • Scenario 2 — Minitron, Aggressive Depth Prune (examples/pruning/minitron_vs_puzzletron/scenario2_minitron.ipynb): prunes Qwen3-8B depth (36 → 22 layers), lists exported artifacts, runs pre-/post-distillation MMLU evaluations, runs distillation with TensorBoard monitoring, and documents expected results.
  • Scenario 2 — Puzzletron, Memory-Constrained NAS (examples/pruning/minitron_vs_puzzletron/scenario2_puzzletron.ipynb): configures Puzzletron for a 78,000 MiB budget, computes and prints memory-footprint estimates, runs the NAS search, evaluates the pruned model, runs distillation with TensorBoard, evaluates the distilled model, and includes a bonus memory-sweep section producing compression trade-off curves.
  • Advanced Experiments & Analysis (examples/pruning/minitron_vs_puzzletron/advanced_compression_experiments.md): new doc with extended experiments (extended distillation scaling, chaining Minitron + Puzzletron, Blockwise Local Distillation (BLD) effects, and application to a Mamba-Transformer hybrid), with result tables and key takeaways.
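A quick sanity check on the Scenario 2 Minitron depth prune (rough numbers: this treats every decoder layer as equally sized and ignores embeddings and the LM head):

```python
# Scenario 2 Minitron depth prune: 36 -> 22 decoder layers.
layers_before, layers_after = 36, 22
retained_fraction = layers_after / layers_before
# About 61% of the decoder stack survives. The true parameter reduction is
# smaller than (1 - 0.61) of the model, since embeddings are untouched.
```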

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • danielkorzekwa
🚥 Pre-merge checks: ✅ 6 passed

  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately summarizes the primary change: adding a comprehensive pruning demo with Puzzletron and Minitron guidance, README, and notebooks to the examples directory.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Linked Issues Check: ✅ Passed (skipped; no linked issues were found for this pull request).
  • Out of Scope Changes Check: ✅ Passed (skipped; no linked issues were found for this pull request).
  • Security Anti-Patterns: ✅ Passed. PR contains only documentation and notebooks. No security anti-patterns detected: no unsafe torch.load, numpy.load, trust_remote_code, eval/exec, # nosec comments, or new problematic dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai Bot commented Apr 22, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (4)
examples/pruning_demo/README.md (3)

683-685: Add trailing newline at end of file.

Per markdown conventions (MD047), files should end with a single newline character.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning_demo/README.md` around lines 683 - 685, The README.md file
is missing a trailing newline; add a single newline character at the end of the
file (after the final line containing the note about Minitron models and
baseline) so the file ends with exactly one newline to satisfy MD047.

320-357: Add language identifier to fenced code block.

The architecture details code block lacks a language specification, which affects syntax highlighting and accessibility.

📝 Suggested fix
-```
+```text
 block_0:   attention  kv_heads_8    ffn  intermediate_12288
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning_demo/README.md` around lines 320 - 357, The fenced code
block that starts with the line "block_0:   attention  kv_heads_8    ffn 
intermediate_12288" should include a language identifier to enable proper
highlighting; update the opening triple-backtick for that block (the block
showing block_0...block_35) to use a language token such as ```text (or
```plain) so the README's architecture details code block is marked correctly.

5-21: Fix markdown formatting issues in Table of Contents.

Lines 5 and 21 have spaces inside link text which violates markdown best practices:

📝 Suggested fix
-1.[ Introduction](`#1-introduction`)
+1. [Introduction](`#1-introduction`)

-10.[ References](`#10-references`)
+10. [References](`#10-references`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning_demo/README.md` around lines 5 - 21, Fix the markdown link
text spacing in the Table of Contents by removing the extra spaces inside the
square brackets for the affected entries: change "1.[
Introduction](`#1-introduction`)" to "1. [Introduction](`#1-introduction`)" and
"10.[ References](`#10-references`)" to "10. [References](`#10-references`) so the
link text has no leading/trailing spaces and spacing after the list number is
consistent; verify similar entries follow the same "N. [Text](`#anchor`)" pattern.
examples/pruning_demo/00_prerequisites.ipynb (1)

37-41: Hardcoded Python version in path may break on different container versions.

The path /opt/venv/lib/python3.12/site-packages/modelopt assumes Python 3.12. If the container or environment uses a different Python version, this will silently fail to replace the modelopt installation.

🔧 Suggested improvement using dynamic Python version
-!rm -rf /opt/venv/lib/python3.12/site-packages/modelopt
-!cp -r /workspace/Model-Optimizer/modelopt /opt/venv/lib/python3.12/site-packages/modelopt
+import sys
+site_packages = f"/opt/venv/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages"
+!rm -rf {site_packages}/modelopt
+!cp -r /workspace/Model-Optimizer/modelopt {site_packages}/modelopt
 !mkdir -p /workspace/datasets
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning_demo/00_prerequisites.ipynb` around lines 37 - 41, The
notebook currently hardcodes /opt/venv/lib/python3.12/site-packages/modelopt
which will fail for other Python versions; replace the three shell commands with
dynamic site-packages detection (e.g. set PY_SITE=$(python -c 'import sysconfig;
print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf
"$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/
and keep the mkdir -p /workspace/datasets line—update the cell that contains the
rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12
hardcoded path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/pruning_demo/scenario1_minitron.ipynb`:
- Around line 193-195: The cell calls subprocess.run("pkill -f tensorboard") but
does not import subprocess in that cell; add an import for subprocess (e.g.,
import subprocess) before the subprocess.run call or use from subprocess import
run and call run(...) so subprocess.run is defined (refer to the subprocess.run
invocation to locate the call and add the import in the same notebook cell).

In `@examples/pruning_demo/scenario1_puzzletron.ipynb`:
- Around line 231-233: The cell calls subprocess.run(["pkill", "-f",
"tensorboard"]) but does not import subprocess locally, which can raise
NameError if cells are run out of order; add an explicit import subprocess at
the top of the same notebook cell (or the cell immediately above) where
subprocess.run is used so the call in that cell always has the subprocess symbol
available.
- Around line 135-148: Remove the monkey-patch that prepends __version__ =
"0.4.8" into the installed lm_eval/__init__.py (the sed command that writes that
line); instead rely on the existing version-check/warning logic in
examples/llm_eval/lm_eval_hf.py (lines handling version mismatch) or document a
specific lm_eval prerequisite, and if the dtype issue remains, fix the exported
config file in the workspace (the sed that replaces "torch.bfloat16" ->
"bfloat16" in the solution config) or correct the Puzzletron export upstream
rather than editing site-packages.

In `@examples/pruning_demo/scenario2_minitron.ipynb`:
- Around line 167-169: The cell uses subprocess.run(["pkill", "-f",
"tensorboard"]) but doesn't import subprocess locally; add an explicit import
subprocess at the top of this cell (or merge the TensorBoard start/stop logic
into one cell) so subprocess.run is always defined even if cells are executed
out of order; ensure the import appears before the subprocess.run call to avoid
NameError.

In `@examples/pruning_demo/scenario2_puzzletron.ipynb`:
- Around line 364-366: The cell calls subprocess.run([ "pkill", "-f",
"tensorboard" ]) but never imports the subprocess module; add an import
subprocess statement (e.g., at the top of this cell or the notebook) so
subprocess.run is defined, and apply the same fix to all scenario notebooks that
use subprocess.run to keep them consistent.
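The four subprocess findings above reduce to the same pattern: import subprocess in the same cell that uses it. A minimal self-contained version of the TensorBoard-stop cell might look like this (a sketch, not the notebooks' exact code):

```python
import subprocess  # imported in the same cell, so out-of-order execution cannot raise NameError

def stop_tensorboard() -> int:
    """Kill any running TensorBoard. pkill exits 0 on a match, 1 when nothing matched."""
    try:
        return subprocess.run(["pkill", "-f", "tensorboard"], check=False).returncode
    except FileNotFoundError:
        return -1  # pkill unavailable (e.g. a minimal container without procps)

stop_tensorboard()
```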

---

Nitpick comments:
In `@examples/pruning_demo/00_prerequisites.ipynb`:
- Around line 37-41: The notebook currently hardcodes
/opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python
versions; replace the three shell commands with dynamic site-packages detection
(e.g. set PY_SITE=$(python -c 'import sysconfig;
print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf
"$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/
and keep the mkdir -p /workspace/datasets line—update the cell that contains the
rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12
hardcoded path.

In `@examples/pruning_demo/README.md`:
- Around line 683-685: The README.md file is missing a trailing newline; add a
single newline character at the end of the file (after the final line containing
the note about Minitron models and baseline) so the file ends with exactly one
newline to satisfy MD047.
- Around line 320-357: The fenced code block that starts with the line "block_0:
attention  kv_heads_8    ffn  intermediate_12288" should include a language
identifier to enable proper highlighting; update the opening triple-backtick for
that block (the block showing block_0...block_35) to use a language token such
as ```text (or ```plain) so the README's architecture details code block is
marked correctly.
- Around line 5-21: Fix the markdown link text spacing in the Table of Contents
by removing the extra spaces inside the square brackets for the affected
entries: change "1.[ Introduction](`#1-introduction`)" to "1.
[Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10.
[References](`#10-references`) so the link text has no leading/trailing spaces and
spacing after the list number is consistent; verify similar entries follow the
same "N. [Text](`#anchor`)" pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0c466b7c-b784-496d-802c-dfe00cad04ea

📥 Commits

Reviewing files that changed from the base of the PR and between c417e6f and beced98.

⛔ Files ignored due to path filters (6)
  • examples/pruning_demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
  • examples/pruning_demo/distillation_curves.png is excluded by !**/*.png
  • examples/pruning_demo/distillation_loss_7B.png is excluded by !**/*.png
  • examples/pruning_demo/memory_sweep.png is excluded by !**/*.png
  • examples/pruning_demo/memory_sweep_combined.png is excluded by !**/*.png
  • examples/pruning_demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
  • examples/pruning_demo/00_prerequisites.ipynb
  • examples/pruning_demo/README.md
  • examples/pruning_demo/advanced_compression_experiments.md
  • examples/pruning_demo/scenario1_minitron.ipynb
  • examples/pruning_demo/scenario1_puzzletron.ipynb
  • examples/pruning_demo/scenario2_minitron.ipynb
  • examples/pruning_demo/scenario2_puzzletron.ipynb


codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.37%. Comparing base (f34f488) to head (1179fa2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1320      +/-   ##
==========================================
+ Coverage   76.73%   77.37%   +0.64%     
==========================================
  Files         476      476              
  Lines       51306    51306              
==========================================
+ Hits        39369    39698     +329     
+ Misses      11937    11608     -329     
Flag coverage:
  • examples: 41.80% <ø> (+2.62%) ⬆️
  • unit: 52.53% <ø> (ø)
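As a sanity check, the reported percentages follow directly from the hit/line counts in the diff table:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage rounded to two decimals, matching Codecov's reporting."""
    return round(100 * hits / lines, 2)

base = coverage_pct(39369, 51306)  # 76.73 (base f34f488)
head = coverage_pct(39698, 51306)  # 77.37 (head 1179fa2)
delta = round(head - base, 2)      # +0.64
```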

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@achidiac-nv achidiac-nv self-assigned this Apr 22, 2026
…lint issues

Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
@achidiac-nv achidiac-nv requested a review from a team as a code owner April 22, 2026 16:49
@achidiac-nv achidiac-nv changed the title Add pruning_demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/ with README and notebooks Add pruning/demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/ with README and notebooks Apr 22, 2026
@achidiac-nv achidiac-nv changed the title Add pruning/demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/ with README and notebooks Add demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/pruning/ with README and notebooks Apr 22, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
examples/pruning/demo/scenario1_puzzletron.ipynb (1)

136-140: ⚠️ Potential issue | 🟠 Major

Remove site-packages monkey patch for lm_eval (Line 139).

Editing /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py is fragile and environment-specific; examples/llm_eval/lm_eval_hf.py already warns when the version is not 0.4.8 instead of requiring file mutation.

🔧 Suggested fix
 !sed -i 's/"torch\\.bfloat16"/"bfloat16"/g' \
     /workspace/puzzle_dir/mip/puzzle_solutions/target_memory_130000MiB-num_params_7G/solutions--checkpoints/solution_0/config.json
 
-!sed -i '1s/^/__version__ = "0.4.8"\\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py
-
 !cd /workspace/Model-Optimizer && \
 python examples/llm_eval/lm_eval_hf.py \
Verification script attached to the comment:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import json
p = "examples/pruning/demo/scenario1_puzzletron.ipynb"
nb = json.load(open(p))
for i, c in enumerate(nb["cells"]):
    if c.get("cell_type") == "code":
        s = "".join(c.get("source", []))
        if "lm_eval/__init__.py" in s and "__version__" in s:
            print(f"{p} -> cell {i} contains lm_eval monkey patch:")
            print(s)
PY

echo
echo "Version-check behavior in lm_eval_hf.py:"
rg -n 'if not lm_eval\.__version__\.startswith\("0\.4\.8"\)|warnings\.warn' examples/llm_eval/lm_eval_hf.py -C 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/demo/scenario1_puzzletron.ipynb` around lines 136 - 140,
Remove the fragile site-packages monkey-patch that inserts "__version__ =
\"0.4.8\"" into lm_eval/__init__.py; locate the notebook cell in
scenario1_puzzletron.ipynb that runs the sed command string "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/'
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that
command (and any related sed edits for lm_eval), relying on the existing
version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
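The pattern the reviewer recommends (warn on a version mismatch instead of mutating installed files) can be sketched as follows; the function name is hypothetical, not the actual code in lm_eval_hf.py:

```python
import warnings

def check_lm_eval_version(installed: str, tested: str = "0.4.8") -> bool:
    """Warn, rather than patching site-packages, when lm-eval differs from the tested version."""
    if not installed.startswith(tested):
        warnings.warn(
            f"lm-eval {installed} differs from the tested {tested}; results may not be reproducible."
        )
        return False
    return True
```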
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)

170-173: Avoid token-in-CLI examples for authentication (Line 171).

Using --token <your_token> in docs encourages secrets ending up in shell history. Prefer interactive login or env-var based usage.

🔧 Suggested doc update
-hf auth login --token <your_token>
+hf auth login
+# or:
+# export HF_TOKEN=...
+# hf auth login --token "$HF_TOKEN"

As per coding guidelines, "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/demo/README.md` around lines 170 - 173, The README currently
shows a CLI example using the explicit flag "hf auth login --token
<your_token>", which risks exposing secrets; update the example to use an
interactive login or environment-variable approach instead (e.g., instruct users
to run "hf auth login" interactively or to set HF_TOKEN in their environment and
call "hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B" without
embedding the token). Replace the inline token usage in the example and add a
short note advising storing tokens in environment variables or a .env/config
file excluded from VCS per the project's secret-handling guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/pruning/demo/README.md`:
- Line 5: The README has remaining markdownlint issues: fix the malformed
list/link "1.[ Introduction](`#1-introduction`)" by normalizing it to a proper
list or heading syntax, add or remove blank lines around headings and lists to
satisfy MD022 (ensure headings are surrounded by blank lines), collapse or
remove extra blank lines to address MD039, and ensure the file ends with a
single newline to resolve MD047; run markdownlint or your project's pre-commit
linter after updating README.md to confirm all warnings are cleared.

In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 269-273: Remove the sed command that mutates
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py (the "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/' ..." cell) and any other notebook cell that
patches lm_eval; instead pin the package to lm-eval==0.4.8 in your environment
(requirements, pip install, or container image) so the version check in
examples/llm_eval/lm_eval_hf.py (the warning at line 47) can operate as
intended. Ensure no in-place edits to lm_eval/__init__.py remain in the
notebook.

---

Duplicate comments:
In `@examples/pruning/demo/scenario1_puzzletron.ipynb`:
- Around line 136-140: Remove the fragile site-packages monkey-patch that
inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook
cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/'
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that
command (and any related sed edits for lm_eval), relying on the existing
version-check/warning in examples/llm_eval/lm_eval_hf.py instead.

---

Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 170-173: The README currently shows a CLI example using the
explicit flag "hf auth login --token <your_token>", which risks exposing
secrets; update the example to use an interactive login or environment-variable
approach instead (e.g., instruct users to run "hf auth login" interactively or
to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace
the inline token usage in the example and add a short note advising storing
tokens in environment variables or a .env/config file excluded from VCS per the
project's secret-handling guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 67767531-5494-4153-9d9d-d17c53732c87

📥 Commits

Reviewing files that changed from the base of the PR and between c417e6f and 8403a1d.

⛔ Files ignored due to path filters (6)
  • examples/pruning/demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
  • examples/pruning/demo/distillation_curves.png is excluded by !**/*.png
  • examples/pruning/demo/distillation_loss_7B.png is excluded by !**/*.png
  • examples/pruning/demo/memory_sweep.png is excluded by !**/*.png
  • examples/pruning/demo/memory_sweep_combined.png is excluded by !**/*.png
  • examples/pruning/demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
  • examples/pruning/demo/00_prerequisites.ipynb
  • examples/pruning/demo/README.md
  • examples/pruning/demo/advanced_compression_experiments.md
  • examples/pruning/demo/scenario1_minitron.ipynb
  • examples/pruning/demo/scenario1_puzzletron.ipynb
  • examples/pruning/demo/scenario2_minitron.ipynb
  • examples/pruning/demo/scenario2_puzzletron.ipynb

Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/pruning/demo/README.md (1)

5-21: ⚠️ Potential issue | 🟡 Minor

Fix TOC list formatting for consistent Markdown rendering.

Line 5 and Line 21 are missing a space after the numeric list marker (1. / 10.), which breaks standard ordered-list formatting.

✏️ Suggested fix
-1.[Introduction](`#1-introduction`)
+1. [Introduction](`#1-introduction`)
...
-10.[References](`#10-references`)
+10. [References](`#10-references`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/demo/README.md` around lines 5 - 21, The table-of-contents
ordered list items "1.[Introduction]" and "10.[References]" are missing a space
after the numeric marker which breaks Markdown rendering; edit the TOC entries
(look for the strings "1.[Introduction]" and "10.[References]") to insert a
space after the period (e.g., "1. [Introduction]" and "10. [References]") and
scan the other numbered list entries in that block to ensure all ordered markers
follow the same "N. Item" spacing for consistent Markdown formatting.
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)

171-174: Avoid inline token patterns in auth instructions.

Using `hf auth login --token <your_token>` encourages token-in-command usage (shell history/log risk). Prefer interactive login or env-var-based token usage in docs.

🔐 Suggested tweak
-hf auth login --token <your_token>
+hf auth login
+# or (non-interactive):
+# hf auth login --token "$HF_TOKEN"

As per coding guidelines: “Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/demo/README.md` around lines 171 - 174, Replace the
inline-token pattern shown in the README snippet ("hf auth login --token
<your_token>") with a secure alternative: remove examples that put tokens
directly in commands and instead instruct users to either use interactive login
(e.g., run the CLI without a token to be prompted) or set their token via an
environment variable (e.g., export HF_TOKEN=...) or a credentials file, then run
the download command ("hf download Qwen/Qwen3-8B --local-dir
/workspace/models/Qwen3-8B") normally; update the README guidance accordingly so
tokens are never shown inline or hardcoded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 436-441: The cell currently pipes the sweep run through `tee` into
`grep "Puzzletron Progress"`, coupling cell exit status to the grep match;
instead run the sweep command and capture full output to
/workspace/puzzletron_sweep.log via `tee` without piping into `grep`, then
perform any `grep "Puzzletron Progress"` as a separate non-blocking step (or as
a follow-up cell) so the exit status reflects the actual run of
examples/puzzletron/main.py (the --mip-only invocation using the
qwen3_8b_pruneffn_memory config) and not whether the log contained the string.
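The pitfall this comment describes is easy to reproduce with a tiny stand-in pipeline (the `echo` below is a hypothetical placeholder for the torchrun invocation; `/tmp/sweep.log` stands in for `/workspace/puzzletron_sweep.log`):

```shell
# A pipeline's exit status is that of its LAST command, so piping through
# grep makes the cell's success depend on the log containing the pattern.
echo "training step 1" | tee /tmp/sweep.log | grep "Puzzletron Progress"
echo "status with grep in the pipeline: $?"   # 1 -- grep matched nothing

# Fix: write the log with tee only, then grep afterwards as a
# separate, non-blocking inspection step.
echo "training step 1" | tee /tmp/sweep.log
echo "status with tee only: $?"               # 0 -- reflects the real command
grep "Puzzletron Progress" /tmp/sweep.log || echo "no progress lines yet"
```

In a notebook, the trailing `|| echo ...` (or a follow-up cell) keeps the inspection from failing the cell when the pattern is absent.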

---

Duplicate comments:
In `@examples/pruning/demo/README.md`:
- Around line 5-21: The table-of-contents ordered list items "1.[Introduction]"
and "10.[References]" are missing a space after the numeric marker which breaks
Markdown rendering; edit the TOC entries (look for the strings
"1.[Introduction]" and "10.[References]") to insert a space after the period
(e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered
list entries in that block to ensure all ordered markers follow the same "N.
Item" spacing for consistent Markdown formatting.

---

Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 171-174: Replace the inline-token pattern shown in the README
snippet ("hf auth login --token <your_token>") with a secure alternative: remove
examples that put tokens directly in commands and instead instruct users to
either use interactive login (e.g., run the CLI without a token to be prompted)
or set their token via an environment variable (e.g., export HF_TOKEN=...) or a
credentials file, then run the download command ("hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B") normally; update the README guidance
accordingly so tokens are never shown inline or hardcoded.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b892f30b-4096-418c-ae84-79b6420e9dc3

📥 Commits

Reviewing files that changed from the base of the PR and between d1169d3 and 8ee6a65.

📒 Files selected for processing (7)
  • examples/pruning/demo/00_prerequisites.ipynb
  • examples/pruning/demo/README.md
  • examples/pruning/demo/advanced_compression_experiments.md
  • examples/pruning/demo/scenario1_minitron.ipynb
  • examples/pruning/demo/scenario1_puzzletron.ipynb
  • examples/pruning/demo/scenario2_minitron.ipynb
  • examples/pruning/demo/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (1)
  • examples/pruning/demo/00_prerequisites.ipynb

Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb (1)

45-58: ⚡ Quick win

Avoid mutating checked-in YAML configs in-place during notebook runs.

Lines 45–58 edit files under /opt/Model-Optimizer/..., which can leave the mounted repo dirty and make later runs depend on prior notebook state. Copy config(s) to /workspace and edit/run from the copied paths instead.

♻️ Suggested adjustment
-!sed -i 's|input_hf_model_path: .*|input_hf_model_path: /workspace/models/Qwen3-8B|' \
-    /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
+!cp /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml \
+    /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml
+!cp /opt/Model-Optimizer/examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml \
+    /workspace/qwen3_8b.scenario1.yaml
+!sed -i 's|input_hf_model_path: .*|input_hf_model_path: /workspace/models/Qwen3-8B|' \
+    /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml

Then point --config to /workspace/qwen3_8b_pruneffn_memory.scenario1.yaml.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb` around
lines 45 - 58, The notebook currently mutates checked-in YAMLs using sed -i on
/opt/Model-Optimizer/examples/puzzletron/configs/... which can dirty the mounted
repo; instead copy the needed configs to /workspace (e.g., copy
qwen3_8b_pruneffn_memory.yaml and qwen3_8b.yaml to
/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml), run your sed replacements
against those copied files (avoid sed -i on the repo paths), and point any
invocation that uses --config to the new
/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml so all edits are local to the
workspace and do not modify the mounted repository.
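The copy-then-edit pattern can be sanity-checked with throwaway paths (everything under `/tmp` below is illustrative; the real notebook would use `/opt/Model-Optimizer` and `/workspace`):

```shell
# Stand-ins for the checked-in repo config and the writable workspace.
mkdir -p /tmp/repo_configs /tmp/workspace
printf 'input_hf_model_path: /old/path\n' \
    > /tmp/repo_configs/qwen3_8b_pruneffn_memory.yaml

# Copy the config into the workspace, then edit ONLY the copy.
cp /tmp/repo_configs/qwen3_8b_pruneffn_memory.yaml \
   /tmp/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml
sed -i 's|input_hf_model_path: .*|input_hf_model_path: /workspace/models/Qwen3-8B|' \
    /tmp/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml

grep input_hf_model_path /tmp/repo_configs/qwen3_8b_pruneffn_memory.yaml    # still /old/path
grep input_hf_model_path /tmp/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml
```

Pointing `--config` at the workspace copy keeps the mounted repo clean between runs, so reruns never depend on prior notebook state.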
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/pruning/minitron_vs_puzzletron/advanced_compression_experiments.md`:
- Around line 152-163: The MMLU scores in the markdown table currently use
percent-style numbers (e.g., 78.6) while earlier sections use decimals (e.g.,
0.7493); update the table header "MMLU" to "MMLU (%)" or convert all table
entries (Model rows such as "Nemotron-Nano-12B-v2", "Minitron 10B", "Puzzletron
10B", "Minitron 34k", "Puzzletron 34k") to decimal format to match Section 4.1,
and ensure the bolded values follow the same unit convention for consistency
across the document.

In `@examples/pruning/minitron_vs_puzzletron/README.md`:
- Around line 86-87: The README currently contradicts itself about Minitron's
memory-budget support: reconcile the two statements by either removing or
clarifying the claim in the opening description (the paragraph mentioning
"Minitron applies homogeneous pruning" and "direct memory-budget targeting is
now supported") so it matches the later note at lines ~412–413; if memory-budget
targeting is supported, update the later text in Sections 4/5 and the decision
framework to explain its scope and limitations (e.g., whether it maps memory
target to parameter count or uses an internal memory-aware heuristic); if it is
not supported, remove the earlier memory-budget phrase and ensure all references
to "memory-budget targeting" are eliminated or marked as TODO. Reference:
Minitron, memory-budget targeting, Sections 4/5, and the opening description
paragraph to ensure consistency.
- Line 1: The README title string incorrectly says "Reduce Your LLM Size and
Efficiency" which implies lowering efficiency; update the title text to convey
size reduction while improving efficiency — e.g., change the title to "Reduce
Your LLM Size and Improve Efficiency with NVIDIA Model-Optimizer: A Pruning &
Distillation Guide" so the intent is clear; locate and edit the title line in
README.md (the first line containing the current heading) and replace the phrase
"and Efficiency" with "and Improve Efficiency" or similar positive wording.

In `@examples/pruning/minitron_vs_puzzletron/scenario2_puzzletron.ipynb`:
- Around line 418-423: The cell currently pipes torchrun output through tee into
grep which makes the notebook cell fail if the string "Puzzletron Progress" is
not printed; change the pipeline so torchrun output is only written with tee to
/workspace/puzzletron_sweep.log (remove the trailing | grep "Puzzletron
Progress") and then add a separate, optional follow-up inspection step that runs
grep against /workspace/puzzletron_sweep.log to check for "Puzzletron Progress";
locate the cell that invokes torchrun --nproc_per_node 1
examples/puzzletron/main.py ... --mip-only and modify that line to stop piping
into grep and instead write only with tee, then add a separate grep command to
inspect the log.

---

Nitpick comments:
In `@examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb`:
- Around line 45-58: The notebook currently mutates checked-in YAMLs using sed
-i on /opt/Model-Optimizer/examples/puzzletron/configs/... which can dirty the
mounted repo; instead copy the needed configs to /workspace (e.g., copy
qwen3_8b_pruneffn_memory.yaml and qwen3_8b.yaml to
/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml), run your sed replacements
against those copied files (avoid sed -i on the repo paths), and point any
invocation that uses --config to the new
/workspace/qwen3_8b_pruneffn_memory.scenario1.yaml so all edits are local to the
workspace and do not modify the mounted repository.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f58449e7-aaea-40a4-8f48-76ee32828a1f

📥 Commits

Reviewing files that changed from the base of the PR and between 8ee6a65 and de51d09.

⛔ Files ignored due to path filters (6)
  • examples/pruning/minitron_vs_puzzletron/figures/all_curves_throughput_vs_latency.png is excluded by !**/*.png
  • examples/pruning/minitron_vs_puzzletron/figures/distillation_curves.png is excluded by !**/*.png
  • examples/pruning/minitron_vs_puzzletron/figures/distillation_loss_7B.png is excluded by !**/*.png
  • examples/pruning/minitron_vs_puzzletron/figures/memory_sweep.png is excluded by !**/*.png
  • examples/pruning/minitron_vs_puzzletron/figures/memory_sweep_combined.png is excluded by !**/*.png
  • examples/pruning/minitron_vs_puzzletron/figures/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
  • examples/pruning/minitron_vs_puzzletron/00_prerequisites.ipynb
  • examples/pruning/minitron_vs_puzzletron/README.md
  • examples/pruning/minitron_vs_puzzletron/advanced_compression_experiments.md
  • examples/pruning/minitron_vs_puzzletron/scenario1_minitron.ipynb
  • examples/pruning/minitron_vs_puzzletron/scenario1_puzzletron.ipynb
  • examples/pruning/minitron_vs_puzzletron/scenario2_minitron.ipynb
  • examples/pruning/minitron_vs_puzzletron/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (2)
  • examples/pruning/minitron_vs_puzzletron/00_prerequisites.ipynb
  • examples/pruning/minitron_vs_puzzletron/scenario1_minitron.ipynb

Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/pruning/minitron_vs_puzzletron/README.md`:
- Line 34: The sentence describing MMLU uses redundant phrasing ("a 4-choice
multiple choice problem"); update the wording around "MMLU (Massive Multitask
Language Understanding)" to something tighter and consistent such as "a 4-choice
multiple-choice question" (or "4-choice multiple-choice problem") so the phrase
is not duplicated—locate the MMLU sentence and replace the redundant fragment
accordingly.
- Around line 125-128: The README currently suggests using "chmod -R 777
${MODELOPT_DIR}", which makes the repo world-writable; instead instruct to fix
ownership or use least-privilege perms: update the instruction to set ownership
to the container user (e.g., mention running chown -R ${HOST_UID}:${HOST_GID}
${MODELOPT_DIR} or use chmod -R 770/755 as appropriate) and add a short note
explaining the root cause (container UID/GID mismatch) and when/how to adjust
HOST_UID/HOST_GID before mounting so the container can write without 777.
- Around line 180-182: Replace the unsafe inline token usage shown as "hf auth
login --token <your_token>" with a safer alternative: show the interactive login
command "hf auth login" (no token on the CLI) and/or demonstrate using an
environment variable (e.g., export HF_TOKEN and then call hf auth login --token
"$HF_TOKEN" from a script or CI) so credentials are not exposed in shell history
or process listings; update the README examples around the hf auth/login command
and keep the hf download example unchanged.
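For the permissions comment, a least-privilege alternative to `chmod -R 777` can be sketched like this (the directory is a stand-in; `HOST_UID`/`HOST_GID` are resolved from the invoking user):

```shell
# Root cause: the container writes files as a UID/GID the host user does
# not own. Restore ownership to the invoking user, then grant group access
# without making anything world-writable.
HOST_UID=$(id -u)
HOST_GID=$(id -g)
MODELOPT_DIR=/tmp/modelopt_demo          # stand-in for the real checkout
mkdir -p "$MODELOPT_DIR"

chown -R "$HOST_UID:$HOST_GID" "$MODELOPT_DIR" 2>/dev/null || true
chmod -R u+rwX,g+rwX,o-rwx "$MODELOPT_DIR"   # 770-style: no world access

stat -c '%a' "$MODELOPT_DIR"                 # 770, not 777
```

The symbolic `X` applies execute only to directories (and files already executable), so the recursive chmod does not mark data files executable the way `777` would.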
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12b821ba-b6af-4f18-8c48-7ba6b4787ba0

📥 Commits

Reviewing files that changed from the base of the PR and between de51d09 and 55c2091.

📒 Files selected for processing (1)
  • examples/pruning/minitron_vs_puzzletron/README.md


To validate that Minitron is the right choice for this scenario, we also ran Puzzletron at the same ~7B parameter target. Puzzletron produces a 36-layer heterogeneous model with variable FFN widths per layer (some as low as 2560) and selective attention removal.

▶ See notebook [`scenario1_puzzletron.ipynb`](scenario1_puzzletron.ipynb) to reproduce this run.
Collaborator


This icon in markdown generally indicates a collapsible section that you click to expand for more details, so it may be confusing here. Can we use some other icon?


Labels: documentation (Improvements or additions to documentation)

Projects: none yet

5 participants