
add high precision init weights to fully_shard example #2785

Draft
pstjohn wants to merge 1 commit into NVIDIA:main from pstjohn:worktree-pstjohn/clean-up-example

Conversation

@pstjohn (Contributor) commented Mar 20, 2026

WIP; would also like to add tests around preserve_high_precision_init_val.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
# By default, FusedAdam initializes master weights by dequantizing
# the FP8 parameters, which introduces quantization noise. Instead,
# we seed them from the original BF16 init values preserved in step 2.
for param in model.parameters():
Contributor

Do we know if params_dtype=torch.float32 would fix this issue and not require us to manually retrieve BF16 weights and manually cast them to FP32?
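The quantization-noise concern in the quoted comment can be illustrated with a small numpy sketch. The uniform quantizer below is a stand-in assumption (real FP8 uses non-uniform exponent/mantissa formats), but the round-trip error it introduces is analogous:

```python
import numpy as np

def quantize_dequantize(x, levels=16):
    # Crude uniform quantizer standing in for an FP8 cast (assumption:
    # real FP8 is non-uniform, but the round-trip error is analogous).
    scale = np.max(np.abs(x)) / (levels - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
bf16_init = rng.standard_normal(1024).astype(np.float32)  # high-precision init

# Seeding master weights by dequantizing the FP8 params carries the
# round-trip quantization error ...
master_from_dequant = quantize_dequantize(bf16_init)
err_dequant = np.abs(master_from_dequant - bf16_init).max()

# ... while seeding them from the preserved high-precision init is exact.
master_from_init = bf16_init.copy()
err_preserved = np.abs(master_from_init - bf16_init).max()
```

Here err_dequant is strictly positive while err_preserved is zero, which is the gap the PR's manual seeding step closes.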

# Load checkpoint back. Provide empty state dict containers with the
# same structure; DCP fills them from the saved files.
state_to_load = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
Contributor

Is it a requirement to store the model weights? If the optimizer weights hold our FP32 master weights, and we simply quantize those to build our FP4/FP8 weights, then why do we need to store the model weights (quantized in FP4/FP8)?
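One way to read this suggestion: if the checkpoint keeps only the optimizer's FP32 masters, the low-precision model weights could be rebuilt by re-quantizing them on load. A minimal sketch with dict-of-arrays stand-ins for the real state dicts (quantize_fp8_like is a crude uniform quantizer, not real FP8):

```python
import numpy as np

def quantize_fp8_like(x, levels=16):
    # Stand-in for the FP8/FP4 cast; real formats are non-uniform.
    scale = np.max(np.abs(x)) / (levels - 1)
    return np.round(x / scale) * scale

# Hypothetical checkpoint that stores only FP32 master weights.
optimizer_state = {
    "layer1.weight": {"master_param": np.linspace(-1.0, 1.0, 8, dtype=np.float32)},
    "layer2.weight": {"master_param": np.linspace(-0.5, 0.5, 8, dtype=np.float32)},
}

# On load, rebuild the low-precision model weights from the masters
# instead of also persisting the quantized copies.
model_state = {
    name: quantize_fp8_like(state["master_param"])
    for name, state in optimizer_state.items()
}
```

Whether this is viable in practice depends on the quantization being deterministic and on per-tensor scaling state also being restorable from the checkpoint.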

for key, value in full_model_state.items():
if key in opt_param_states and "master_param" in opt_param_states[key]:
# Prefer optimizer's FP32 master weight (maintained throughout training).
Contributor

Can we put this into a utility function, so we don't have to manually pull these pieces out of the optimizer state?
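A possible shape for such a helper (hypothetical name and signature; it just centralizes the master-weight-preferred merge from the quoted loop):

```python
def merge_master_params(full_model_state, opt_param_states):
    """Hypothetical utility: for each parameter, prefer the optimizer's
    FP32 master weight over the (quantized) model weight when present."""
    merged = {}
    for key, value in full_model_state.items():
        param_state = opt_param_states.get(key, {})
        merged[key] = param_state.get("master_param", value)
    return merged

# Usage with toy stand-ins for the real state dicts:
model_state = {"w1": "fp8_w1", "w2": "fp8_w2"}
opt_states = {"w1": {"master_param": "fp32_w1"}}
merged = merge_master_params(model_state, opt_states)
# merged["w1"] comes from the optimizer; merged["w2"] falls back to the model.
```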

