add high precision init weights to fully_shard example #2785
Draft
pstjohn wants to merge 1 commit into NVIDIA:main from
Conversation
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
# By default, FusedAdam initializes master weights by dequantizing
# the FP8 parameters, which introduces quantization noise. Instead,
# we seed them from the original BF16 init values preserved in step 2.
for param in model.parameters():
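A minimal, self-contained sketch of why this hunk matters (plain Python, not the TransformerEngine API; `quantize`, `init_vals`, and the step size are all hypothetical stand-ins for FP8 quantization of BF16 init values):

```python
def quantize(x, step=0.25):
    # Stand-in for FP8 quantization: snap the value to a coarse grid.
    return round(x / step) * step

# Preserved high-precision init values (stand-in for the BF16 init).
init_vals = [0.1, -0.37, 0.82]

# Default-like behavior: master weights are seeded by dequantizing the
# quantized params, so they inherit quantization noise.
master_from_quant = [quantize(v) for v in init_vals]

# The approach in this example: seed master weights from the preserved
# high-precision init values, so no quantization noise is introduced.
master_from_init = list(init_vals)

noise = [abs(a - b) for a, b in zip(master_from_quant, init_vals)]
```

Here `noise` is nonzero for values that don't land exactly on the quantization grid, which is the error the PR avoids by seeding from the preserved BF16 values.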
Contributor
Do we know if params_dtype=torch.float32 would fix this issue without requiring us to manually retrieve the BF16 weights and cast them to FP32?
# Load checkpoint back. Provide empty state dict containers with the
# same structure; DCP fills them from the saved files.
state_to_load = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
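A toy illustration of the load-in-place pattern this hunk relies on (`load_in_place` and `saved_files` are hypothetical stand-ins for `torch.distributed.checkpoint` loading, not the real API):

```python
def load_in_place(state_dict, saved):
    # Stand-in for DCP-style loading: the caller provides containers with
    # the right structure, and the loader mutates them in place.
    for key, container in state_dict.items():
        container.update(saved[key])

# Pretend these dicts were read back from checkpoint files.
saved_files = {"model": {"w": 1.0}, "optimizer": {"step": 10}}

# Caller supplies empty containers with the same top-level structure.
state_to_load = {"model": {}, "optimizer": {}}
load_in_place(state_to_load, saved_files)
```

The point is that the containers passed in are filled rather than replaced, which is why the example builds `state_to_load` from the live model and optimizer state dicts.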
Contributor
Is it a requirement to store the model weights? If the optimizer state holds our FP32 master weights, and we simply quantize those to rebuild our FP4/FP8 weights, why do we need to store the model weights (quantized in FP4/FP8) as well?
for key, value in full_model_state.items():
    if key in opt_param_states and "master_param" in opt_param_states[key]:
        # Prefer optimizer's FP32 master weight (maintained throughout training).
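A small sketch of the preference logic in this hunk, with hypothetical toy data in place of real model and optimizer state (`exported`, the values, and the `"b"` key are all illustrative):

```python
# Optimizer state keeps an FP32 master copy for some parameters.
opt_param_states = {"w": {"master_param": [0.1234]}}

# Model state holds the (lower-precision) parameter values.
full_model_state = {"w": [0.125], "b": [0.5]}

exported = {}
for key, value in full_model_state.items():
    if key in opt_param_states and "master_param" in opt_param_states[key]:
        # Prefer the optimizer's FP32 master weight when one exists.
        exported[key] = opt_param_states[key]["master_param"]
    else:
        # Fall back to the model's own value otherwise.
        exported[key] = value
```

Pulling this loop into a shared utility function, as suggested below, would avoid each caller reaching into the optimizer's state layout directly.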
Contributor
Can we put this into a utility function, so we don't have to manually rip out the parts from the optimizer?
WIP, would like to also add tests around `preserve_high_precision_init_val`.