2.1.0 rc rebase1 #1673

Merged
ktangsali merged 18 commits into main from 2.1.0-rc-rebase1
May 27, 2026

@ktangsali
Collaborator

PhysicsNeMo Pull Request

Description

Rebase RC into main

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@copy-pr-bot

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ktangsali
Collaborator Author

/blossom-ci

@ktangsali
Collaborator Author

Blossom CI passed (see report here), and GitHub CI passed too.

ktangsali and others added 17 commits May 27, 2026 01:51
* add fixes for the nvfuser bug

* test(natten): narrow CPU-backward skip to FlexAttention NotImplementedError

Cherry-picked test/nn/functional/test_natten.py from upstream commit
7f2451a ("Ci deps group (#1634)"). The previous device == "cpu"
early-skip was too broad; this wraps the forward call and only skips on
the specific NotImplementedError raised by FlexAttention's CPU-backward
guard. If natten picks a different backend (or FlexAttention ever
supports CPU backward), the test will run.

* black formatting

---------

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>
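
The narrowed skip described in the natten commit above can be sketched as follows. This is a minimal illustration, not the contents of test/nn/functional/test_natten.py: `flex_attention_forward` and `run_or_skip` are hypothetical names, and `unittest.SkipTest` (which pytest also treats as a skip) stands in for whatever skip call the real test uses.

```python
import unittest


def flex_attention_forward(device: str) -> str:
    """Hypothetical stand-in for the forward call under test."""
    if device == "cpu":
        # Mimics a backend guard that rejects CPU backward.
        raise NotImplementedError("CPU backward is not supported")
    return "ok"


def run_or_skip(forward, device: str):
    """Run the forward call; skip only when it raises NotImplementedError."""
    try:
        return forward(device)
    except NotImplementedError as exc:
        # Only this exception type becomes a skip. Anything else
        # propagates and fails the test as usual.
        raise unittest.SkipTest(f"backend guard hit: {exc}") from exc
```

Compared with an unconditional `device == "cpu"` skip, this keeps the test live whenever the selected backend does not raise. A stricter variant could also match on the exception message, which is not reproduced here because the source does not quote it.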
…1642)

* add missing dependencies for examples

* Replace np.infty with np.inf based on new API

* add some more missing dependencies

* update use of FusedAdam with native torch's Adam

* add tensorboard deps

* fix ci issues
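
For context on the `np.infty` change above: the alias was removed in NumPy 2.0, and `np.inf` is the supported spelling. A small sketch of a typical use, with made-up loss values that do not come from any example in this PR:

```python
import numpy as np

# np.inf works on both old and new NumPy releases; np.infty does not
# exist from NumPy 2.0 onward.
best_loss = np.inf
for loss in [0.9, 0.4, 0.7]:  # illustrative values only
    best_loss = min(best_loss, loss)
```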
* bump up package versions to fix cves

* fix greptile comments

* update
* fix cve in uv
* few-more-security-fixes
* Refactor weight initialization to use PyTorch's trunc_normal_ directly

- Updated internal weight initialization in distributed AFNO layers and EarthAttention blocks to utilize `torch.nn.init.trunc_normal_` instead of legacy implementations.
- Deprecated `trunc_normal_` wrapper in `physicsnemo.nn.module.utils` and removed the in-tree legacy implementation.
- Regenerated forward-accuracy reference outputs for several models to align with the new initialization method.
- Updated tests to skip on PyTorch versions below 2.12 due to changes in RNG algorithms affecting output consistency.

* fix doctest for dit layers

---------

Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
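
One common way to deprecate a thin wrapper such as the `trunc_normal_` helper mentioned above is to keep the old name as an alias that warns and forwards. The sketch below shows that pattern only; it is not the code in `physicsnemo.nn.module.utils`. `deprecated_alias` and `_stand_in_init` are hypothetical, and the stand-in takes the place of `torch.nn.init.trunc_normal_` so the block runs without PyTorch.

```python
import functools
import warnings


def deprecated_alias(target, old_name: str, new_name: str):
    """Return a callable that warns, then forwards every call to target."""

    @functools.wraps(target)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return target(*args, **kwargs)

    return wrapper


def _stand_in_init(values, std=1.0):
    """Hypothetical placeholder for the real initializer; returns input as-is."""
    return values


# Old public name kept alive as a warning alias.
trunc_normal_ = deprecated_alias(
    _stand_in_init, "utils.trunc_normal_", "torch.nn.init.trunc_normal_"
)
```

Callers that still import the old name keep working and see a `DeprecationWarning` pointing at their own call site.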
* Update shard tensor ring attention to use the expected default ring attention results directory

* Update examples/minimal/ShardTensorExamples/6_ring_attention/README.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/minimal/ShardTensorExamples/6_ring_attention/benchmark_sharded_attention.py

Co-authored-by: Negin Sobhani <negin513@gmail.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Negin Sobhani <negin513@gmail.com>
Prune residual accepted-point conflicts before returning Warp mesh Poisson samples.
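
To illustrate what pruning residual conflicts means for a Poisson-disk sample set, here is a brute-force sketch in plain Python. It is an assumption about the general idea, not the Warp kernel from this commit: `prune_conflicts` is a hypothetical name, and the real implementation presumably runs on the GPU with a spatial acceleration structure.

```python
import math


def prune_conflicts(points, min_dist):
    """Greedily keep points at least min_dist away from every kept point."""
    kept = []
    for p in points:
        # A point conflicts if it lies closer than min_dist to a kept point.
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


samples = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0)]
pruned = prune_conflicts(samples, min_dist=0.1)
```

The pass is quadratic in the number of points, so it only suits small inputs, but it guarantees that the returned set satisfies the minimum-distance constraint.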
…sion (#1658)

* Update tensordict dependency constraints and add regression tests for Mesh under torch.compile

- Adjusted the tensordict dependency in pyproject.toml to be upper-bounded due to regressions in version 0.12.x, with a note to drop the upper bound once the related PR is merged.
- Introduced a new test file for regression testing of the Mesh class to ensure compatibility with torch.compile, specifically addressing issues caused by the tensordict 0.12.x changes. The tests validate that cached properties and data fields behave correctly when compiled.

* Update CHANGELOG and bump mlflow and starlette versions

- Added a new entry in CHANGELOG detailing the fix for constructing a Mesh inside a torch.compile-traced function, addressing regressions from tensordict 0.12.0.
- Updated the mlflow and starlette package versions to 3.12.0 and 0.52.1 respectively, along with their corresponding source distribution and wheel URLs.
- Adjusted tensordict dependency constraints to ensure compatibility with the latest changes.

* format
* gnn recipes bug fixes

* minor fixes
@ktangsali
Collaborator Author

/blossom-ci

@ktangsali ktangsali merged commit e579a9f into main May 27, 2026
4 checks passed
@ktangsali ktangsali deleted the 2.1.0-rc-rebase1 branch May 27, 2026 03:05
6 participants