Benchmark: Model benchmark - deterministic training support #731

Open
Aishwarya-Tonpe wants to merge 1 commit into main from aishwaryatonpe/deterministic-training

Aishwarya-Tonpe commented Aug 28, 2025

Adds an opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When it is enabled with --enable-determinism, PyTorch deterministic algorithms are enforced and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing sb result diagnosis pipeline to verify bit-exact reproducibility, which is useful for hardware validation and platform comparison.

Flags added -

--enable-determinism
--check-frequency: the interval, in steps, at which the metrics are recorded
--deterministic-seed
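
To illustrate how the three flags relate, here is a minimal sketch in plain Python. It is not the code in pytorch_base.py: the function `run_training`, its parameters, and the toy loss are hypothetical stand-ins, and a seeded `random.Random` replaces the PyTorch model. The assumption is that the seed fixes all randomness and that the check frequency selects the steps at which a loss fingerprint is recorded.

```python
import random
from typing import Dict


def run_training(seed: int, num_steps: int, check_frequency: int) -> Dict[int, str]:
    """Toy stand-in for a training loop; returns {step: loss fingerprint}."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    weight = 0.0
    fingerprints = {}
    for step in range(1, num_steps + 1):
        sample = rng.uniform(-1.0, 1.0)
        loss = (weight - sample) ** 2
        weight -= 0.1 * 2.0 * (weight - sample)  # one gradient step on the toy loss
        if step % check_frequency == 0:
            # float.hex() is an exact text form, so equal strings mean identical values
            fingerprints[step] = loss.hex()
    return fingerprints


# Same seed and parameters: the recorded fingerprints match exactly.
run_1 = run_training(seed=42, num_steps=100, check_frequency=10)
run_2 = run_training(seed=42, num_steps=100, check_frequency=10)
# A different seed gives a different sequence of fingerprints.
run_3 = run_training(seed=43, num_steps=100, check_frequency=10)

print(run_1 == run_2)
print(run_1 != run_3)
```

Comparing the hexadecimal form rather than a rounded decimal is what makes the check bit-exact instead of approximately equal.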

Changes -

Updated pytorch_base.py to handle the deterministic settings and logging.
Added a new example script: pytorch_deterministic_example.py
Added a test file, test_pytorch_determinism_all.py, to verify that everything works as expected.

Usage -

Step 1: Run 1 - Run with --enable-determinism; the necessary metrics are recorded in the results-summary.jsonl file.
Step 2: Generate the baseline file from the Run 1 results using the sb result generate-baseline command.
Step 3: Run 2 - Run with --enable-determinism again, on a different machine or the same one; the necessary metrics are recorded in that run's results-summary.jsonl file.
Step 4: Run diagnosis on the results of the two runs using the sb result diagnosis command.

Note -

  1. Make sure all the parameters are identical between the two runs.
  2. Running the diagnosis command requires the rules.yaml file.
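
The comparison that the steps above lead to can be sketched in plain Python. This is an illustration only and does not reproduce what sb result diagnosis does internally. It assumes each line of a results file is a JSON object mapping metric names to numeric values; the metric names (`loss_step_10`, `act_mean_step_10`) and the helpers `load_metrics` and `find_mismatches` are hypothetical.

```python
import json
from typing import Dict, List


def load_metrics(jsonl_text: str) -> List[Dict[str, float]]:
    """Parse JSON Lines text: one JSON object of metric name -> value per line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def find_mismatches(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[str]:
    """Return the names of metrics that are missing on one side or not exactly equal."""
    mismatches = []
    for name in sorted(set(baseline) | set(candidate)):
        if name not in baseline or name not in candidate:
            mismatches.append(name)
        elif baseline[name] != candidate[name]:  # exact equality, no tolerance
            mismatches.append(name)
    return mismatches


# Hypothetical file contents, inlined as strings to keep the sketch self-contained.
run_1_text = '{"loss_step_10": 2.3025850929940455, "act_mean_step_10": 0.125}\n'
run_2_text = '{"loss_step_10": 2.3025850929940455, "act_mean_step_10": 0.125}\n'
run_3_text = '{"loss_step_10": 2.302586, "act_mean_step_10": 0.125}\n'

baseline = load_metrics(run_1_text)[0]
print(find_mismatches(baseline, load_metrics(run_2_text)[0]))
print(find_mismatches(baseline, load_metrics(run_3_text)[0]))
```

An exact comparison is appropriate here because the stated goal is bit-exact reproducibility; a tolerance-based check would hide small numerical drift.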

Aishwarya-Tonpe requested a review from a team as a code owner August 28, 2025 17:41

@microsoft-github-policy-service agree company="Microsoft"

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 83.65019% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.68%. Comparing base (575859b) to head (2b52174).

Files with missing lines                               Patch %  Lines
...rbench/benchmarks/model_benchmarks/pytorch_base.py  84.54%   17 Missing ⚠️
superbench/common/model_log_utils.py                   76.74%   10 Missing ⚠️
superbench/analyzer/baseline_generation.py             52.94%   8 Missing ⚠️
...enchmarks/model_benchmarks/pytorch_mixtral_impl.py  82.85%   6 Missing ⚠️
...perbench/benchmarks/model_benchmarks/model_base.py  66.66%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   85.70%   85.68%   -0.03%     
==========================================
  Files         102      103       +1     
  Lines        7703     7886     +183     
==========================================
+ Hits         6602     6757     +155     
- Misses       1101     1129      +28     
Flag                      Coverage         Δ
cpu-python3.10-unit-test  70.40% <41.60%>  (-0.56%) ⬇️
cpu-python3.12-unit-test  70.40% <41.60%>  (-0.56%) ⬇️
cpu-python3.7-unit-test   69.83% <39.92%>  (-0.61%) ⬇️
cuda-unit-test            83.59% <82.44%>  (-0.01%) ⬇️

guoshzhao changed the title from "Aishwaryatonpe/deterministic training" to "Benchmark: Model benchmark - deterministic training support" Sep 18, 2025
guoshzhao requested a review from polarG September 24, 2025 23:25
guoshzhao mentioned this pull request Oct 2, 2025
30 tasks
guoshzhao (Contributor) commented

Thanks for addressing all the comments. Since this is a big PR, could we do an apples-to-apples comparison before merging it? For example:

  1. Run all e2e model benchmarks on the main branch.
  2. Run all e2e model benchmarks on this branch with deterministic training disabled.
  3. Run all e2e model benchmarks on this branch with deterministic training enabled.
    Then compare whether the throughput metrics are as expected.

Aishwarya-Tonpe (Author) commented

Tested and compared all three items listed above; the results look good.
I can share the result files if needed, please let me know. Thank you!

guoshzhao added the benchmarks (SuperBench Benchmarks) and model-benchmarks (Model Benchmark Test for SuperBench Benchmarks) labels Oct 17, 2025
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 4c1ebf8 to dd0457d February 18, 2026 20:49
microsoft deleted a comment from Copilot AI Feb 18, 2026
microsoft deleted a comment from Copilot AI Feb 18, 2026
Copilot AI review requested due to automatic review settings February 18, 2026 23:05
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from dd0457d to f2c7554 February 18, 2026 23:05
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from f2c7554 to f831f73 February 19, 2026 21:55
Copilot AI review requested due to automatic review settings February 19, 2026 22:33
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from f831f73 to 840c62f February 19, 2026 22:33
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 840c62f to 181b9ad February 19, 2026 23:15
Copilot AI review requested due to automatic review settings February 19, 2026 23:57
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 181b9ad to 20c1fac February 19, 2026 23:57
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 20c1fac to 2803619 February 20, 2026 00:31
Copilot AI review requested due to automatic review settings February 20, 2026 00:41
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 2803619 to 34689f9 February 20, 2026 00:41
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from 34689f9 to c163ddb February 20, 2026 17:49
Copilot AI review requested due to automatic review settings February 20, 2026 19:39
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from c163ddb to b5ad62a February 20, 2026 19:39
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from b5ad62a to a6ce77c February 20, 2026 21:56
Copilot AI review requested due to automatic review settings February 20, 2026 23:54
Aishwarya-Tonpe force-pushed the aishwaryatonpe/deterministic-training branch from a6ce77c to 2b52174 February 20, 2026 23:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Labels

benchmarks (SuperBench Benchmarks), model-benchmarks (Model Benchmark Test for SuperBench Benchmarks)

4 participants