Parallelizes MDAnalysis.analysis.msd#4896
Parallelizes MDAnalysis.analysis.msd#4896tanishy7777 wants to merge 12 commits intoMDAnalysis:developfrom
MDAnalysis.analysis.msd#4896Conversation
|
Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2025-01-20 21:03:12 UTC |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4896 +/- ##
===========================================
- Coverage 93.42% 93.41% -0.01%
===========================================
Files 177 189 +12
Lines 21865 22945 +1080
Branches 3079 3079
===========================================
+ Hits 20427 21435 +1008
- Misses 986 1059 +73
+ Partials 452 451 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks for your work. I'm currently quite busy, so might not be able to review in the next few days. Please be patient. |
|
@talagayev / @marinegor can you have a look at this PR, please? |
|
Checked the code and ran locally, looks all good. Here @tanishy7777 you could also add the From my side it looks good, good job @tanishy7777 :) |
Thanks a lot for reviewing my PR, will wait for the suggestions as you mentioned. |
Also could you please review this PR #4884 its pretty similar or tell me if it needs any more work to be done. Thanks again |
Hey @tanishy7777, yes the PR is similar, I can take a look at it as well. |
hmacdope
left a comment
There was a problem hiding this comment.
Blocking here, I just need to check the implementation IIRC there is a reason MSD algo itself is non-parallelisable, but may not apply if only the collection of particle positions is parallelised.
Like the tests were passing so I thought it has been parallized |
marinegor
left a comment
There was a problem hiding this comment.
hi @tanishy7777, sorry for long review -- many life things got in the way.
I think you're on a good path, I mentioned few minor things in the comments.
Main action items:
- move out hacky
@staticmethod def f(arrays): passout of the class - using datafiles in MDAnalysisTests (imported on top of
test_msd.py, check that parallelized run produces exactly the sameresultsas non-parallelized one (add code snippet to comments that anyone can run to check, and its results) - add this check as a test (if it's too slow, we can always mark it this way and not run by default)
Please ask if you have questions, and ping me here if I don't reply for more than 48 hours.
package/MDAnalysis/analysis/msd.py
Outdated
| @staticmethod | ||
| def f(arrs): | ||
| pass |
There was a problem hiding this comment.
move this outside of the class and explicitly name it something like __noop to make sure no one uses that:
def __noop(arrs) -> None: passThere was a problem hiding this comment.
move this outside of the class and explicitly name it something like
__noopto make sure no one uses that:def __noop(arrs) -> None: pass
I tried renaming it to __noop and moving it outside the class, but this leads to NameError due to how name mangling works in python. I renamed it _noop instead, hope that isnt a problem?
we could also do something like this if we want to keep the name as __noop
from MDAnalysis.analysis import msd
def _get_aggregator(self):
return ResultsGroup(
lookup={
"_position_array": ResultsGroup.ndarray_vstack,
"msds_by_particle": msd.__noop,
"timeseries": msd.__noop,
}
)Not sure if this is as clean as above method though.
There was a problem hiding this comment.
Since there are indeed concerns that the algorithm is indeed parallelizable, I would suggest you explicitly test for that: namely, ensure that all results.<something> attributes are exactly the same, regardless of how you run your analysis. You can do this yourself first, and than later add it as one of the tests here.
I should also say that if you make a convenient way to test that, it'd be nice to have it added to other parallelization tests, to ensure additional correctness.
Hey, sorry for the late response. I had semester exams so I was quite busy the last 2 weeks. Will start working on this soon! |
|
As suggested in #4896 (comment) After running the tests, the results seem to be equal across both backends. Could you please review the implementation and confirm if this is correct? Also, is this a good way of adding this test? Should I add it for all other methods that were parallelized? I have added the following test for comparing the Thank you! |
@marinegor just a gentle reminder |
|
@marinegor tests pass, codecov is good, linter is happy – could you please have another look? |
|
hey @tanishy7777 sorry for the long response, I had serious personal issues I had to attend to. I'm planning to get back to reviewing PR -- could you please resolve the merge conflicts so I could do that? |

Fixes #4676
Changes made in this Pull Request:
split-apply-combinetechnique to parallelize theMDAnalysis.analysis.msd.EinsteinMSDtestsuite/analysis/conftest.py, analogous with existing onesclient_EinsteinMSD, fixtures to all tests using intestsuite/MDAnalysisTests/analysis/test_msd.py, and modified the wayrun()method is called torun(**client_EinsteinMSD)PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4896.org.readthedocs.build/en/4896/