Add NVLink P2P support for mixed NVLink/PCIe GPU topologies #18
Open
valdemardi wants to merge 1 commit into aikitoria:595.45.04-p2p from
Hi @aikitoria
I created an NVLink-enabled version based on your updated 595.45.04 tinygrad driver. In my repository, I forked the NVIDIA upstream repository at the 595.45.04 tag, applied most of the changes from your repository (excluding the README and install.sh), then made the NVLink-enabling changes and updated the README with test results confirming that the driver works as expected.
Today I also created a commit against your repository with the changes, in case you or others might find this useful, given your repository's visibility. The version in this PR should work as a drop-in replacement for your version. If the system running this version has NVLink(s), the driver will prefer them where possible, and otherwise it will fall back to the BAR1 PCIe P2P approach.
I have tested this PR version only on a quad RTX 3090 system with two NVLink bridges (two NVLinked GPU pairs), and on that system it works as expected. I'd expect it to behave the same as your version on systems without NVLink, but I have not tested that configuration.
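To illustrate the intended behavior on a topology like the one above, here is a minimal Python sketch of the per-pair path choice: NVLink when two GPUs share a bridge, BAR1 PCIe P2P otherwise. This is only a model of the behavior described in this PR, not the driver code. The names `select_p2p_path`, `path_table`, and `NVLINK_PAIRS` are hypothetical, and the assumption that the bridges connect GPUs (0, 1) and (2, 3) is mine; the actual pairing depends on how the cards are installed.

```python
from itertools import combinations

# Assumed topology: four GPUs with NVLink bridges between (0, 1) and (2, 3).
# The GPU indices are illustrative; the real pairing depends on the system.
NVLINK_PAIRS = frozenset({frozenset((0, 1)), frozenset((2, 3))})


def select_p2p_path(gpu_a, gpu_b, nvlink_pairs=NVLINK_PAIRS):
    """Return the P2P path this model picks for a pair of distinct GPUs."""
    if gpu_a == gpu_b:
        raise ValueError("P2P path is only defined for two distinct GPUs")
    # Prefer NVLink when the two GPUs are directly bridged.
    if frozenset((gpu_a, gpu_b)) in nvlink_pairs:
        return "nvlink"
    # Otherwise fall back to the BAR1 PCIe P2P approach.
    return "bar1_pcie"


def path_table(num_gpus, nvlink_pairs=NVLINK_PAIRS):
    """Map every unordered GPU pair to its selected path."""
    return {
        (a, b): select_p2p_path(a, b, nvlink_pairs)
        for a, b in combinations(range(num_gpus), 2)
    }


if __name__ == "__main__":
    for pair, path in path_table(4).items():
        print(pair, path)
```

On a system with no NVLink bridges, passing an empty set for `nvlink_pairs` makes every pair resolve to `bar1_pcie`, which matches the expected drop-in behavior.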
Cheers