Scalable GNN Explanations with Distributed Shapley Values

We develop DistShap, a parallel algorithm that distributes Shapley value-based explanations across multiple GPUs. DistShap samples subgraphs in a distributed setting, executes GNN inference in parallel across GPUs, and solves a distributed least squares problem to compute edge importance scores. DistShap outperforms most existing GNN explanation methods in accuracy and is the first to scale to GNN models with millions of edges by using 128 GPUs.
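The least squares step can be illustrated on a single machine with a toy problem. The sketch below is not DistShap's code: `toy_model` is a hypothetical stand-in for GNN inference on a subgraph, all coalitions are enumerated instead of sampled, and the efficiency constraint is approximated with a large weight on the full coalition (`big_weight` is an assumption made for brevity). It fits edge scores by Shapley-kernel-weighted least squares and checks them against exact Shapley values.

```python
from itertools import combinations, permutations
from math import comb


def shapley_kernel_weight(n, size):
    """Shapley kernel weight of a coalition with `size` of `n` players."""
    return (n - 1) / (comb(n, size) * size * (n - size))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def kernel_shap(value_fn, n, big_weight=1e6):
    """Edge scores from weighted least squares over all non-empty coalitions."""
    base = value_fn(frozenset())
    rows, targets, weights = [], [], []
    for size in range(1, n + 1):
        # The full coalition gets a large weight to enforce efficiency.
        w = big_weight if size == n else shapley_kernel_weight(n, size)
        for coalition in combinations(range(n), size):
            rows.append([1.0 if i in coalition else 0.0 for i in range(n)])
            targets.append(value_fn(frozenset(coalition)) - base)
            weights.append(w)
    # Normal equations: (A^T W A) phi = A^T W y
    ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
            for j in range(n)] for i in range(n)]
    aty = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
           for i in range(n)]
    return solve_linear(ata, aty)


def exact_shapley(value_fn, n):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        kept = set()
        for player in order:
            before = value_fn(frozenset(kept))
            kept.add(player)
            phi[player] += value_fn(frozenset(kept)) - before
    return [p / len(orders) for p in phi]


def toy_model(edges):
    """Hypothetical model output when only `edges` are kept in the subgraph."""
    score = 0.5 * (0 in edges) + 0.2 * (1 in edges) + 0.1 * (2 in edges)
    if 0 in edges and 1 in edges:
        score += 0.3  # interaction between edge 0 and edge 1
    return score


phi = kernel_shap(toy_model, 4)
exact = exact_shapley(toy_model, 4)
print([round(p, 3) for p in phi])
print([round(p, 3) for p in exact])
```

With four edges the enumeration is cheap; at the scale described above, the coalitions would be sampled and the model evaluations spread across GPUs.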

The DistShap code is based on the source code of GNNShap.

GCN Results

GAT Results

Setup

This implementation is based on PyTorch and PyG. It requires a GPU with CUDA support.

The required packages and versions are listed in the requirements.txt file.

We used CUDA 12.4 in our experiments. Please make sure CUDA is installed before proceeding.

Run the following command in the repository root to install the required packages and compile the CUDA extension:

pip install .

Dataset Configs

Dataset and dataset-specific model configurations are in the dataset/configs.py file.

Model training

We include pre-trained models in the pretrained folder. The following scripts retrain the models if needed.

To train on Coauthor-CS, Coauthor-Physics, DBLP, or ogbn-arxiv, pass the dataset name via --dataset (shown here for ogbn-arxiv):

python train.py --dataset ogbn-arxiv

Reddit and ogbn-products require NeighborLoader for training. To train on either of them (shown here for Reddit):

python train_large.py --dataset Reddit

Experiments

We used two scripts to run experiments on Slurm: sjob.sh and torchrun.sh. sjob.sh contains the Slurm allocation settings. Update it for your cluster as needed, and replace account_name with a valid account. The number of nodes can also be set there. Note that the script assumes each node has four GPUs.

Submit the Slurm job by running:

sbatch ./sjob.sh

The torchrun.sh script is called once resources are allocated. The dataset name and the number of samples are specified in torchrun.sh.
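To make the four-GPUs-per-node assumption concrete, the sketch below shows how global ranks would map to nodes and local GPUs. It assumes the usual torchrun layout of one process per GPU with ranks assigned contiguously per node; `rank_layout` is a hypothetical helper for illustration and is not part of the repository.

```python
def rank_layout(num_nodes, gpus_per_node=4):
    """Map each global rank to (node index, local GPU index)."""
    world_size = num_nodes * gpus_per_node
    return {
        rank: (rank // gpus_per_node, rank % gpus_per_node)
        for rank in range(world_size)
    }


layout = rank_layout(num_nodes=32)
print(len(layout))   # number of processes, one per GPU
print(layout[5])     # (node index, local GPU index) of global rank 5
```

Under this layout, 32 nodes give the 128 processes needed for a 128-GPU run.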

The results will be saved to the results folder.

Evaluation

We used the DistShapEvaluation.ipynb notebook to evaluate runtimes and Fidelity scores.
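For readers unfamiliar with Fidelity, the sketch below follows one common probability-based definition: Fidelity+ is the drop in the model output after removing the top-k edges, and Fidelity- is the drop when only those edges are kept. The notebook may compute the scores differently; `fidelity_scores` and `toy_predict` are hypothetical names used only for illustration.

```python
def fidelity_scores(predict, edges, scores, k):
    """Return (Fidelity+, Fidelity-) for the k highest-scored edges."""
    ranked = sorted(edges, key=lambda e: scores[e], reverse=True)
    top = set(ranked[:k])
    full = predict(set(edges))
    fid_plus = full - predict(set(edges) - top)   # remove the important edges
    fid_minus = full - predict(top)               # keep only the important edges
    return fid_plus, fid_minus


# Hypothetical model: the output is the sum of per-edge contributions.
contribution = {0: 0.6, 1: 0.3, 2: 0.1}


def toy_predict(kept_edges):
    return sum(contribution[e] for e in kept_edges)


fid_plus, fid_minus = fidelity_scores(toy_predict, [0, 1, 2], contribution, k=1)
print(round(fid_plus, 3), round(fid_minus, 3))
```

A good explanation yields a high Fidelity+ and a low Fidelity- under this definition.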

Limitations. At present, DistShap constructs computational graphs on a CPU and distributes them across GPUs to compute Shapley values. Thus, DistShap cannot process graphs that exceed the available CPU memory (258 GB on the system used in our experiments).

Sampling time and scalability. Although the sampling step generates 30 million subgraphs for 50 nodes, it remains extremely fast due to our strategy of replicating the computation graph across GPUs. Figure 9 demonstrates that sampling is highly efficient and scales effectively up to 64 GPUs. A minor slowdown is observed at 128 GPUs, which can be attributed to CUDA kernel overhead.

Figure 9: Scalability of sampling on the Reddit dataset. Total sampling time for explaining 50 nodes.
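As a rough illustration of why replicated sampling scales, the sketch below splits a sample budget evenly across ranks so that each GPU draws only its own share. It assumes the 30 million figure is the total over the 50 explained nodes (600,000 per node) and that samples are divided evenly; `shard_bounds` is a hypothetical helper, not the repository's partitioning code.

```python
def shard_bounds(total_samples, world_size, rank):
    """Half-open [start, stop) range of sample indices owned by `rank`."""
    base, extra = divmod(total_samples, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


samples_per_node = 30_000_000 // 50   # assumed even split over 50 nodes
for world_size in (64, 128):
    start, stop = shard_bounds(samples_per_node, world_size, rank=0)
    print(world_size, stop - start)   # samples drawn by rank 0
```

Doubling the GPU count halves the per-GPU share, so fixed per-launch costs take up a larger fraction of the sampling time as the share shrinks.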
