[GNN] Reference implementation for GNN node classification #700

LiSu · 2024-02-05T08:23:56Z

In this PR we (Alibaba, Intel & Nvidia) propose a GNN training benchmark: a multi-class node classification task on a heterogeneous graph, using the IGBH-Full variant of the IGB Heterogeneous Dataset. The task is carried out using a GAT model based on the Relational Graph Attention Networks paper.

github-actions · 2024-02-05T08:24:08Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

LiSu · 2024-02-05T08:29:23Z

recheck

Elnifio · 2024-02-05T19:10:57Z

gnn_node_classification/README.md

+pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
+pip install torch_geometric
+pip install --no-index  torch_scatter torch_sparse -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
+pip install graphlearn-torch


Could we pin the library version here so that the exact results can be reproduced even after a few months or years? The same applies to all other libraries, such as torch_geometric in line 14.

Additionally, it would be very helpful for users if we can integrate these into a Dockerfile, so that all they need to do is to run docker build -f Dockerfile . instead of manually following all the steps.
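To illustrate the pinning request, here is a minimal sketch that compares installed package versions against a pin list. The pins are copied only from the README lines quoted above; the helper names (`PINS`, `installed_version`, `find_mismatches`) are hypothetical and not part of the PR, and packages the README leaves unpinned are omitted on purpose rather than guessed.

```python
from importlib import metadata

# Pins copied from the README excerpt quoted above. Packages that the README
# leaves unpinned (torch_geometric, graphlearn-torch, ...) are omitted here.
PINS = {
    "torch": "1.13.0+cu117",
    "torchvision": "0.14.0+cu117",
    "torchaudio": "0.13.0",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

A check like this could run at container start-up so that a drifted environment is reported before training begins; the `lookup` parameter exists only so the comparison can be exercised without the packages installed.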

Added guidelines for building Docker image using the Dockerfile

Elnifio · 2024-02-20T23:16:51Z

gnn_node_classification/split_seeds.py

+  parser.add_argument("--random_seed", type=int, default='42')
+  parser.add_argument('--num_classes', type=int, default=2983,
+      choices=[19, 2983], help='number of classes')
+  parser.add_argument("--validation_frac", type=float, default=0.025,


Minor correction: the validation fraction should be set to 0.005, i.e. $1 \over 40$ of the original validation dataset size. Since the original validation dataset is ${1 \over 5} = 0.2$ of the full trainable nodes, we should use ${1 \over 5} \times {1 \over 40} = {1 \over 200} = 0.005$ of the full trainable nodes, instead of ${1 \over 40} = 0.025$ of the full trainable nodes, so that the eval cost is at most 10% for the most basic single-node configuration, similar to other benchmarks.
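The arithmetic in this comment can be checked with exact fractions. This is only a sketch of the calculation described above; the constant names are made up for illustration and do not appear in split_seeds.py.

```python
from fractions import Fraction

# Both values come from the review comment above; the names are illustrative.
ORIGINAL_VAL_SHARE = Fraction(1, 5)   # original validation set / full trainable nodes
SUBSAMPLE_OF_VAL = Fraction(1, 40)    # part of the original validation set to keep

# Fraction of the full trainable nodes that ends up in the reduced validation set.
validation_frac = ORIGINAL_VAL_SHARE * SUBSAMPLE_OF_VAL

print(validation_frac, float(validation_frac))  # 1/200 0.005
```

Passing 0.025 instead would keep one eighth of the original validation set, five times more than intended.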

Thanks for the correction!

Elnifio · 2024-02-21T18:46:55Z

Is it possible to also rename the folder from gnn_node_classification to graph_neural_network, so that the folder name refers to the domain (GNN) rather than the overly detailed task name (node classification), consistent with all other benchmarks?

LiSu · 2024-02-22T03:27:26Z

The folder has been renamed to graph_neural_network.

graph_neural_network/train_rgnn_multi_gpu.py

Elnifio · 2024-03-14T00:54:38Z

As discussed in the MLLogging PR, could we also add the gradient accumulation steps (1 in our current case) and the optimizer name (Adam in our case) to the MLLog outputs?

LiSu · 2024-03-14T02:59:02Z

Added gradient accumulation step and optimizer name to MLLog outputs ;-)

Elnifio · 2024-03-20T22:20:26Z

Just noticed that the checker is asking for "adam" instead of "Adam". Could we have this small fix checked in so that the reference is consistent with the compliance checker?
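A minimal sketch of the requested fix, assuming the optimizer name is derived from the optimizer class and lowercased before it is logged. The function name and the key strings are hypothetical and are not taken from the reference code or the compliance checker; the `Adam` class is a stand-in so the example runs without any framework installed.

```python
def hyperparameter_log_entries(optimizer_name, gradient_accumulation_steps=1):
    """Build (key, value) pairs for the two hyperparameters discussed above.

    The key strings are assumptions for illustration only.
    """
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")
    return [
        # Lowercase the name so "Adam" is emitted as "adam".
        ("opt_name", optimizer_name.lower()),
        ("gradient_accumulation_steps", gradient_accumulation_steps),
    ]


class Adam:
    """Stand-in for an optimizer class; only its name is used here."""


entries = hyperparameter_log_entries(Adam.__name__)
print(entries)  # [('opt_name', 'adam'), ('gradient_accumulation_steps', 1)]
```

Normalizing at the logging boundary keeps the emitted value stable regardless of how the optimizer class happens to be capitalized.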

LiSu · 2024-03-21T02:17:09Z

Fixed in the last commit.

Reference implementation for GNN node classification

7869f1f

LiSu requested a review from a team as a code owner February 5, 2024 08:23

LiSu marked this pull request as draft February 5, 2024 08:30

Elnifio reviewed Feb 5, 2024

LiSu added 3 commits February 19, 2024 03:05

Added guidelines for building the Docker image using the provided Dockerfile, refine the evaluation frequency and thoroughness

e2a5320

minors

a1a0416

Add code of reference implementation

d7d0e53

LiSu changed the title ~~[WIP] Reference implementation for GNN node classification~~ Reference implementation for GNN node classification Feb 20, 2024

LiSu marked this pull request as ready for review February 20, 2024 06:37

Elnifio reviewed Feb 20, 2024

Set the default validation frac to 0.005

efdce3d

LiSu changed the title ~~Reference implementation for GNN node classification~~ [GNN] Reference implementation for GNN node classification Feb 21, 2024

Rename the folder and add contributors in readme

3398b08

LiSu added 2 commits February 22, 2024 03:30

minor

7724b34

minors

3afa89d

Elnifio reviewed Mar 11, 2024

graph_neural_network/train_rgnn_multi_gpu.py

Round up epoch_num, add GRADIENT_ACCUMULATION_STEPS and OPT_NAME into log outputs

8ed4fc8

Committed-by: LiSu from Dev container

minor

b98bb57

Committed-by: LiSu from Dev container

nv-rborkar approved these changes Mar 21, 2024

nv-rborkar merged commit 2d0e7ae into mlcommons:master Mar 21, 2024
1 check passed

github-actions bot locked and limited conversation to collaborators Mar 21, 2024
