Source code for our IJCAI 2021 paper "Learning Attributed Graph Representations with Communicative Message Passing Transformer".
The code builds on Molecule Attention Transformer and The Annotated Transformer. Many thanks to the authors for sharing their code!
- cuda >= 9.0
- cudnn >= 7.0
- RDKit == 2020.03.4
- torch >= 1.4.0 (upgrading your torch version reduces the training time)
- numpy == 1.19.1
- scikit-learn == 0.23.2
- tqdm == 4.52.0
Tip: running `conda install -c conda-forge rdkit` is a quick way to install the RDKit package.
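After installation, a quick sanity check (a minimal snippet, not part of this repository) confirms that RDKit imports and parses SMILES correctly:

```python
# Quick sanity check that RDKit is installed and working (not part of this repo).
from rdkit import Chem
import rdkit

print(rdkit.__version__)         # expect 2020.03.4
mol = Chem.MolFromSmiles("CCO")  # ethanol
print(mol.GetNumAtoms())         # 3 heavy atoms
```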
Dataset | Tasks | Type | Molecules | Metric |
---|---|---|---|---|
bbbp | 1 | Graph Classification | 2,035 | ROC-AUC |
tox21 | 12 | Graph Classification | 7,821 | ROC-AUC |
sider | 27 | Graph Classification | 1,379 | ROC-AUC |
clintox | 2 | Graph Classification | 1,468 | ROC-AUC |
esol | 1 | Graph Regression | 1,128 | RMSE |
freesolv | 1 | Graph Regression | 642 | RMSE |
lipophilicity | 1 | Graph Regression | 4,198 | RMSE |
1H-NMR | 1 | Node Regression | 12,800 | MAE |
13C-NMR | 1 | Node Regression | 26,859 | MAE |
For the graph-level tasks (graph classification and graph regression), you can download the source datasets from Molecule-Net.
For the node-level task (node regression), you can download the source dataset from NMRShiftDB2, or use the preprocessed dataset cleaned by nmr-mpnn. Many thanks to the authors for sharing their code!
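As an illustration, a downloaded Molecule-Net file can be inspected like this (a hedged sketch: the file path and the `smiles` column name are assumptions, so check the actual download for the exact schema):

```python
# Sketch: inspect a downloaded Molecule-Net CSV (path and column name are assumptions).
import pandas as pd
from rdkit import Chem

df = pd.read_csv("Data/bbbp/source/bbbp.csv")  # hypothetical path, adjust to your layout
print(df.shape)                                # bbbp should have roughly 2,035 rows
mols = [Chem.MolFromSmiles(s) for s in df["smiles"]]  # 'smiles' column is an assumption
print(sum(m is None for m in mols), "unparsable SMILES")
```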
In the folder `./Data`, we have preprocessed every dataset mentioned above with the corresponding Jupyter notebook. All source datasets can be found in `./Data/<dataset>/source/`, and all preprocessed files in `./Data/<dataset>/preprocess/`.
You can also run the corresponding Jupyter notebook at `./Data/<dataset>/preprocessing.ipynb` to generate the `<dataset>.pickle` files.
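To verify a generated pickle, you can load it back and look at its top-level structure (a minimal sketch; the exact contents depend on the notebook, so treat the path and field access as hypothetical):

```python
# Sketch: peek into a generated <dataset>.pickle (structure depends on the notebook).
import pickle

with open("Data/bbbp/preprocess/bbbp.pickle", "rb") as f:  # hypothetical path
    data = pickle.load(f)

print(type(data))  # inspect the container type first
# If it is a dict or list, examine the keys / first record before using it:
# print(list(data)[:5])
```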
To train a graph-level task, run:
```bash
python train_graph.py --seed <seed> --gpu <gpu> --fold 5 --dataset <dataset> --split <split>
```
where `<seed>` is the seed number, `<gpu>` is the GPU index number, `<dataset>` is the graph-level dataset name (bbbp, tox21, sider, clintox, esol, freesolv, lipophilicity), and `<split>` is the split method described by Molecule-Net (random, scaffold, cv).
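For example, a concrete run on bbbp with scaffold splitting might look like this (the seed and GPU index are arbitrary choices):

```bash
python train_graph.py --seed 42 --gpu 0 --fold 5 --dataset bbbp --split scaffold
```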
To train a node-level task, run:
```bash
python train_node.py --seed <seed> --gpu <gpu> --dataset nmrshiftdb --element <element>
```
where `<seed>` is the seed number, `<gpu>` is the GPU index number, and `<element>` is the element name (1H for 1H-NMR, 13C for 13C-NMR).
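For example, to train the 1H-NMR model (seed and GPU index are again arbitrary):

```bash
python train_node.py --seed 42 --gpu 0 --dataset nmrshiftdb --element 1H
```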
All hyperparameters can be tuned in `utils.py`.
- Remove the unused functions and write more comments.
- Translate the remaining Chinese comments into English.
- Generate the split-fold files in `.csv` format, rewrite the code, and then write a bash script to train all folds in parallel (a sketch follows this list).
- Implement a padding scheme that can handle molecules with more than 100 atoms, which will be needed for proteins (long term).
- Do our best to reduce the training time and memory usage, especially for the large datasets (long term).
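A minimal sketch of what such a parallel-fold script could look like, assuming a hypothetical `--fold_idx` flag for selecting a single fold (the current `train_graph.py` does not have this flag yet; it trains all folds in one run):

```bash
#!/usr/bin/env bash
# Sketch for the TODO above: train all folds in parallel, one fold per GPU.
# The --fold_idx flag is hypothetical; adjust --gpu if you have fewer GPUs.
DATASET=bbbp
SPLIT=scaffold
for FOLD in 0 1 2 3 4; do
    python train_graph.py --seed 42 --gpu "$FOLD" \
        --dataset "$DATASET" --split "$SPLIT" --fold_idx "$FOLD" &
done
wait  # block until every fold finishes
```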
Please cite the following paper if you use this code in your work.
```
@misc{chen2021learning,
      title={Learning Attributed Graph Representations with Communicative Message Passing Transformer},
      author={Jianwen Chen and Shuangjia Zheng and Ying Song and Jiahua Rao and Yuedong Yang},
      year={2021},
      eprint={2107.08773},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```