Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction
This code was tested with Python 3.8, PyTorch 1.13, and RDKit 2023.3.2.
- Using Conda:
conda env create -f mmgx.yaml
- Then, activate the environment:
conda activate mmgx
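After activating the environment, it can help to confirm that the installed versions are close to the tested ones. Below is a minimal sketch that uses only the Python standard library. The distribution names torch and rdkit are assumptions about how the packages are registered in the environment. If a build does not expose metadata under that name, the check reports the package as missing rather than failing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions this README reports as tested; the distribution names are assumptions.
TESTED = {"torch": "1.13", "rdkit": "2023.3.2"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def _numeric_parts(text):
    """Split '1.13.1+cu117' into [1, 13, 1], keeping non-numeric parts as strings."""
    return [int(p) if p.isdigit() else p for p in text.split("+")[0].split(".")]


def matches_tested(installed, tested):
    """True if the installed version begins with every component of the tested one."""
    if installed is None:
        return False
    tested_parts = _numeric_parts(tested)
    return _numeric_parts(installed)[: len(tested_parts)] == tested_parts


if __name__ == "__main__":
    print(f"python {sys.version.split()[0]} (tested: 3.8)")
    for name, tested in TESTED.items():
        found = installed_version(name)
        status = "matches" if matches_tested(found, tested) else "differs"
        print(f"{name}: installed={found}, tested={tested} -> {status}")
```

Comparing numeric components means that, for example, 2023.03.2 and 2023.3.2 are treated as the same version.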
- Prepare the dataset in the dataset/ folder. The dataset should be a .csv file with smiles, label, and splitting columns.
- Indicate the column names in the dataset/_dataset.csv file.
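A quick way to catch a malformed file before training is to check its header row. This is a minimal sketch. It assumes the file uses the literal column names smiles, label, and splitting. If dataset/_dataset.csv registers different names, pass those names through required instead. The toy rows, including the train/test values in the splitting column, are hypothetical and only illustrate the layout.

```python
import csv

# Default column names from this README; pass other names if dataset/_dataset.csv maps different ones.
REQUIRED_COLUMNS = ("smiles", "label", "splitting")


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    # utf-8-sig strips a leading byte-order mark so the first column name stays clean.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


def write_toy_dataset(csv_path):
    """Write a two-row illustrative file; the label and splitting values are hypothetical."""
    rows = [
        {"smiles": "CCO", "label": 1, "splitting": "train"},
        {"smiles": "c1ccccc1", "label": 0, "splitting": "test"},
    ]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(REQUIRED_COLUMNS))
        writer.writeheader()
        writer.writerows(rows)
```

For example, missing_columns("dataset/bbbp.csv") would return an empty list for a file that has all three columns. The path here is only illustrative.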
- [dataset] = name of the dataset without the .csv extension
- [model] = {GAT, GIN, GAT_edge, Benchmark_GCN, Benchmark_GIN, Benchmark_AttentiveFP}
- [schema] = {A (for atom graph only), AR_0 (for combination with pooling), R (for reduced graph)}
- [reduced] = {functional, junctiontree, pharmacophore}
python3 hyperparameter.py \
-f [dataset] \
-m [model] \
--schema [schema] \
--reduced [reduced (optional)] \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42
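The template above can also be assembled programmatically, which is useful when sweeping over several models or schemas. The sketch below checks each placeholder against the choices listed above and returns an argument list suitable for subprocess.run. The function name is hypothetical. The sketch omits --reduced entirely when no reduced graph is requested, and this is an assumption: the atom-graph example passes the flag with no value instead.

```python
import shlex

# Choices listed in this README for each placeholder.
MODELS = {"GAT", "GIN", "GAT_edge", "Benchmark_GCN", "Benchmark_GIN", "Benchmark_AttentiveFP"}
SCHEMAS = {"A", "AR_0", "R"}
REDUCED = {"functional", "junctiontree", "pharmacophore"}


def tuning_command(dataset, model, schema, reduced=(), mol_embedding=256, fold=5, seed=42):
    """Build the argv list for hyperparameter.py following the README template."""
    if dataset.endswith(".csv"):
        raise ValueError("give the dataset name without the .csv extension")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if schema not in SCHEMAS:
        raise ValueError(f"unknown schema: {schema}")
    unknown = [name for name in reduced if name not in REDUCED]
    if unknown:
        raise ValueError(f"unknown reduced graph(s): {unknown}")
    argv = ["python3", "hyperparameter.py", "-f", dataset, "-m", model, "--schema", schema]
    if reduced:
        # Multiple reduced graphs follow a single --reduced flag, as in the 3-graph example.
        argv += ["--reduced", *reduced]
    argv += ["--mol_embedding", str(mol_embedding), "--batch_normalize",
             "--fold", str(fold), "--seed", str(seed)]
    return argv


if __name__ == "__main__":
    argv = tuning_command("bbbp", "GIN", "AR_0", ["functional", "pharmacophore"])
    print(shlex.join(argv))  # copy-pasteable shell line
    # To launch it from the repository root: subprocess.run(argv, check=True)
```

Returning a list rather than a single string avoids shell quoting issues when the command is passed to subprocess.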
- Examples
# Example: Atom graph only model
python3 hyperparameter.py \
-f bbbp \
-m GIN \
--schema A \
--reduced \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42
# Example: Functional graph only model
python3 hyperparameter.py \
-f bbbp \
-m GIN \
--schema R \
--reduced functional \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42
# Example: 2-graph model (Atom + Functional)
python3 hyperparameter.py \
-f bbbp \
-m GIN \
--schema AR_0 \
--reduced functional \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42
# Example: 3-graph model (Atom + Functional + Pharmacophore)
python3 hyperparameter.py \
-f bbbp \
-m GIN \
--schema AR_0 \
--reduced functional pharmacophore \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42
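The four examples suggest a pattern for pairing --schema with --reduced. A uses no reduced graph, R uses exactly one, and AR_0 uses one or more. The check below encodes that pattern as a sketch. The rules are inferred only from these examples, not from the scripts, so hyperparameter.py may accept other combinations. The function name is hypothetical.

```python
def check_schema_reduced(schema, reduced):
    """Return warnings when schema and reduced graphs differ from the README examples.

    Inferred from the examples only: A uses no reduced graph,
    R uses exactly one, AR_0 uses one or more.
    """
    reduced = list(reduced)
    warnings = []
    if schema == "A" and reduced:
        warnings.append("schema A examples pass no reduced graph")
    elif schema == "R" and len(reduced) != 1:
        warnings.append("schema R examples use exactly one reduced graph")
    elif schema == "AR_0" and not reduced:
        warnings.append("schema AR_0 combines the atom graph with at least one reduced graph")
    return warnings


# The four (schema, reduced) pairs used in the examples above.
EXAMPLES = [
    ("A", []),
    ("R", ["functional"]),
    ("AR_0", ["functional"]),
    ("AR_0", ["functional", "pharmacophore"]),
]

if __name__ == "__main__":
    for schema, reduced in EXAMPLES:
        print(schema, reduced, check_schema_reduced(schema, reduced) or "consistent")
```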
(All values below can be taken from the hyperparameter tuning results.)
- [dataset] = name of the dataset without the .csv extension
- [model] = {GAT, GIN, GAT_edge, Benchmark_GCN, Benchmark_GIN, Benchmark_AttentiveFP}
- [schema] = {A (for atom graph only), AR_0 (for combination with pooling), R (for reduced graph)}
- [reduced] = {functional, junctiontree, pharmacophore}
- [batch_size] = {batch size}
- [number_of_layer] = {number of node embedding layers for Atom graph}
- [number_of_layer_reduced] = {number of node embedding layers for reduced graph}
- [in_channels] = {number of input features}
- [hidden_channels] = {number of hidden features}
- [out_channels] = {number of output features}
- [number_of_layer_self] = {number of molecule embedding layers for Atom graph}
- [number_of_layer_self_reduced] = {number of molecule embedding layers for reduced graph}
python3 main.py \
-f [dataset] \
-m [model] \
--schema [schema] \
--reduced [reduced graph (optional)] \
--mol_embedding 256 \
--batch_normalize \
--fold 5 \
--seed 42 \
--batch_size [batch_size] \
--num_layers [number_of_layer] \
--num_layers_reduced [number_of_layer_reduced] \
--in_channels [in_channels] \
--hidden_channels [hidden_channels] \
--out_channels [out_channels] \
--num_layers_self [number_of_layer_self] \
--num_layers_self_reduced [number_of_layer_self_reduced]
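main.py takes each tuned value as a separate flag. This README does not describe the format in which hyperparameter.py reports its results. The sketch below therefore assumes the tuned values have already been collected into a plain dictionary keyed by flag name. That dictionary layout and the function name are hypothetical, and the values in the usage block are placeholders, not tuned results.

```python
# Tuned flags of main.py, in the order of the command template above.
TUNED_FLAGS = (
    "batch_size", "num_layers", "num_layers_reduced", "in_channels",
    "hidden_channels", "out_channels", "num_layers_self", "num_layers_self_reduced",
)


def training_command(dataset, model, schema, tuned, reduced=(), mol_embedding=256, fold=5, seed=42):
    """Build the argv list for main.py; `tuned` maps flag names (without --) to values."""
    missing = [name for name in TUNED_FLAGS if name not in tuned]
    if missing:
        raise KeyError(f"missing tuned hyperparameters: {missing}")
    argv = ["python3", "main.py", "-f", dataset, "-m", model, "--schema", schema]
    if reduced:
        argv += ["--reduced", *reduced]
    argv += ["--mol_embedding", str(mol_embedding), "--batch_normalize",
             "--fold", str(fold), "--seed", str(seed)]
    # Append every tuned flag with its value, converted to a string for argv.
    for name in TUNED_FLAGS:
        argv += [f"--{name}", str(tuned[name])]
    return argv


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT tuned results.
    placeholder = {name: 1 for name in TUNED_FLAGS}
    print(" ".join(training_command("bbbp", "GIN", "AR_0", placeholder, ["functional"])))
```

Raising on a missing key means that an incomplete set of tuned values fails before a training run starts, instead of silently falling back to whatever defaults main.py may define.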
- Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Communications Chemistry, 7: 74, 2024. doi: 10.1038/s42004-024-01155-w