Prepare the raw data in the following layout:
|-- data
    |-- properties
        |-- <property>
            |-- cif
            |-- <raw_data>.csv
The CSV file should contain at least the following three columns: material_id, cif, <prop>.
<prop> can be any property, such as Tc for superconductors.
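Before splitting, it can help to confirm that the CSV header carries the three required columns. The following is a minimal sketch, not part of the repository: the helper name `missing_columns` and the toy CSV text are hypothetical and used only for illustration.

```python
import csv
import io

# Columns every raw-data CSV is expected to carry, besides the property column.
REQUIRED = ("material_id", "cif")


def missing_columns(csv_text, prop):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in (*REQUIRED, prop) if name not in present]


# Toy CSV text (hypothetical content) with a Tc column.
sample = 'material_id,cif,Tc\nm-1,"data_x",9.2\n'
print(missing_columns(sample, "Tc"))     # []
print(missing_columns(sample, "logtc"))  # ['logtc']
```

For a real file, read its contents with `open(path, newline="").read()` and pass the property column name used as `<prop>`.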
Split the raw data via the following script
python scripts/make_split.py --dir data/properties/<property> --csv <raw_data>.csv
By default, the script shuffles the dataset with random seed 42 and splits it into train.csv, val.csv and test.csv in an 8:1:1 ratio.
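The idea of a seeded shuffle followed by an 8:1:1 cut can be sketched as follows. This is an illustration only: the function `split_indices` is hypothetical, and the shuffling routine and rounding used by scripts/make_split.py may differ, so the resulting row assignment is not expected to match the script's output.

```python
import random


def split_indices(n, seed=42, ratios=(8, 1, 1)):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # any rounding remainder goes to test
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed makes the split reproducible, so the prediction and guidance models are trained and evaluated on the same partitions across runs.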
python diffcsp/run.py model=prediction data=property data.subdir=<property> data.prop=<prop> data.task=<task> data.opt_target=<opt_target> exptag=<property>_<prop> expname=prediction
The trained model is saved in singlerun/<property>_<prop>/prediction. The default 3D encoder is DimeNet++; it can be replaced with a more powerful encoder (e.g. Equiformer).
<task> can be either classification or regression.
<opt_target> has a different meaning for each task:
For classification, <opt_target> is the target class to generate.
For regression, <opt_target> = 1 generates candidates with a higher property value (like Tc), while <opt_target> = -1 generates candidates with a lower property value (like formation energy).
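One way to read the two meanings of <opt_target> is as a score in which larger always means closer to the goal. The sketch below only illustrates that reading and is not the repository's guidance code: the function `guidance_score` is hypothetical, and treating the classification target as an index into a list of class probabilities is an assumption.

```python
def guidance_score(task, opt_target, prediction):
    """Return a score where a larger value means 'closer to the target'.

    regression: prediction is a float, opt_target is +1 (higher) or -1 (lower).
    classification: prediction is a list of class probabilities and
    opt_target is the index of the wanted class (an assumption of this sketch).
    """
    if task == "regression":
        if opt_target not in (1, -1):
            raise ValueError("opt_target must be 1 or -1 for regression")
        return opt_target * prediction
    if task == "classification":
        return prediction[opt_target]
    raise ValueError(f"unknown task: {task}")


# Higher Tc is better with opt_target = 1.
print(guidance_score("regression", 1, 12.5))   # 12.5
# Lower formation energy is better with opt_target = -1.
print(guidance_score("regression", -1, -0.3))  # 0.3
# Probability assigned to the wanted class 2.
print(guidance_score("classification", 2, [0.1, 0.2, 0.7]))  # 0.7
```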
python diffcsp/run.py model=guidance data=property data.subdir=<property> data.prop=<prop> data.task=<task> data.opt_target=<opt_target> exptag=<property>_<prop> expname=guidance
The trained model is saved in singlerun/<property>_<prop>/guidance.
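The prediction and guidance training commands above differ only in the model and expname values, so they can be assembled from one template. The helper below is a hypothetical convenience, not part of the repository; it only builds the argument list shown above and does not launch training.

```python
import shlex


def run_command(model, subdir, prop, task, opt_target):
    """Assemble the diffcsp/run.py call for model='prediction' or 'guidance'."""
    overrides = [
        f"model={model}",
        "data=property",
        f"data.subdir={subdir}",
        f"data.prop={prop}",
        f"data.task={task}",
        f"data.opt_target={opt_target}",
        f"exptag={subdir}_{prop}",  # output lands in singlerun/<exptag>/<expname>
        f"expname={model}",
    ]
    return ["python", "diffcsp/run.py", *overrides]


# Print both training commands for a regression task on a SuperCon/logtc dataset.
for model in ("prediction", "guidance"):
    print(shlex.join(run_command(model, "SuperCon", "logtc", "regression", 1)))
```

To execute a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.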
python scripts/optimization.py --model_path ${PWD}/singlerun/<property>_<prop>/guidance --uncond_path ${PWD}/singlerun/2023-04-18/pure_pretrain
The above command writes eval_opt.pt, which contains 500 optimized structures, to the singlerun/<property>_<prop>/guidance directory.
python scripts/eval_optimization.py --dir ${PWD}/singlerun/<property>_<prop>
The results are logged in singlerun/<property>_<prop>/results as follows:
|-- results
    |-- summary.log
    |-- results.csv
    |-- cif
        |-- xx.cif
        ...
summary.log summarizes the results of the property prediction and guidance models. An example is shown below:
*************** Property Prediction ***************
Test pcc: 0.4857
*************** Optimization ***************
Top-5 Results:
489-xx: xx
385-xx: xx
249-xx: xx
486-xx: xx
163-xx: xx
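If the log needs to be read programmatically, a small parser is enough for the layout in the example above. This is a sketch that assumes the real file follows exactly that layout; the function `parse_summary` is hypothetical, and the ranked values are kept as strings because the example only shows placeholders.

```python
import re


def parse_summary(text):
    """Extract the test PCC and the ranked 'name: value' lines from summary.log text."""
    pcc = None
    top = []
    in_top = False
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Test pcc:\s*(\S+)", line)
        if match:
            pcc = float(match.group(1))
        elif re.match(r"Top-\d+ Results:", line):
            in_top = True  # every later 'name: value' line is a ranked entry
        elif in_top and ":" in line:
            name, value = line.split(":", 1)
            top.append((name.strip(), value.strip()))
    return pcc, top


example = """*************** Property Prediction ***************
Test pcc: 0.4857
*************** Optimization ***************
Top-5 Results:
489-xx: xx
385-xx: xx
"""
pcc, top = parse_summary(example)
print(pcc)     # 0.4857
print(top[0])  # ('489-xx', 'xx')
```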
export CUDA_VISIBLE_DEVICES=1
python scripts/make_split.py --dir data/properties/SuperCon --csv order_data_tc.csv
python diffcsp/run.py model=prediction data=property data.subdir=SuperCon data.prop=logtc data.task=regression data.opt_target=1 exptag=SuperCon_logtc expname=prediction
python diffcsp/run.py model=guidance data=property data.subdir=SuperCon data.prop=logtc data.task=regression data.opt_target=1 exptag=SuperCon_logtc expname=guidance
python scripts/optimization.py --model_path ${PWD}/singlerun/SuperCon_logtc/guidance --uncond_path ${PWD}/singlerun/2023-04-18/pure_pretrain
python scripts/eval_optimization.py --dir ${PWD}/singlerun/SuperCon_logtc
Please consider citing our work if you find it helpful:
@inproceedings{chenlearning,
title={Learning Superconductivity from Ordered and Disordered Material Structures},
author={Chen, Pin and Peng, Luoxuan and Jiao, Rui and Mo, Qing and Zhen, WANG and Huang, Wenbing and Liu, Yang and Lu, Yutong},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024}
}