Uni-Mol: Problem reproducing molecular property prediction results #323

Open
rasmusthrane opened this issue Mar 4, 2025 · 3 comments

@rasmusthrane

rasmusthrane commented Mar 4, 2025

I am trying to reproduce the results for molecular property prediction on the FreeSolv dataset. However, I do not succeed: I obtain RMSE = 1.655, compared to the RMSE = 1.480 (0.048) reported in the paper.

I am using Docker on a single GPU. I downloaded the all-hydrogen model parameters and used the supplied training script:

data_path="/workspace/Uni-Mol/unimol/data/molecular_property_prediction"  # replace to your data path
save_dir="/workspace/Uni-Mol/unimol/weights/finetuned/freesolv"  # replace to your save path
dict_name="dict.txt"
weight_path="/workspace/Uni-Mol/unimol/weights/pretrained/mol_pre_all_h_220816.pt"  # replace to your ckpt path
task_name="freesolv"  # molecular property prediction task name 
task_num=1
loss_func=finetune_mse
lr=8e-5
batch_size=64
epoch=60
dropout=0.2
warmup=0.1
local_batch_size=64
only_polar=-1
conf_size=11
seed=0

if [ "$task_name" == "qm7dft" ] || [ "$task_name" == "qm8dft" ] || [ "$task_name" == "qm9dft" ]; then
	metric="valid_agg_mae"
elif [ "$task_name" == "esol" ] || [ "$task_name" == "freesolv" ] || [ "$task_name" == "lipo" ]; then
    metric="valid_agg_rmse"
else 
    metric="valid_agg_auc"
fi

export NCCL_ASYNC_ERROR_HANDLING=1
export OMP_NUM_THREADS=1
export CUDA_VISIBLE_DEVICES=1 # which device to use
update_freq=`expr $batch_size / $local_batch_size`
python $(which unicore-train) $data_path --task-name $task_name --user-dir /workspace/Uni-Mol/unimol --train-subset train --valid-subset valid,test \
       --conf-size $conf_size \
       --num-workers 8 --ddp-backend=c10d \
       --dict-name $dict_name \
       --task mol_finetune --loss $loss_func --arch unimol_base  \
       --classification-head-name $task_name --num-classes $task_num \
       --optimizer adam --adam-betas "(0.9, 0.99)" --adam-eps 1e-6 --clip-norm 1.0 \
       --lr-scheduler polynomial_decay --lr $lr --warmup-ratio $warmup --max-epoch $epoch --batch-size $local_batch_size --pooler-dropout $dropout \
       --update-freq $update_freq --seed $seed \
       --fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
       --log-interval 100 --log-format simple \
       --validate-interval 1 \
       --finetune-from-model $weight_path \
       --best-checkpoint-metric $metric --patience 20 \
       --save-dir $save_dir --only-polar $only_polar

My installed packages: https://pastebin.com/hC4G8VFs

@ZhouGengmo
Collaborator

We conducted experiments with three different seeds and averaged the test results corresponding to the best validation performance.

Additionally, we attempted a hyperparameter search, and the search spaces are listed in Table 7 in the appendix of the paper.
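
For reference, a minimal sketch of that protocol on top of the script above (an illustration only, not the authors' tooling; the seed values here are placeholders, and the actual seeds are given further below):

# Sketch: run the same fine-tuning command once per seed, saving each run in its
# own directory, then average the test RMSE taken at the checkpoint with the
# best validation RMSE.
for seed in 0 1 2; do   # placeholder seeds
    save_dir="/workspace/Uni-Mol/unimol/weights/finetuned/freesolv_seed${seed}"
    # run the same unicore-train command as above with --seed $seed --save-dir $save_dir,
    # then record the test RMSE at the epoch with the best validation RMSE
done
# finally, average the recorded test RMSE values and report their standard deviation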

@rasmusthrane
Author

Thank you for the quick answer. The test results I obtain are quite different from the ones you report, even though I have only run it once (when compared with the reported standard deviation).

Is it possible to get the seeds you used, so I can verify that my setup works? Also, do I have to switch to commit 37b0198 or earlier to reproduce the results, or is that only needed for molecular conformation generation?

@ZhouGengmo
Collaborator

Since FreeSolv is very small, it is sensitive to changes in seeds, environment, etc.
The seeds should be 2/3/4, if I remember correctly.
If you still can't reproduce the results, perform a hyperparameter search or try other seeds.

There is no need to switch to the previous commit; only conformer generation requires that.
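
If it helps, a hedged sketch of such a seed/hyperparameter sweep around the script above (the grid values below are placeholders for illustration, not the actual search spaces from Table 7 of the paper):

# Sketch only: sweep seeds and a couple of hyperparameters; consult Table 7 of
# the paper for the actual search spaces.
for seed in 2 3 4; do
    for lr in 4e-5 8e-5 1e-4; do          # placeholder learning rates
        for dropout in 0.0 0.1 0.2; do    # placeholder pooler-dropout values
            save_dir="/workspace/Uni-Mol/unimol/weights/finetuned/freesolv_s${seed}_lr${lr}_do${dropout}"
            # run the same unicore-train command as above with
            #   --seed $seed --lr $lr --pooler-dropout $dropout --save-dir $save_dir
        done
    done
done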
