
[Confirmation] Optimal Hyperparameters and Reproducibility #21

Closed
chao1224 opened this issue Nov 23, 2022 · 19 comments

Labels: question (Further information is requested)

Comments

@chao1224

chao1224 commented Nov 23, 2022

Hi there,

Thanks for providing the nice codebase. I'm trying to reproduce the results for downstream tasks, and I have the following questions.

  • Are the scripts under this folder only samples? For the optimal OntoProtein hyperparameters, should we follow Table 6 in the paper?
  • For ProtBert, do you use the same optimal hyperparameters for each downstream task?
  • Table 6 doesn't cover the optimal values for gradient_accumulation_steps and eval_step. Can you help clarify this?

Any help is appreciated.

@cheng-siyuan
Contributor

Hi, thanks for your questions:

When running the downstream tasks, please keep the hyperparameters consistent with Table 6 in the paper. Note that the batch_size in Table 6 equals per_device_batch_size * gradient_accumulation_steps: on some GPUs the memory is not enough, so we adjust per_device_batch_size and gradient_accumulation_steps to reach the same effective batch size. As for eval_step, it does not affect the training process, so just follow the defaults in the scripts. ProtBert uses the same hyperparameters as our model.
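A minimal sketch of that relationship, assuming the HuggingFace-style argument names used in the training scripts (the concrete numbers below are only illustrative):

```python
# Effective batch size on a single GPU:
# per-device batch size * gradient accumulation steps.
per_device_batch_size = 2          # what fits in GPU memory
gradient_accumulation_steps = 16   # gradients accumulated before each optimizer step

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 32 -> should match the batch_size column of Table 6
```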

@chao1224
Author

Hi, thank you for the prompt reply. It's very helpful!

Meanwhile, I carefully checked per_device_batch_size and gradient_accumulation_steps (listed below), and it seems that contact and ss3 have minor mismatches when using the default hyperparameters under the script folder (see the quick check after the table). Can you take a look when you get a chance?

task          per_device_batch_size   gradient_accumulation_steps   batch_size in Table 6
contact       1                       1                             8
fluorescence  4                       16                            64
homology      1                       64                            64
ss3           2                       8                             32
ss8           2                       16                            32
stability     2                       16                            32
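For reference, here is the quick check I mean, a throwaway sketch with the numbers copied from the table above:

```python
# Compare per_device_batch_size * gradient_accumulation_steps with Table 6.
configs = {
    # task: (per_device_batch_size, gradient_accumulation_steps, batch_size in Table 6)
    "contact":      (1, 1,  8),
    "fluorescence": (4, 16, 64),
    "homology":     (1, 64, 64),
    "ss3":          (2, 8,  32),
    "ss8":          (2, 16, 32),
    "stability":    (2, 16, 32),
}

for task, (per_device, grad_accum, expected) in configs.items():
    product = per_device * grad_accum
    status = "ok" if product == expected else f"mismatch: {product} != {expected}"
    print(f"{task:12s} {status}")
```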

@cheng-siyuan
Contributor

Hello,
Thank you for the correction. First of all, I apologize for my carelessness yesterday: I forgot to mention that the batch size in Table 6 also depends on the number of GPUs we use. I have corrected and updated our scripts.
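In other words, the effective batch size presumably works out as in the sketch below (num_gpus here is illustrative; the thread does not state the GPU count used per task):

```python
# Effective batch size with data-parallel training:
#   per_device_batch_size * gradient_accumulation_steps * num_gpus
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_gpus):
    return per_device_batch_size * gradient_accumulation_steps * num_gpus

# Example: under this reading, the contact row (1 * 1) would need 8 GPUs
# to reach the batch size of 8 listed in Table 6.
print(effective_batch_size(1, 1, 8))  # 8
```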

@zxlzr added the "question (Further information is requested)" label on Nov 25, 2022
@zxlzr closed this as completed on Nov 27, 2022
@chao1224
Author

Thanks. I can roughly reproduce the results using the latest scripts (except contact, which is still running).

@chao1224
Author

chao1224 commented Dec 7, 2022

Hi @zxlzr , I got the following result for OntoProtein on contact, which seems to be too high: the paper reports 0.40 for l2. I'm not sure if I missed something; can you help double-check it?

'accuracy_l5': 0.6621209383010864, 'accuracy_l2': 0.5874732732772827, 'accuracy_l': 0.4826046824455261

@QQQrookie

Hi, can you please provide more detailed information? For example, your hyperparameters and the sequence length between amino acids.

@chao1224
Author

chao1224 commented Dec 7, 2022

Hi there,
I just followed the hyperparameters in this link.
Can you help explain what the sequence length between amino acids refers to?

@cheng-siyuan
Contributor

The sequence length between amino acids refers to the short-, medium-, or long-range setting selected for the contact evaluation. Specific details can be found in TAPE.
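For anyone reproducing this, here is a rough sketch of how the medium/long-range top-L/2 precision is typically computed (TAPE-style; the separation threshold and function name are illustrative and not taken from this repo's evaluation code):

```python
import numpy as np

def precision_at_l2(contact_probs, true_contacts, min_separation=12):
    """Top L/2 precision over pairs with sequence separation >= min_separation.

    contact_probs:  (L, L) predicted contact probabilities.
    true_contacts:  (L, L) binary ground-truth contact map.
    min_separation: 12 keeps medium- and long-range pairs (|i - j| >= 12).
    """
    L = contact_probs.shape[0]
    i, j = np.triu_indices(L, k=min_separation)         # pairs far enough apart in sequence
    top = np.argsort(-contact_probs[i, j])[: L // 2]    # the L/2 most confident pairs
    return float(true_contacts[i[top], j[top]].mean())
```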

@chao1224
Author

chao1224 commented Dec 7, 2022

Thanks. I'm reporting the one for medium&long range prediction, just as in the paper and #8.

@chao1224
Author

Hi, may I ask if there are any follow-ups?

@cheng-siyuan
Contributor

I don't know precisely what caused the discrepancy, but we can provide a model for the contact task fine-tuned with our hyperparameters. This model was also retrained by us, so there may be a fluctuation of 2~4 points, but the difference will not be large.

@chao1224
Author

Sounds good! That would be very helpful!

@cheng-siyuan
Contributor

You can download the checkpoint here
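(For anyone else reproducing this: loading the downloaded checkpoint would look roughly like the sketch below. The path is a placeholder and the exact head class depends on the repo's contact-prediction code, so treat this as a generic transformers-style load rather than the repo's actual evaluation entry point.)

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Placeholder path: point this at the directory containing the downloaded checkpoint.
checkpoint_dir = "path/to/ontoprotein-contact-checkpoint"

config = AutoConfig.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModel.from_pretrained(checkpoint_dir, config=config)
model.eval()
```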

@chao1224
Author

@cheng-siyuan
Thanks. So using the model you provided, I got the following output:

'accuracy_l5': 0.6149854063987732, 'accuracy_l2': 0.5102487802505493, 'accuracy_l': 0.4060076177120209

Not sure if I missed something.

@cheng-siyuan
Contributor

Yes, this checkpoint was obtained later, after we had updated the hyperparameters, so its results are 3 to 4 points higher than our original ones. The result in the paper is not the best result, and I apologize for the trouble this caused your experiments.

@chao1224
Author

No problem at all, and you have already been very helpful in replying to the messages :)

So just to double-check, you mean that in Table 1 of your paper, it should be updated to 0.51 (with your checkpoint) for contact with OntoProtein, right?

@cheng-siyuan
Contributor

Yes, we recommend that you use the results of this checkpoint as a reference.

@chao1224
Author

Sounds good! Appreciate your help and being responsible for your work!

@cheng-siyuan
Contributor

You're welcome:)
