a problem about dataset #2
Comments
Hi, thanks for reaching out! As mentioned in the methods, the molecules were assigned to the training set (80%), validation set (10%), or test set (10%) using a scaffold split. Specifically, the code in this README provides the exact parameters used to split the data and define which molecules are held out. Since not all molecules could be processed through RDKit and Chemprop, the counts of 1143, 1055, and 963 refer to those that could be processed and were used for developing the model.
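For readers who want a sense of how such a split works, below is a minimal sketch of an 80/10/10 Bemis-Murcko scaffold split using RDKit. This is not the authors' exact code (Chemprop's built-in scaffold split with the parameters in the README is authoritative); the function name and the largest-group-first assignment heuristic are assumptions for illustration.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_split(smiles_list, frac_train=0.8, frac_val=0.1):
    """Sketch of an 80/10/10 scaffold split (hypothetical, not the repo's code)."""
    # Drop molecules RDKit cannot parse, mirroring the note above that
    # only processable molecules (1143/1055/963) were kept.
    parsed = [(s, Chem.MolFromSmiles(s)) for s in smiles_list]
    valid = [(s, m) for s, m in parsed if m is not None]

    # Group molecules by their Bemis-Murcko scaffold SMILES.
    groups = defaultdict(list)
    for s, m in valid:
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=m)
        groups[scaffold].append(s)

    # Assign whole scaffold groups to splits, largest groups first,
    # so no scaffold is shared between train, validation, and test.
    n = len(valid)
    train, val, test = [], [], []
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(val) + len(group) <= frac_val * n:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test
```

Because whole scaffolds are held out together, the test set probes generalization to unseen chemotypes rather than to near-duplicates of training molecules.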
I believe the question remains unresolved. The file provided with the paper, probe_screen-data.xlsx, does not appear to align with the documentation in this repository: it is in .xlsx format rather than the .csv format mentioned in the instructions. Thank you for your assistance!
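As a stopgap, one could export each sheet of the workbook to CSV with pandas. This is a hypothetical sketch only; it makes no assumption about the sheet names or column layout of probe_screen-data.xlsx, and the resulting files may still need renaming to match the notebook.

```python
# Hypothetical sketch: dump every sheet of the paper's Excel file to a CSV.
# Requires pandas with openpyxl installed for .xlsx support.
import pandas as pd

# sheet_name=None loads all sheets into a dict of {sheet_name: DataFrame}.
sheets = pd.read_excel("probe_screen-data.xlsx", sheet_name=None)
for name, df in sheets.items():
    df.to_csv(f"{name}.csv", index=False)
```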
Hi, I added a folder with the CSVs referenced in the notebook. Hopefully this helps with running the code. |
Hello,
I noticed that your article mentions datasets of 1143, 1055, and 963 molecules for MED1, NPM1, and HP1α droplets, respectively, and that you provided the data file probe_screen-data.xlsx. However, the article does not seem to specify how these datasets are divided, nor which data are held out. I have reviewed the data file, but the molecule counts for each droplet do not appear to match those mentioned.
Therefore, I would like to understand your method for dividing the datasets, and how to derive the training, validation, and test sets for the deep learning models described in the README.md from this data file. Your help in clarifying these points would be greatly appreciated.
Thank you!