Update GTDB download and formatting #366
moves code to entirely python-based download and setup
Removes documentation about autometa-setup-gtdb, which can be revisited later. That pathway makes it hard to ensure reproducibility since the files, as downloaded, don't have version info. For now, for the sake of ensuring reproducibility, the only files accepted are Autometa-downloaded files.
When running:
autometa-config \
    --section databases --option gtdb \
    --value <path/to/your/gtdb/database/directory>
Currently it does not create the directory if it doesn't already exist, leading to an error when the |
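One way to avoid the missing-directory error is to create the configured path before anything is written to it. The following is a minimal sketch only; `ensure_database_dir` is a hypothetical helper name and not part of Autometa's actual code.

```python
from pathlib import Path


def ensure_database_dir(path: str) -> Path:
    """Create the database directory (and any missing parents) if needed.

    Hypothetical helper for illustration; returns the expanded path.
    """
    db_dir = Path(path).expanduser()
    # exist_ok=True makes the call a no-op when the directory already exists
    db_dir.mkdir(parents=True, exist_ok=True)
    return db_dir
```

Calling it with the value passed to `--value` would make a later download step safe whether or not the directory was created by hand.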
Also, would it not be faster to download from |
The day I tried, they were both downloading at the exact same rate, so I left it. I think the world mirror is still in Australia. |
The URL structure is different, give me a minute |
fixed |
You still have a mistake in the URL. Got the error:
[12/06/2024 10:56:21 AM ERROR] autometa.taxonomy.download_gtdb_files: Failed to fetch MD5SUM.txt: 404 Client Error: Not Found for url: https://data.ace.uq.edu.au/public/gtdb/data/public/gtdb/data/releases/release207/207.0/MD5SUM.txt
The mistake is that the second "public" directory doesn't exist. It should be |
Something is wrong with the creation of the autometa-formatted version of faa.gz. I get this error:
[12/06/2024 01:21:58 PM DEBUG] autometa.common.external.diamond: diamond makedb --in /home/jason/Downloads/databases/autometa_formatted_gtdb-version-207.0.faa.gz --db /home/jason/Downloads/databases/autometa_formatted_gtdb-version-207.0.dmnd -p 120
Traceback (most recent call last):
  File "/home/jason/mambaforge/envs/autometa/bin/autometa-update-databases", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jason/mambaforge/envs/autometa/lib/python3.12/site-packages/autometa/config/databases.py", line 860, in main
    diamond.makedatabase(
  File "/home/jason/mambaforge/envs/autometa/lib/python3.12/site-packages/autometa/common/external/diamond.py", line 51, in makedatabase
    subprocess.run(
  File "/home/jason/mambaforge/envs/autometa/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['diamond', 'makedb', '--in', '/home/jason/Downloads/databases/autometa_formatted_gtdb-version-207.0.faa.gz', '--db', '/home/jason/Downloads/databases/autometa_formatted_gtdb-version-207.0.dmnd', '-p', '120']' returned non-zero exit status 1.
When trying to run the diamond command in the terminal, I get:
diamond v2.1.10.164 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 120
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /home/jason/Downloads/databases/autometa_formatted_gtdb-version-207.0.faa.gz
Opening the database file... Error: Error detecting input file format. First line seems to be blank.
And indeed with zcat, the file autometa_formatted_gtdb-version-207.0.faa.gz is blank. |
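A blank formatted file like this could be caught before `diamond makedb` is invoked. The sketch below is illustrative only: `first_line_is_blank` is a hypothetical name and nothing here reflects how Autometa currently validates its output.

```python
import gzip


def first_line_is_blank(gz_path: str) -> bool:
    """Return True if a gzipped text file is empty or starts with a blank line.

    Hypothetical pre-flight check mirroring the condition diamond reports.
    """
    with gzip.open(gz_path, "rt") as handle:
        first_line = handle.readline()
    # readline() returns "" at end of file, and "\n" for a blank first line
    return first_line.strip() == ""
```

Raising a clear error when this returns True would point at the formatting step rather than at diamond.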
It worked from scratch for me. My guess is a partial download from a previous attempt caused an error. Try clearing the files and attempting again. If that results in the error again, can you post the commands used? |
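If partial downloads are the suspect, comparing each file against the release's MD5SUM.txt would confirm it. This is a rough sketch that assumes MD5SUM.txt uses the usual `md5sum` layout (hex digest, whitespace, file name), which has not been verified here; all function names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text: str) -> dict:
    """Map file name -> digest, assuming md5sum-style '<hex>  <name>' lines."""
    checksums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            # md5sum marks binary-mode entries with a leading '*'
            checksums[parts[1].strip().lstrip("*")] = parts[0].lower()
    return checksums


def download_is_complete(path: str, expected_md5: str) -> bool:
    """Return True when the file on disk matches the expected digest."""
    return md5_of_file(path) == expected_md5.lower()
```

A file that fails the check can be deleted and fetched again instead of being passed on to the formatting step.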
Just deleted my mamba environment and installed again from scratch and got the same result. Can you try again with gtdb version 207, which is what I was downloading? The commands I ran were as follows:
# After pulling git repo etc.
make create_environment
mamba activate autometa
make install
autometa-config --section databases --option gtdb --value ~/Downloads/databases
autometa-config --section gtdb --option release --value 207
autometa-update-databases --update-gtdb |
Running your exact code on the server now but will take time to download. Did you clear the files from |
Yes, I did. |
The code and documentation were written for v220 or higher because GTDB changed the file contents somewhere between 207 and 220. |
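Given that constraint, an early check on the configured release would fail fast instead of producing a blank file. The sketch below assumes the release is stored as a string such as "220" or "220.0"; `MIN_SUPPORTED_RELEASE` and `check_release` are hypothetical names, and the threshold of 220 is taken only from the comment above.

```python
MIN_SUPPORTED_RELEASE = 220  # assumption: taken from the comment above


def check_release(release: str) -> int:
    """Return the major release number, or raise if it is older than supported."""
    # Accept values like "220" or "220.0" and keep only the major part
    major = int(str(release).strip().split(".")[0])
    if major < MIN_SUPPORTED_RELEASE:
        raise ValueError(
            f"GTDB release {release} is not supported; "
            f"use release {MIN_SUPPORTED_RELEASE} or higher"
        )
    return major
```

With this in place, setting the release to 207 would stop immediately with a readable message.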
OK, it works now. However, because we are limited to the latest release I can't test the ability to update in-place. We will have to test that later. |