This repository provides tables of metabolite identifiers.
The tables folder is the main folder and contains the following tables:
- HMDB-to-KEGG_<date:MM-DD-YY>.tsv: table with HMDB identifiers matched to KEGG identifiers.
- LipidMaps-to-KEGG_<date:MM-DD-YY>.tsv: table with LipidMaps identifiers matched to KEGG identifiers.
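As a minimal sketch of how one of these tables could be consumed, the function below reads a tab-separated file into a dictionary. The column positions (source identifier first, KEGG identifier second) and the presence of a header row are assumptions, since the column layout is not described here; `load_mapping` is a hypothetical helper, not part of this repository.

```python
import csv


def load_mapping(tsv_path, source_col=0, target_col=1, has_header=True):
    """Return {source_id: target_id} read from a tab-separated table.

    Assumption: the identifier columns sit at the given positions; adjust
    source_col / target_col to match the actual table layout.
    """
    mapping = {}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row, if any
        for row in reader:
            if len(row) <= max(source_col, target_col):
                continue  # skip short or empty rows
            source_id = row[source_col].strip()
            target_id = row[target_col].strip()
            if source_id and target_id:
                mapping[source_id] = target_id
    return mapping


# Example call (replace <date> with the date in the actual file name):
# hmdb_to_kegg = load_mapping("tables/HMDB-to-KEGG_<date>.tsv")
```

Rows with a missing identifier on either side are skipped, so the resulting dictionary only holds complete pairs.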
Reproducing the tables is completely optional.
Expected structure of this folder for reproducing the tables:
├── extra
│   └── <empty>    <- place the HMDB files here (see "Instructions for HMDB")
├── scripts
│   ├── hmdb.py
│   ├── hmdb_ids-tokegg.py
│   └── lipidmaps_ids-tokegg.py
└── tables
    └── <empty>
The scripts in the scripts folder produce the content of the tables folder:
- hmdb_ids-tokegg.py: HMDB-to-KEGG_<date>.tsv
- lipidmaps_ids-tokegg.py: LipidMaps-to-KEGG_<date>.tsv
The extra folder contains HMDB-related data required to generate HMDB-to-KEGG_<date>.tsv; for more information see the "Description of the scripts" section.
Description of the scripts
- lipidmaps_ids-tokegg.py: queries the API of the LipidMaps database and directly generates the final table. It takes 1 minute.
- hmdb_ids-tokegg.py and hmdb.py: for detailed information, or if you need to re-run them, see the Instructions for HMDB. Importantly, hmdb.py is an identical copy of the fantastic script created by yufree (https://gist.github.com/yufree/f552d865096010445fc7b969e7e9d439), which is an XML parser suited to the downloaded HMDB database; it takes ~10 min to complete.
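To illustrate what parsing the downloaded HMDB database involves, here is a minimal streaming sketch using the standard library. It is not yufree's script and not the code in hmdb.py; it assumes each record is a `<metabolite>` element whose direct children include `<accession>` and `<kegg_id>`, and it ignores XML namespaces. `iter_hmdb_kegg_pairs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_hmdb_kegg_pairs(xml_source):
    """Yield (accession, kegg_id) for each <metabolite> record.

    xml_source may be a file path or a binary file object. Records are
    processed one at a time so the whole file is never held in memory.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "metabolite":
            continue
        # Only direct children are inspected, so nested elements that reuse
        # a tag name (e.g. secondary accessions) do not overwrite the values.
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in elem}
        if fields.get("accession") and fields.get("kegg_id"):
            yield fields["accession"], fields["kegg_id"]
        elem.clear()  # release the children of the record just processed


# Example call (assumes the file has been unzipped into extra/):
# for hmdb_id, kegg_id in iter_hmdb_kegg_pairs("extra/hmdb_metabolites.xml"):
#     print(hmdb_id, kegg_id, sep="\t")
```

Streaming with `iterparse` is chosen here because the XML file is several gigabytes, which makes loading the full tree impractical on most machines.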
Instructions for HMDB
If you want to re-run the scripts to produce the final HMDB table, the following is required first:
- go to https://hmdb.ca/downloads
- download the "All Metabolites" dataset (a zip file)
- move the zip file into the extra/ folder
- unzip the file; this produces the hmdb_metabolites.xml file, of ~6.5GB (too heavy, so not included in this repository)
- The two scripts must be executed in the following order:
  - a) Run hmdb.py and wait until it completes (~10 min); it outputs extra/hmdb.csv
  - b) Run hmdb_ids-tokegg.py, which uses the output of a); it outputs tables/HMDB-to-KEGG_<date>.tsv
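The steps above can be sketched as a small helper script. This is only an illustration under stated assumptions: the name of the downloaded zip file and the working directory from which the scripts expect to be launched are not specified here, so both are left as parameters; `unzip_hmdb` and `run_in_order` are hypothetical helpers.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def unzip_hmdb(zip_path, extra_dir):
    """Extract the downloaded archive into extra_dir and return the XML path."""
    extra_dir = Path(extra_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extra_dir)
    xml_path = extra_dir / "hmdb_metabolites.xml"
    if not xml_path.is_file():
        raise FileNotFoundError(f"{xml_path} not found after unzipping")
    return xml_path


def run_in_order(scripts, cwd):
    """Run each script with the current interpreter, one after the other.

    check=True raises CalledProcessError on a non-zero exit code, so the
    second script never starts if the first one fails.
    """
    for script in scripts:
        subprocess.run([sys.executable, str(script)], cwd=cwd, check=True)


# Example calls (the zip file name and cwd are assumptions; adjust as needed):
# unzip_hmdb("extra/hmdb_metabolites.zip", "extra")
# run_in_order(["hmdb.py", "hmdb_ids-tokegg.py"], cwd="scripts")
```

Stopping at the first failure matters here because hmdb_ids-tokegg.py depends on the extra/hmdb.csv file written by hmdb.py.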
Note: the generated table provided in this repository used HMDB Version 5.0, released on 2021-11-17 and downloaded on 07-May-2024.
Contact johaGL with any questions.