CompoundsIdentifiers

This repository provides tables that match metabolite identifiers across databases.

The tables folder is the main folder and contains the following tables:

HMDB-to-KEGG_<date:MM-DD-YY>.tsv: table of HMDB identifiers matched to KEGG identifiers.
LipidMaps-to-KEGG_<date:MM-DD-YY>.tsv: table of LipidMaps identifiers matched to KEGG identifiers.
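The table names carry an MM-DD-YY date stamp, which does not sort chronologically as plain text (12-31-24 sorts after 01-02-25). If several dated versions sit in the tables folder, a hypothetical helper such as latest_table (not part of this repository) could pick the newest one. It is a sketch that assumes the file names follow exactly the <prefix>_<MM-DD-YY>.tsv pattern above:

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical helper. Matches names such as HMDB-to-KEGG_05-07-24.tsv,
# assuming the <prefix>_<MM-DD-YY>.tsv naming described above.
PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<date>\d{2}-\d{2}-\d{2})\.tsv$")


def latest_table(folder, prefix):
    """Return the most recent <prefix>_<MM-DD-YY>.tsv in folder, or None."""
    best = None
    for path in Path(folder).glob(f"{prefix}_*.tsv"):
        match = PATTERN.match(path.name)
        if match is None or match.group("prefix") != prefix:
            continue
        # Compare real dates, not strings, so 01-02-25 beats 12-31-24.
        stamp = datetime.strptime(match.group("date"), "%m-%d-%y")
        if best is None or stamp > best[0]:
            best = (stamp, path)
    return None if best is None else best[1]
```

For example, latest_table("tables", "HMDB-to-KEGG") returns the newest HMDB table, or None when the folder holds none.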


Further details on how the tables were produced

Reproducing the tables is entirely optional.

Expected structure of this folder for reproducing the tables:

├── extra
│   └── <empty>     <- place the HMDB files here (see "Instructions for HMDB")
├── scripts
│   ├── hmdb.py
│   ├── hmdb_ids-tokegg.py
│   └── lipidmaps_ids-tokegg.py
└── tables
    └── <empty>
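The layout above can be prepared with a few lines of Python. This is a hypothetical helper (prepare_layout is not shipped with the repository): it creates extra/ and tables/ when they are missing and reports which of the three scripts listed in the tree are absent from scripts/.

```python
from pathlib import Path

# Script names taken from the folder tree above.
EXPECTED_SCRIPTS = ("hmdb.py", "hmdb_ids-tokegg.py", "lipidmaps_ids-tokegg.py")


def prepare_layout(root="."):
    """Create extra/ and tables/ if missing; return the names of missing scripts."""
    root = Path(root)
    for folder in ("extra", "tables"):
        # exist_ok=True leaves existing folders (and their contents) untouched.
        (root / folder).mkdir(exist_ok=True)
    scripts = root / "scripts"
    return [name for name in EXPECTED_SCRIPTS if not (scripts / name).is_file()]
```

Run from the repository root, prepare_layout(".") should return an empty list when all three scripts are in place.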

The scripts in the scripts folder produce the contents of the tables folder:

hmdb_ids-tokegg.py : HMDB-to-KEGG_<date>.tsv
lipidmaps_ids-tokegg.py : LipidMaps-to-KEGG_<date>.tsv

The extra folder holds the HMDB data required to generate HMDB-to-KEGG_<date>.tsv; for more information, see the "Description of the scripts" section.

Description of the scripts

lipidmaps_ids-tokegg.py: queries the LipidMaps database API and directly generates the final table. It takes about 1 minute.
hmdb_ids-tokegg.py and hmdb.py: for details, or if you need to re-run them, see "Instructions for HMDB". Importantly, hmdb.py is an identical copy of the fantastic script created by yufree (https://gist.github.com/yufree/f552d865096010445fc7b969e7e9d439), an XML parser suited to the downloaded HMDB database; it takes ~10 min to complete.
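For readers curious about what a LipidMaps API query looks like, here is a minimal sketch. It is not the code used by lipidmaps_ids-tokegg.py. It assumes the LIPID MAPS REST URL pattern /rest/compound/lm_id/<LM ID>/<output item> and the output item name kegg_id; both should be checked against the LIPID MAPS REST documentation. The function names are hypothetical, and the structure of the returned JSON is deliberately not assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the LIPID MAPS REST service; verify before use.
BASE_URL = "https://www.lipidmaps.org/rest"


def compound_url(lm_id, output_item="kegg_id"):
    """Build a REST URL asking for one field of a LIPID MAPS compound (assumed pattern)."""
    return f"{BASE_URL}/compound/lm_id/{quote(lm_id)}/{output_item}"


def fetch_compound(lm_id, output_item="kegg_id", timeout=30):
    """Query the service and return the decoded JSON, or None for an empty body."""
    with urlopen(compound_url(lm_id, output_item), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    # Guard against an empty body rather than assume one is never returned.
    return json.loads(body) if body.strip() else None
```

Keeping URL construction separate from the network call makes it easy to inspect or log the exact request before sending it.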

Instructions for HMDB

If you want to re-run the scripts to produce the final HMDB table, first do the following:

go to https://hmdb.ca/downloads
download the "All Metabolites" dataset (a zip file)
place the zip file in the extra/ folder
unzip it to obtain the hmdb_metabolites.xml file (~6.5GB, too large to include in this repository)
Then execute the two scripts in the following order:
- a) Run hmdb.py and wait until it completes (~10 min); it outputs extra/hmdb.csv
- b) Run hmdb_ids-tokegg.py, which uses the output of a); it outputs tables/HMDB-to-KEGG_<date>.tsv
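To show how an XML dump of this size can be processed without loading it whole, steps a) and b) can be approximated in one streaming pass. This is a hedged sketch, not a reimplementation of hmdb.py or hmdb_ids-tokegg.py. It assumes the HMDB XML namespace http://www.hmdb.ca and the element names metabolite, accession and kegg_id, and its TSV header (HMDB, KEGG) is hypothetical. It uses ElementTree.iterparse and clears each finished metabolite record to keep memory low.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed namespace of the HMDB metabolite dump; check it against the file.
NS = "{http://www.hmdb.ca}"


def hmdb_to_kegg_rows(xml_source):
    """Yield (HMDB accession, KEGG ID) pairs, one <metabolite> at a time."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != NS + "metabolite":
            continue
        # find/findtext with a bare tag only looks at direct children, so the
        # primary accession wins over nested secondary accessions.
        accession = elem.findtext(NS + "accession", default="").strip()
        kegg_id = elem.findtext(NS + "kegg_id", default="").strip()
        if accession and kegg_id:
            yield accession, kegg_id
        # Drop the finished record's children to keep memory low.
        elem.clear()


def write_table(rows, tsv_path):
    """Write the pairs as a two-column TSV with a (hypothetical) header row."""
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["HMDB", "KEGG"])
        writer.writerows(rows)
```

For example, write_table(hmdb_to_kegg_rows("extra/hmdb_metabolites.xml"), "tables/HMDB-to-KEGG_<date>.tsv"), with <date> filled in as MM-DD-YY.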

Note: the table provided in this repository was generated from HMDB Version 5.0 (released 2021-11-17), downloaded on 07-May-2024.

Help

Contact johaGL with any questions.
