
johaGL/CompoundsIdentifiers


CompoundsIdentifiers

This repository provides tables that map metabolite identifiers between databases.

The tables folder is the main folder and contains the following tables:

  • HMDB-to-KEGG_<date:MM-DD-YY>.tsv: HMDB identifiers mapped to their matching KEGG identifiers.
  • LipidMaps-to-KEGG_<date:MM-DD-YY>.tsv: LipidMaps identifiers mapped to their matching KEGG identifiers.
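Because the file names embed the date as MM-DD-YY, sorting them alphabetically does not sort them chronologically (for example, 12-01-23 sorts after 05-07-24). A minimal sketch for picking the most recent table of a given kind, assuming the file names follow exactly the pattern above; the helper `latest_table` is hypothetical and not part of this repository:

```python
import re
from datetime import datetime
from pathlib import Path


def latest_table(folder, prefix):
    """Return the newest <prefix>_<MM-DD-YY>.tsv file in folder, or None if there is none.

    Hypothetical helper: assumes names such as 'HMDB-to-KEGG_05-07-24.tsv'.
    """
    pattern = re.compile(re.escape(prefix) + r"_(\d{2}-\d{2}-\d{2})\.tsv$")
    dated = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            # Parse the MM-DD-YY stamp so the comparison is chronological, not alphabetical.
            dated.append((datetime.strptime(match.group(1), "%m-%d-%y"), path))
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

For example, `latest_table("tables", "HMDB-to-KEGG")` would return the path of the newest HMDB table in the tables folder.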



Further details on how the tables were produced

Reproducing the tables is entirely optional.

To reproduce the tables, this folder is expected to have the following structure:

├── extra
│   └── <empty>     <- place the HMDB files here (see "Instructions for HMDB")
├── scripts
│   ├── hmdb.py
│   ├── hmdb_ids-tokegg.py
│   └── lipidmaps_ids-tokegg.py
└── tables
    └── <empty>

The scripts in the scripts folder produce the contents of the tables folder:

  • hmdb_ids-tokegg.py : HMDB-to-KEGG_<date>.tsv
  • lipidmaps_ids-tokegg.py : LipidMaps-to-KEGG_<date>.tsv

The extra folder contains the HMDB-related data required to generate HMDB-to-KEGG_<date>.tsv; for more information, see the "Description of the scripts" section.

Description of the scripts

  • lipidmaps_ids-tokegg.py: queries the LipidMaps database API and directly generates the final table; it takes about 1 minute.

  • hmdb_ids-tokegg.py and hmdb.py: for details, or to re-run them, see "Instructions for HMDB". Note that hmdb.py is an identical copy of the excellent script by yufree (https://gist.github.com/yufree/f552d865096010445fc7b969e7e9d439), an XML parser suited to the downloaded HMDB database; it takes ~10 min to complete.
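The unzipped HMDB XML file is about 6.5 GB, so it is best read as a stream rather than loaded into memory at once. The sketch below illustrates that general streaming approach with `xml.etree.ElementTree.iterparse`; it is not hmdb.py itself. It assumes each record is a `<metabolite>` element in the `http://www.hmdb.ca` namespace with direct `<accession>`, `<name>` and `<kegg_id>` children; the function name `stream_hmdb` and the CSV columns are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed namespace of the HMDB metabolites XML.
NS = "{http://www.hmdb.ca}"
FIELDS = ["accession", "name", "kegg_id"]


def stream_hmdb(xml_source, csv_path):
    """Stream <metabolite> records from xml_source into a CSV file; return the row count."""
    rows = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(FIELDS)
        context = ET.iterparse(xml_source, events=("start", "end"))
        _, root = next(context)  # first event is the root's start; keep it to clear later
        for event, elem in context:
            if event == "end" and elem.tag == NS + "metabolite":
                # findtext only searches direct children, so nested secondary
                # accessions are ignored; missing children become empty strings.
                writer.writerow([elem.findtext(NS + field) or "" for field in FIELDS])
                rows += 1
                root.clear()  # drop processed records to keep memory bounded
    return rows
```

Called as `stream_hmdb("extra/hmdb_metabolites.xml", "extra/hmdb.csv")`, memory use stays roughly constant regardless of file size, because each processed record is discarded from the tree.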

Instructions for HMDB

To re-run the scripts and produce the final HMDB table, first do the following:

  1. Go to https://hmdb.ca/downloads
  2. Download the "All Metabolites" dataset (a zip file).
  3. Move the zip file into the extra/ folder.
  4. Unzip the file; this produces hmdb_metabolites.xml, which is ~6.5 GB (too large to include in this repository).
  5. Run the two scripts in the following order:
    • a) Run hmdb.py and wait until it completes (~10 min); it outputs extra/hmdb.csv.
    • b) Run hmdb_ids-tokegg.py, which uses the output of a); it outputs tables/HMDB-to-KEGG_<date>.tsv.
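Step b) essentially keeps the HMDB records that have a KEGG identifier and writes them out as a dated TSV. A minimal sketch of that idea, assuming the intermediate CSV has `accession` and `kegg_id` columns and that the output follows the HMDB-to-KEGG_<date:MM-DD-YY>.tsv naming; it is an illustration, not the code of hmdb_ids-tokegg.py, and the function name `write_hmdb_to_kegg` and the output header names are hypothetical.

```python
import csv
from datetime import date
from pathlib import Path


def write_hmdb_to_kegg(csv_path, out_dir, today=None):
    """Write HMDB-to-KEGG_<MM-DD-YY>.tsv with one row per HMDB entry that has a KEGG ID."""
    today = today or date.today()
    out_path = Path(out_dir) / f"HMDB-to-KEGG_{today.strftime('%m-%d-%y')}.tsv"
    seen = set()
    with open(csv_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t")
        writer.writerow(["HMDB_ID", "KEGG_ID"])  # hypothetical header names
        for row in csv.DictReader(src):
            hmdb_id = row["accession"].strip()
            kegg_id = (row.get("kegg_id") or "").strip()
            # Skip entries without a KEGG match, and skip exact duplicate pairs.
            if kegg_id and (hmdb_id, kegg_id) not in seen:
                seen.add((hmdb_id, kegg_id))
                writer.writerow([hmdb_id, kegg_id])
    return out_path
```

For example, `write_hmdb_to_kegg("extra/hmdb.csv", "tables")` would write the table for the current date; passing `today` explicitly makes the output name reproducible.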

Note: the table provided in this repository was generated from HMDB version 5.0 (released 2021-11-17), downloaded on 07-May-2024.

Help

Contact johaGL with any questions.
