TODO: support viral genome collections, e.g., MGV, GPD #11

Closed
shenwei356 opened this issue Apr 18, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@shenwei356
Owner

shenwei356 commented Apr 18, 2022

Genomes in MGV and GPD were assembled from shotgun metagenomic data, i.e., they are metagenome-assembled genomes (MAGs). Although they are clustered into species, no official TaxIds are available to describe how they relate to each other.

A new command similar to gtdb_to_taxdump is needed. Let TaxonKit do it!

@shenwei356
Owner Author

GTDB taxonomy taxdump files with trackable TaxIds
https://github.com/shenwei356/gtdb-taxdump
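
As an illustration of why name-derived identifiers are easy to track across releases, here is a minimal sketch that derives a numeric ID from a taxon name with POSIX `cksum`. This is an assumption made for illustration only: the function name `name_to_id` is hypothetical, and `cksum` is a stand-in, not necessarily the hash that gtdb-taxdump or TaxonKit uses.

```shell
# Hypothetical helper: map a taxon name to a deterministic numeric ID.
# cksum (CRC-32) is only a stand-in for whatever hash a real tool uses.
name_to_id() {
    # cksum prints "<checksum> <byte count>"; keep the checksum only.
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# The same name always yields the same ID, so an unchanged taxon keeps
# its ID from one release to the next.
id_first=$(name_to_id "OTU-61123")
id_again=$(name_to_id "OTU-61123")
id_other=$(name_to_id "Caudovirales")

echo "OTU-61123 (run 1): $id_first"
echo "OTU-61123 (run 2): $id_again"
echo "Caudovirales:      $id_other"
```

With IDs of this kind, a taxon that disappears or is renamed between releases shows up as a missing or changed ID, which is the sort of change that files such as merged.dmp and delnodes.dmp record.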

@shenwei356
Owner Author

MGV is also supported.

$ cat mgv_contig_info.tsv \
    | csvtk cut -t -f contig_id,votu_id,ictv_order,ictv_family,ictv_genus \
    | sed 1d \
    > mgv.tsv

$ taxonkit create-taxdump mgv.tsv --out-dir mgv --force -A 1 -S 2 -O 3 -F 4 -G 5
16:45:40.555 [WARN] --field-accession-re failed to extract genome accession, the origninal value is used instead. e.g., MGV-GENOME-0231225
16:45:40.817 [INFO] 189680 records saved to mgv/taxid.map
16:45:40.846 [INFO] 54224 records saved to mgv/nodes.dmp
16:45:40.864 [INFO] 54224 records saved to mgv/names.dmp
16:45:40.864 [INFO] 0 records saved to mgv/merged.dmp
16:45:40.864 [INFO] 0 records saved to mgv/delnodes.dmp

$ head -n 5 mgv/taxid.map 
MGV-GENOME-0364295      677052301
MGV-GENOME-0364296      677052301
MGV-GENOME-0364303      1414406025
MGV-GENOME-0364311      1849074420
MGV-GENOME-0364312      2074846424

$ echo 677052301 | taxonkit lineage --data-dir mgv/ 
677052301       Caudovirales;crAss-phage;OTU-61123

$ echo 677052301 | taxonkit reformat --data-dir mgv/ -I 1 -P
677052301       k__;p__;c__;o__Caudovirales;f__crAss-phage;g__;s__OTU-61123

$ csvtk grep -Ht -f 1 -p MGV-GENOME-0364295 mgv.tsv 
MGV-GENOME-0364295      OTU-61123       Caudovirales    crAss-phage     NULL
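
Before building a taxdump from a table like mgv.tsv, it can help to sanity-check the input. The sketch below uses only `awk` on a tiny sample in the same five-column layout (contig_id, votu_id, ictv_order, ictv_family, ictv_genus). The first row is copied from the output above; the other two rows, the file name sample.tsv, and the function names are made up for illustration.

```shell
# Build a tiny tab-separated sample in the mgv.tsv column layout.
# Row 1 comes from the thread; rows 2 and 3 are hypothetical.
tmpdir=$(mktemp -d)
{
    printf 'MGV-GENOME-0364295\tOTU-61123\tCaudovirales\tcrAss-phage\tNULL\n'
    printf 'MGV-GENOME-0000001\tOTU-00001\tCaudovirales\tNULL\tNULL\n'
    printf 'MGV-GENOME-0000002\tOTU-00001\tCaudovirales\tNULL\tNULL\n'
} > "$tmpdir/sample.tsv"

# Count rows (genomes) and distinct values in column 2 (vOTUs).
count_genomes_and_votus() {
    awk -F '\t' '
        { n++; if (!($2 in seen)) { seen[$2] = 1; v++ } }
        END { printf "%d genomes, %d vOTUs\n", n, v }
    ' "$1"
}

# Count how many rows have the literal string NULL at each ICTV rank.
count_null_by_rank() {
    awk -F '\t' '
        { for (i = 3; i <= 5; i++) if ($i == "NULL") c[i]++ }
        END { printf "order=%d family=%d genus=%d\n", c[3], c[4], c[5] }
    ' "$1"
}

summary=$(count_genomes_and_votus "$tmpdir/sample.tsv")
echo "$summary"
count_null_by_rank "$tmpdir/sample.tsv"
```

On the real table, the genome count should match the number of records reported for taxid.map, and the NULL counts show how many genomes lack a rank, such as the missing genus that appears as an empty `g__` in the reformat output above.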

@shenwei356 shenwei356 added the enhancement New feature or request label Jun 15, 2022