using burst with NCBI nt database? #34

FabianRoger · 2021-06-07T09:32:10Z

I just came across the preprint and got curious to try out burst.

I need to assign taxonomy to Illumina short reads (MiSeq, up to ~450bp) from an amplicon sequencing run (COI & 16S). Are there instructions on how to format the EMBL/NCBI nt database for burst to make a lowest-common-ancestor assignment? Or is this not the intended use case?

Thanks!

Fabian

GabeAl · 2021-06-08T03:44:15Z

I'm glad you asked! The easiest way to do this is to download all the 16S sequences accumulated by the targeted loci project (TLP) and a comparable database for COI, such as https://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/, which contains this gene (cox1) as well as all other mitochondrial genes.

The 16S TLP from NCBI is found here: ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz (also available for archaea; I highly recommend getting that one too, from ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/).

Then run each sequence through linfasta to linearize it (it's available in the burst tools directory), and then run the taxonomizer programs to get the Greengenes-like taxonomy. When you then run BURST in capitalist mode, you will get the LCA for each read in column 13 and the "capitalist-picked" single match in columns 1 and 2. (Yes, "capitalist" mode does both LCA AND the capitalist disambiguation.)

A guide is available here for full genomes (just replace the content with the linearized individual targeted loci or mitochondrial genes from above): https://github.com/knights-lab/BURST/blob/master/embalmlets/bin/Readme_utils.txt

Be sure to build the burst database with sufficiently large regions (i.e. -d DNA 500 -s 1700) to allow the full stitched query to map.

Cheerio,
Gabe
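For anyone who wants to see what the linearization and the column-13 lookup amount to, here is a rough Python sketch. It is not the linfasta tool or any part of BURST: `linearize_fasta` and `read_lca` are hypothetical helper names, and the sketch assumes the alignment output is tab-delimited with one row per read. The sample records and the taxonomy string are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def linearize_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines into one.

    Illustrative stand-in for the linearization step, not the linfasta tool.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines inside or between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_lca(lines: Iterable[str], lca_col: int = 13) -> Iterator[Tuple[str, str, str]]:
    """Yield (column 1, column 2, LCA) from tab-delimited alignment rows.

    Assumption: rows are tab-delimited and the LCA sits in column 13
    (1-based), as described in the comment above. Shorter rows are skipped.
    """
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < lca_col:
            continue
        yield fields[0], fields[1], fields[lca_col - 1]


if __name__ == "__main__":
    # Made-up wrapped FASTA records, for illustration only.
    wrapped = [">seq1 demo", "ACGT", "TTGA", ">seq2", "GG", "", "CC"]
    for name, seq in linearize_fasta(wrapped):
        print(f">{name}\n{seq}")

    # Made-up 13-column row; the ten middle fields are placeholders.
    row = "\t".join(["read1", "ref1"] + ["0"] * 10 + ["k__Bacteria;p__Firmicutes"])
    for col1, col2, lca in read_lca([row]):
        print(col1, col2, lca)
```

To use it on real files, pass an open file handle to either function, since both accept any iterable of lines.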


FabianRoger · 2021-06-10T09:47:09Z

Thanks for these helpful instructions!

Quick question: I didn't make it clear, but the 16S was for invertebrates, too, so I don't think the Targeted Loci database will cover it? And do you know if the mitochondrion database also contains partial COI genes (such as all the Folmer regions from BOLD), or only sequences from partial / full genomes?

Either way, I guess I can start with a custom reference database generated with ecoPCR from OBITools and format that for BURST. Thanks again!

Fabian
