using burst with NCBI nt database? #34
I'm glad you asked! The easiest way to do this is to download all the 16S
sequences accumulated by the Targeted Loci Project (TLP), plus a comparable
database for COI, such as the RefSeq mitochondrion release at
https://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/, which contains
this gene (cox1) along with all other mitochondrial genes.
The 16S TLP from NCBI is found here:
ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz
(also available for archaea; highly recommended to get that one too, from
ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Archaea/).
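If you would rather script the download, here is a minimal sketch in Python using only the standard library. The URL is the one given above; the helper names `local_name` and `download` are made up for illustration, and the Archaea file name is not filled in because it is not stated here.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL copied from the comment above; nothing else is assumed about the server layout.
TLP_16S_URL = (
    "ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/"
    "bacteria.16SrRNA.fna.gz"
)


def local_name(url: str) -> str:
    """Return the file name at the end of a URL path."""
    return os.path.basename(urlparse(url).path)


def download(url: str, dest_dir: str = ".") -> str:
    """Fetch `url` into `dest_dir` and return the local path (needs network access)."""
    dest = os.path.join(dest_dir, local_name(url))
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download(TLP_16S_URL)` should leave `bacteria.16SrRNA.fna.gz` in the current directory; it still needs to be decompressed before the next step.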
Then run each sequence through linfasta to linearize them (it's available
in the burst tools directory), and then run the taxonomizer programs to get
the Greengenes-like taxonomy.
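To illustrate what "linearize" means here, the sketch below joins wrapped sequence lines so that each record ends up as one header plus one sequence line. This is not linfasta itself, only a rough Python equivalent of the idea, and the function name `linearize_fasta` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def linearize_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence text that was wrapped across lines
    if header is not None:
        yield header, "".join(chunks)
```

Writing each pair back out as `header`, newline, `sequence`, newline gives a two-lines-per-record FASTA file.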
Then when you run BURST in capitalist mode, you will get the LCA for each
read in column 13 and the "capitalist-picked" single match in columns 1 and
2. (Yes, "capitalist" mode does both LCA AND the capitalist disambiguation).
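As a rough sketch of how to pull those columns out afterwards, the Python below reads one line of output and keeps columns 1, 2 and 13. It assumes the output is tab-separated, with the read name in column 1 and the picked reference in column 2; the names `Hit` and `parse_burst_line` are made up for this example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str      # column 1 (assumed to be the read name)
    reference: str  # column 2 (assumed to be the capitalist-picked match)
    lca: str        # column 13, the LCA taxonomy


def parse_burst_line(line: str) -> Hit:
    """Split one tab-separated output line and keep columns 1, 2 and 13."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(fields)}")
    return Hit(fields[0], fields[1], fields[12])
```

Applying this to every line of the output file gives one LCA assignment per read, which can then be tallied per taxon.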
A guide is available here for full genomes (just replace the content with
the linearized individual targeted loci or mitochondrial genes from above).
https://github.com/knights-lab/BURST/blob/master/embalmlets/bin/Readme_utils.txt
Be sure to build the burst database with sufficiently large regions (i.e.
-d DNA 500 -s 1700) to allow the full stitched query to map.
Cheerio,
Gabe
On Mon, Jun 7, 2021 at 5:32 AM FabianRoger wrote:
I just came across the preprint and got curious to try out burst.
I need to assign taxonomy to Illumina short reads (MiSeq, up to
~450bp) from an amplicon sequencing run (COI & 16S). Are there instructions
on how to format the EMBL/NCBI nt database for burst to make a
lowest-common ancestor assignment? Or is this not the intended use-case?
Thanks!
Fabian
Thanks for these helpful instructions! Quick question: I didn't make it clear, but the 16S was for invertebrates too, so I don't think the Targeted Loci database will cover it? And do you know whether the mitochondrion database also contains partial COI genes (such as all the Folmer regions from BOLD), or only sequences from partial / full genomes? Either way, I guess I can start with a custom reference database generated with ecoPCR from OBITools and format that for BURST. Thanks again! Fabian