Currently, BURST checks that the number of newline characters in the file is exactly twice the number of ">" characters. Thus, a well-formatted FASTA file whose headers contain ">" can throw an error.
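To make the failure concrete, here is a minimal sketch of a global count check of this kind. It is not BURST's actual code: the function name `passes_count_check` and the placeholder sequence `ACGT` are made up for illustration, and it assumes each record is one header line plus one sequence line, so a file is accepted only when it has two newlines per ">".

```python
def passes_count_check(text: str) -> bool:
    """Hypothetical stand-in for a global count check (not BURST's code).

    Accepts the text only if it has exactly two newlines per '>' character,
    i.e. one header line and one sequence line per record.
    """
    return text.count("\n") == 2 * text.count(">")


# A plain record: one '>' and two newlines.
plain = ">seq1\nACGT\n"

# The Prokka-style header from this issue: the "-->" adds a second '>'.
# The sequence "ACGT" is a placeholder, not real data.
prokka = (
    ">BOGOCDKJ_04879 3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase\n"
    "ACGT\n"
)

print(passes_count_check(plain))   # True
print(passes_count_check(prokka))  # False
```

The second record is valid FASTA, but the extra ">" inside the header makes the counts disagree.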
Thanks for the report! Yes, this is a known issue. BURST also doesn't
support multi-line FASTA files. For now I would recommend linearizing the
FASTA file and making sure there are no embedded ">" characters in the
sequence names. One way to do this is to name the sequences something
arbitrary (like a numeric index) and expand this index back into the
original name downstream. Another option would be to use "tr" or similar to
remove all instances of ">", then add one back at the start of every other
line (or remove all but the first ">" on every other line, etc.).
I will keep this issue open, as I may have time to work on it. Most systems
are I/O bottlenecked, and the character-count check was implemented as a
speedup technique; it does indeed cause problems in cases like this.
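The renaming workaround described above (linearize, then swap each name for a numeric index) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `linearize_with_index` is hypothetical, the sequences in the example are placeholders, and any text before the first header is simply ignored.

```python
from typing import Dict, Iterable, List, Tuple


def linearize_with_index(lines: Iterable[str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (output_lines, index_to_name) for a possibly multi-line FASTA.

    Each record becomes exactly two lines: ">" plus a numeric index, then the
    whole sequence on one line. The original names are kept in the mapping.
    """
    out: List[str] = []
    names: Dict[str, str] = {}
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if names:
                # Flush the previous record's sequence as a single line.
                out.append("".join(seq_parts))
            seq_parts = []
            idx = str(len(names))
            names[idx] = line[1:]  # original name, embedded ">" included
            out.append(">" + idx)
        elif line.strip() and names:
            seq_parts.append(line.strip())
    if names:
        out.append("".join(seq_parts))
    return out, names


fasta = [
    ">BOGOCDKJ_04879 3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase",
    "ACGT",
    "TTGA",
    ">second record",
    "GGCC",
]
out, names = linearize_with_index(fasta)
print("\n".join(out))
# Downstream, an index found in the results is expanded back to its name:
print(names["0"])
```

Writing `names` out as a two-column table alongside the renamed file keeps the expansion step independent of this script.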
Cheerio,
Gabe
On Tue, Nov 26, 2019 at 3:07 PM chartl wrote:
The output of gene prediction algorithms (Prokka) can include ">"
characters as part of the gene name, for instance:
>BOGOCDKJ_04879 3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase
Currently, BURST checks that the number of newline characters in the file
is exactly twice the number of ">" characters. Thus, a well-formatted FASTA
file whose headers contain ">" can throw an error.