Out of memory error #33
Comments
Hi Damian,
Happy to see you're using BURST!
BURST requires roughly 10x the size of the input FASTA file in RAM when
running in normal DNA mode. (It "bursts" from about 1x the size of the
FASTA to ~9x right after it reads and parses it, so you won't see the RAM
usage increase gradually -- it's an all-or-nothing allocation, and BURST
checks available memory before allocating, as you see here.)
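If you want a rough back-of-the-envelope check of that rule of thumb, something like this shell sketch works (the 10x multiplier is just the estimate above, not an exact figure; refseq.fa stands in for your input):
FA=refseq.fa
SIZE_GB=$(du -BG "$FA" | cut -f1 | tr -d 'G')    # input FASTA size, in GB (disk usage, close enough here)
echo "Expect peak RAM around $((SIZE_GB * 10)) GB in DNA mode"
On a 134 GB input that predicts roughly 1.3 TB of peak RAM, which is why the allocation fails even on a 500 GB machine.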
You can reduce this requirement by adding more database "partitions" with
"-dp 5" or the like (here, 5 partitions), or by using QUICK mode instead
of DNA mode. That should not be a problem if your sequences are reasonably
distinct from one another (i.e. species- or subspecies-level
"representative" genomes).
That said, BURST will still require at minimum 5x the size of the input
FASTA to run alignments, so that would exceed your memory capacity anyway.
Best is to split the database by microbial family, run the queries through
each of them, and then merge the resulting .b6 files (just concatenate
them).
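The merge step really is just concatenation; e.g., assuming the per-family runs wrote outputs named family_*.b6:
cat family_*.b6 > merged.b6    # .b6 is plain tabular output, so concatenation is a valid merge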
Cheerio,
Gabe
On Mon, Feb 8, 2021 at 11:06 PM, Damian Kao wrote:
I am trying to create a BURST database on RefSeq genomes (a 134 GB FASTA
file). The FASTA file is linearized, with the accession ID in the header
and the sequence on the next line:
>NZ_CP053296.1
GTGTCACTTTCGCTTTGGCA....
I ran this command:
./burst15 -r refseq.fa -o refseq.02082020.edx -a refseq.02082020.acx -d DNA 320 -i 0.95 -t 64 -s 1500
This is the output:
This is BURST [v1.0 DB 15]
--> Using accelerator file refseq.02082020.acx
--> Creating DNA database (assuming max query length 320)
--> Setting identity threshold to 0.950000
--> Setting threads to 64
--> Shearing references longer than 1500
Using up to AVX-128 with 64 threads.
Parsed 25947 references.
Initiating database shearing procedure [shear 1500, ov 336].
Using compressive optimization (1 partitions; ~25947 refs).
[0] First pass: populated bin counters [782.536257]
--> Out of the 143567861906 original places, 141063566310 are eligible.
OOM:Ptrs_X
I assume OOM:Ptrs_X is an out-of-memory error.
I am running database creation on a server with 500 GB of memory and 64
CPUs. The memory logs show that a maximum of 135 GB of memory was used,
with ~360 GB still free.
What could be the problem here? Is BURST pre-calculating how much memory
it will need and killing the process before allocating, because 360 GB of
free memory would not be enough?