Crashes in unitigging, out of memory (?) #1355
Wow!! This line is saying you have a 10 Kbp contig with 393,906 reads in it, which seems suspicious. Grab some reads from there and see if they make sense.
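A minimal sketch of one way to grab those reads, assuming you pull a few read IDs for that contig out of the bogart logs (samtools, the read IDs, and the output file name are assumptions here, not something from this thread):

  # Decompress and index the trimmed reads (samtools faidx wants a plain or bgzipped FASTA)
  gunzip -k canu_data/tetra_canu/trimmedReads.fasta.gz
  samtools faidx canu_data/tetra_canu/trimmedReads.fasta
  # Pull a few reads by ID (hypothetical IDs) and eyeball them
  samtools faidx canu_data/tetra_canu/trimmedReads.fasta read00123 read00456 > suspect_reads.fasta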
If they look bogus, I'd suggest filtering them from trimmedReads.fasta and then restarting a new assembly from there. Another option, non-intuitive as it sounds, is to decrease the memory limit; this will cause the lower-quality and/or short repeat overlaps to be skipped. Something like 90 GB is a good first try. The start of unitigger.err has a table of how many overlaps per read it is loading.
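A hedged sketch of both options, reusing the command shape from the original report below (seqkit and the bad_read_ids.txt file are assumptions; batMemory=90g is the 90 GB suggestion above):

  # Option 1 (hypothetical): drop suspect reads by ID before reassembling
  seqkit grep -v -f bad_read_ids.txt canu_data/tetra_canu/trimmedReads.fasta.gz > trimmedReads.filtered.fasta
  # Option 2: rerun with a reduced bogart memory limit
  canu-1.7/Linux-amd64/bin/canu -assemble -d canu_data/tetra_canu_90g -p tetra_canu genomeSize=1.3g batMemory=90g -pacbio-corrected canu_data/tetra_canu/trimmedReads.fasta.gz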
With 40x coverage, 700 overlaps per read is a decent value to target. Save the 4-unitigger directory from this run; we can compare contig sizes to decide whether the reduced memory is impacting assembly. The various *.sizes files in there show contig sizes at various steps in the algorithm.
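A small sketch of how one might stash and skim those size summaries for later comparison (the saved directory name is hypothetical):

  # Keep a copy of the unitigger output before rerunning
  cp -r canu_data/tetra_canu/4-unitigger 4-unitigger.batMemory90g
  # Skim every *.sizes summary from the saved run
  for f in 4-unitigger.batMemory90g/*.sizes; do echo "== $f =="; head "$f"; done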
Ah, ok. That line did give me pause, but then I remembered that I don't understand how genomics works and shrugged. I might try your suggestion to reduce memory and see if I can filter the contig rather than the reads. I suspect there will be other contigs like this in the assembly, as it's a bit pathological, and I'm actually a bit low on coverage due to tetraploidy. I'll see what happens and get back to you. Thanks for the quick response!
Weirdly, reducing the memory worked. canu finished and I have an assembly. I'm still investigating, but it looks similar to what wtdbg2 gave me in terms of size and contiguity. I ran the 90g run initially but stupidly overwrote it with an experimental 150g run, which also worked (see unitigger.err in the archive). I am also uploading the (I'll quickly note also that the previous
Based on that deep contig, I'd guess there is a 10 Kbp (tandem?) repeat in this genome that is preventing any better assembly. If so, the ends of contigs should have pieces of the repeat sequence, and the unitig graph should be very, very connected. The logs show 20x coverage in corrected reads, and I'd guess you have about 25x in raw reads. More data could possibly help, both in better corrected reads and more chance of spanning a repeat.
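A rough way to check that repeat hypothesis, not from the thread itself (minimap2, seqkit, the FASTA file names, and the 10 kb window are all assumptions):

  # Pull the last 10 kb of each contig; seqkit counts negative coordinates from the sequence end
  seqkit subseq -r -10000:-1 tetra_canu.contigs.fasta > contig_ends.fasta
  # Map the contig ends against the suspicious deep contig; many strong hits
  # would suggest a shared (tandem?) repeat sitting at contig ends
  minimap2 -x map-pb deep_contig.fasta contig_ends.fasta > ends_vs_repeat.paf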
I am having a job crash in bogart on what I suspect is a memory error, but I don't know for sure.
I've gone through many iterations of this, including these:
canu-1.7/Linux-amd64/bin/canu -assemble maxMemory=900g maxThreads=4 utgovlMemory=225g utgovlThreads=4 -d canu_data/tetra_canu -p tetra_canu genomeSize=1.3g batMemory=225g -pacbio-corrected canu_data/tetra_canu/trimmedReads.fasta.gz
canu-1.7/Linux-amd64/bin/canu -assemble maxMemory=300g maxThreads=16 utgovlMemory=200g utgovlThreads=8 -d canu_data/tetra_canu -p tetra_canu genomeSize=1.3g -pacbio-corrected canu_data/tetra_canu/trimmedReads.fasta.gz
All of the various parameter combinations I've tried result in the exact same output (other than differences from CLI parameters, e.g. thread number), even after removing the 4-unitigger/ dir.
I am running this on Amazon Linux.
I am attaching a unitigger.err file, which shows as much as I can tell about the error message.
This looks to me like some sort of allocation error, but I'm not enough of an expert to get much from going through the source code.
unitigger.log just has these lines, which I think might be a downstream phenomenon:
../tetra_canu.ctgStore/ does not exist.
tetra_canu.005.mergeOrphans.thr004.num000.log, the only mergeOrphans log file in evidence, has a bunch of innocuous-looking lines.
I have increased available memory quite substantially (using ~1TB) and restarted the run multiple times with different thread/memory arguments, but I think I'm misunderstanding something.
Do you have any suggestions?
Many thanks, max