Bogart findPotentialOrphan assertion failing #1872
Without having the data it's going to be hard to diagnose or fix. Are you able to share the data if you remove the actual sequences, or is that not acceptable either? (We'd need the seqStore, with all the blobs files removed, and the ovlStore.) The FAQ has instructions on sending data to us.
I've uploaded the seqStore and ovlStore to your …
@skoren, did you get the data okay? Just wanted to follow up before I remove the tar files on my end.
Yep, I was able to get the data.
Apologies, you caught me in the midst of many holidays and much time off. I'm looking at it now. If you have a chance, can you post the bogart command? It's in unitigging/4-unitigger/unitigger.sh.
Thanks! My guesses were close. There might be a corrupt overlap file. I'm getting this error almost immediately:
Can you verify that this is correct? In particular, the above error indicates a truncated file, and file 0001-004 looks suspiciously small (it's also the file being read when it fails).
Looks the same on my end. I just noticed that all the intermediate overlap results get removed at some point, so I can't check whether something went wrong there.
Quite surprising! That file is definitely truncated. Reads 39492 through 48674 (inclusive) have lost their overlaps. You can check yourself:

There is redundancy in the way we store overlaps, and this isn't as fatal as it sounds. I've hacked bogart to ignore these reads when loading overlaps. It should recover the overlaps through the redundant copies, but this has never been explicitly tested before.

You might want to investigate what happened to this file between your first (crashed) run and the data you have now. I've verified that the .gz files I have are complete and unpack without error. Coincidentally, this file has the most recent time stamp of all the files, hinting that a copy might have been interrupted.
(The two directories have timestamps from when I unpacked the data.)
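The redundancy mentioned above (each overlap known to both of its reads) can be sketched with a toy model. This is an illustration of the idea only, not canu's actual on-disk format or bogart's code; all names here are hypothetical:

```python
from collections import defaultdict

def build_store(overlaps):
    """Index each overlap under BOTH of its reads -- the redundancy."""
    store = defaultdict(list)
    for a, b in overlaps:
        store[a].append((a, b))
        store[b].append((a, b))
    return store

def recover(store, lost_reads):
    """Rebuild overlaps for 'lost' reads from their partners' copies."""
    lost = set(lost_reads)
    recovered = set()
    for read, ovls in store.items():
        if read in lost:
            continue  # this read's own bucket was in the truncated file
        for a, b in ovls:
            if a in lost or b in lost:
                recovered.add((a, b))
    return recovered

store = build_store([(1, 5), (2, 5), (3, 4)])
# Pretend read 5's bucket was in the truncated file; its overlaps
# survive in the buckets of reads 1 and 2:
print(sorted(recover(store, [5])))  # → [(1, 5), (2, 5)]
```

Because every overlap has a second copy under the partner read, losing one read range's bucket only loses data if the partner's bucket is also gone.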
I went back to the original ovlStore and found the values there were different. Maybe something went wrong during the tarring and transferring. I'm making a fresh tar of the ovlStore and will send it when ready.
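One way to rule the transfer in or out is to checksum before the copy, verify after, and test the gzip stream for truncation. A sketch, using a placeholder archive name in place of the real ovlStore tarball:

```shell
# Create a tiny placeholder archive so the commands below are runnable;
# in practice "demo.tar.gz" would be the real ovlStore tarball.
tar -czf demo.tar.gz -T /dev/null

# On the source machine, record a checksum before transferring:
md5sum demo.tar.gz > demo.tar.gz.md5

# On the receiving end, verify the copied file against that checksum:
md5sum -c demo.tar.gz.md5

# Independently confirm the gzip stream is complete; a truncated
# copy fails this test even when the file otherwise looks fine:
gzip -t demo.tar.gz && echo "gzip stream intact"
```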
For now, just send that one file; the others looked ok. Thanks for looking! I was able to reproduce a crash with my hacked-up version.
Uploaded that file.
Did you use … or …? The former will run mhap and then 'overlapPair' to remove garbage overlaps. It looks like just … was run here. The trimmed reads are still good, and I suggest starting a new assembly from those:
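The actual restart command from this comment isn't preserved in the thread. As a rough, unverified sketch only (option names taken from the canu documentation for 2.x-era snapshots; the read file name is a placeholder for the real trimmed-read output), restarting from already-trimmed reads might look like:

```shell
# Hypothetical sketch, not the command from the original comment:
canu -assemble \
  -p asm -d asm-retry \
  genomeSize=2.7g \
  -corrected -trimmed \
  -nanopore trimmedReads.fasta.gz
```

The `-assemble` mode skips correction and trimming, so only the unitigging stage is redone.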
It was done manually, with …
Yes, most definitely!
…st) works correctly. Issue #1872.
Fixed (finally).
Hi,
I'm using canu snapshot v2.2-development +82 changes (af771ef) on a Linux system, and have issues with the bogart step of unitigging. This is using about 45x ONT reads on a 2.7 Gb mammal genome.

An assertion is failing in findPotentialOrphans, which is similar to the (idle) unresolved issue in #1831. Unfortunately I can't share the sequencing data at this time, as was asked for in that issue. Using the ovl algorithm was proving to be extremely slow, so I tried using mhap even for trimming and tigging, which is not recommended in the docs but was suggested by some colleagues at the USDA. I wasn't sure whether that approximate algorithm could be the cause of the failed assertion, or whether there is potentially some issue upstream. I've included the final lines of the unitigger.err file below.
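For reference, a sketch of what "using mhap even for trimming and tigging" looks like as a canu invocation. obtOverlapper and utgOverlapper are documented canu parameters; the prefix, directory, and read file name below are placeholders:

```shell
# Illustrative only: select mhap instead of the default ovl overlapper
# for the trimming (obt) and unitigging (utg) stages.
canu -p asm -d asm \
  genomeSize=2.7g \
  obtOverlapper=mhap utgOverlapper=mhap \
  -nanopore reads.fastq.gz
```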
Thanks,
Alex
PS: I think there is a typo in the .trimReads.log files in the 3-overlapbasedtrimming stage: it uses NOV in the message column where the header suggests NOC for "no change". It appears here.