bcftools annotate throws unhandled assertion error in some cases #1957
Comments
This was caused by forcing the END annotation. The VCF record at 8914680 overlaps two annotations, at 8914680 and 8914690. The first overlap updates INFO/END and rlen (as END supersedes rlen), so when the second overlap is tested, rlen has already changed and the overlap length comes out negative. This is now fixed; thank you for reporting the issue and providing a test case.
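The sequence described in this comment can be sketched in a few lines of Python. This is only an illustration, not the bcftools C code: the `Record` class, the `overlap_len` helper, the interval-intersection formula, and the annotation end coordinates (8914681 and 8914691) are assumptions chosen to match the positions mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a VCF record (1-based coordinates)."""
    pos: int
    ref: str
    end: Optional[int] = None  # INFO/END, if present

    @property
    def rlen(self) -> int:
        # Assumption: when INFO/END is set it supersedes len(REF).
        if self.end is not None:
            return self.end - self.pos + 1
        return len(self.ref)


def overlap_len(rec: Record, ann_start: int, ann_end: int) -> int:
    """Length of the intersection between the record and an annotation."""
    rec_end = rec.pos + rec.rlen - 1
    return min(ann_end, rec_end) - max(ann_start, rec.pos) + 1


rec = Record(pos=8914680, ref="tattccattccattcc")
annotations = [(8914680, 8914681), (8914690, 8914691)]  # assumed end coordinates

# The first overlap is tested with rlen = len(REF) = 16.
first = overlap_len(rec, *annotations[0])

# Forcing the END annotation rewrites INFO/END, which shrinks rlen to 2.
rec.end = annotations[0][1]

# The second overlap is now tested against the shortened record.
second = overlap_len(rec, *annotations[1])

print(first, rec.rlen, second)
```

Under these assumptions the first overlap is positive, while the second turns negative only because the record was modified between the two tests.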
Thanks for looking into this! Do you know why it might've still worked when removing the |
Once the first match was used and INFO/END added, any subsequent matches would be ignored. That is, the VCF record is checked for an existing END, and if one is present, processing continues without modifying it a second time.
I see, thanks!
It's still not fixed, I guess. I installed bcftools via apt-get, and I get the following error when converting a BAM file to a VCF file: bcftools: bam2bcf.c:421: bcf_call_glfgen: Assertion `epos>=0 && eposnpos' failed.
Please send me the solution or a bug-free BCFTOOLS directly to abi@abioteq.net . |
I'll outline:
The Issue
I came across a curious exception when manipulating a few SV VCFs. Here's a minimal example that should demonstrate the error with the latest bcftools annotate.
We start with the following files:
annotations.hdr :
minimal_annotations.tsv.gz :
minimal_example.vcf.gz :
Make sure you run bgzip on minimal_annotations.tsv to get the compressed version, then run tabix -s1 -b2 -e3 minimal_annotations.tsv.gz.
Finally, the command to produce the error is now:
The error output is:
Even stranger: I tried doing my original analysis with bcftools 1.14 and it seems to have worked fine. The above was done with version 1.17.
Debugging
I've tracked the problem down to this line here. There is an assertion made that the value isec is positive, and in this case it turns out not to be. What's confusing to me is why this number ends up negative. I ended up editing the code around there to be:
Rebuilding/rerunning gave the output:
Note that START and END are computed using the annotations file whereas the POS and RLEN are computed from the VCF.
Computing the expression for isec, we see that
translates in math to
So with our above values, it's negative because the annotations region is too far to the right of the variant position, making the left term POS + 1 and the right term START.
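As a worked check of that statement, assume the expression has the usual interval-intersection form (an assumption, since the original snippet is not reproduced here) and plug in POS = 8914680, RLEN = 2 and START = 8914690 as mentioned in this thread. For any annotation END at or beyond START, the left term reduces to POS + 1 and the right term to START:

```latex
\begin{aligned}
\text{isec} &= \min(\text{END},\ \text{POS} + \text{RLEN} - 1) - \max(\text{START},\ \text{POS}) + 1 \\
            &= (8914680 + 2 - 1) - 8914690 + 1 \\
            &= -8
\end{aligned}
```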
Conceptually, this expression should always be > 0 if the actual entries in annotations.tsv.gz get applied to the correct variants that correspond to that region. But here it seems that the 3rd line of annotations ends up getting applied to the second variant, causing this problem. This seems incredibly bizarre to me since I've run this exact sequence of commands on around 50 samples over the whole genome, and only a couple had this rare exception. Also, maybe I don't understand how RLEN gets computed, but it seems incorrect for this example to be 2 rather than LEN(tattccattccattcc).
Possible Fix?
Commenting out assert( isec > 0 ) lets the program run fine and seems to output the correct things. In fact, strangely enough, after removing this line it adds the correct INFO/END fields to each of the three entries. In particular, the second variant gets END=8914681 and NOT 8914691 as one might expect from the above error that seemed to match up the third annotation with the second variant. I have no idea why this would happen and would appreciate some insight.
In either case, it's probably best to make sure this assertion is handled properly so the user can have a more helpful error message to debug.
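To illustrate what a more helpful message could look like, here is a minimal Python sketch of a check that reports the coordinates involved instead of failing on a bare assertion. The function name `check_overlap` and the message wording are hypothetical; this is not how bcftools handles the case.

```python
def check_overlap(isec: int, pos: int, rlen: int, ann_start: int, ann_end: int) -> int:
    """Return isec if it is positive; otherwise raise an error that names
    the record and the annotation region involved (hypothetical helper)."""
    if isec <= 0:
        raise ValueError(
            f"non-positive overlap ({isec}) between the record at POS={pos} "
            f"(rlen={rlen}) and the annotation region {ann_start}-{ann_end}"
        )
    return isec
```

With a message of this kind, a user hitting the problem would see immediately which record and which annotation line disagree.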