Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

bcftools annotate throws unhandled assertion error in some cases #1957

Closed
rickymagner opened this issue Jul 7, 2023 · 6 comments
Closed

Comments

@rickymagner
Copy link

I'll outline:

  • The Issue
  • Debugging
  • Possible Fix

The Issue

I came across a curious exception when manipulating a few SV VCFs. Here's a minimal example that should demonstrate the error with the latest bcftools annotate.

We start with the following files:
annotations.hdr :

##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate in reference for SV">

minimal_annotations.tsv.gz :

chr21	8914240	8914240	8914295
chr21	8914680	8914680	8914681
chr21	8914690	8914690	8914691

minimal_example.vcf.gz :

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248387328>
##contig=<ID=chr2,length=242696752>
##contig=<ID=chr3,length=201105948>
##contig=<ID=chr4,length=193574945>
##contig=<ID=chr5,length=182045439>
##contig=<ID=chr6,length=172126628>
##contig=<ID=chr7,length=160567428>
##contig=<ID=chr8,length=146259331>
##contig=<ID=chr9,length=150617247>
##contig=<ID=chr10,length=134758134>
##contig=<ID=chr11,length=135127769>
##contig=<ID=chr12,length=133324548>
##contig=<ID=chr13,length=113566686>
##contig=<ID=chr14,length=101161492>
##contig=<ID=chr15,length=99753195>
##contig=<ID=chr16,length=96330374>
##contig=<ID=chr17,length=84276897>
##contig=<ID=chr18,length=80542538>
##contig=<ID=chr19,length=61707364>
##contig=<ID=chr20,length=66210255>
##contig=<ID=chr21,length=45090682>
##contig=<ID=chr22,length=51324926>
##contig=<ID=chrX,length=154259566>
##contig=<ID=chrY,length=62460029>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FILTER=<ID=HET1,Description="Heterozygous in the first haplotype">
##FILTER=<ID=HET2,Description="Heterozygous in the second haplotype">
##FILTER=<ID=GAP1,Description="Uncalled in the first haplotype">
##FILTER=<ID=GAP2,Description="Uncalled in the second haplotype">
##FILTER=<ID=DIPX,Description="Diploid chrX in non-PAR">
##FILTER=<ID=DIPY,Description="Diploid chrY in non-PAR">
##bcftools_normVersion=1.16+htslib-1.16
##bcftools_normCommand=norm -m -any -o split.vcf.gz /cromwell_root/fc-37e70ab7-6023-490d-824d-5997ff71fbb3/submissions/fcc4a6da-c981-4dd9-a736-06f23f1e0af0/runDipcall/21a8a4fa-67db-406b-ba61-ae6a7aafa8e9/call-IndexVCF/cleaned-indexed.vcf.gz; Date=Thu Jul  6 18:40:20 2023
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="SVTYPE">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SVLEN">
##bcftools_viewVersion=1.17+htslib-1.17
##bcftools_viewCommand=view -i INFO/SVTYPE!="." -o truvari-SVs.vcf.gz truvari-annotated.vcf.gz; Date=Fri Jul  7 10:17:14 2023
##bcftools_viewCommand=view --regions-file bad_region.bed -o minimal_example.vcf.gz truvari-SVs.vcf.gz; Date=Fri Jul  7 18:11:13 2023
##bcftools_viewCommand=view minimal_example.vcf.gz; Date=Fri Jul  7 18:26:30 2023
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	syndip
chr21	8914240	.	gttccattccattccattcaattccattccattgcattccattccattccattcca	G	30	HET2	SVTYPE=DEL;SVLEN=55	GT:AD	0|.:3,1
chr21	8914680	.	tattccattccattcc	TATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTAGTTGATTCCATTCCATTCCATCCCGTTCCATTCCATTCCGTTACTTTCTATTCCATTCCATTCCATTCC	30	HET2	SVTYPE=INS;SVLEN=100	GT:AD	0|.:1,1
chr21	8914690	.	c	CATTCCATTCCATTCCATTCCATTCTAGTTGATTCCATTCCATTCCATCCCGTTCCATTCCATTCCGTTACTTTCT	30	HET2	SVTYPE=INS;SVLEN=75	GT:AD	0|.:2,1

Make sure you run bgzip on minimal_annotations.tsv to get the compressed version, then run tabix -s1 -b2 -e3 minimal_annotations.tsv.gz.

Finally, the command to produce the error is now:

bcftools annotate -a minimal_annotations.tsv.gz -h annotations.hdr -c CHROM,FROM,TO,INFO/END -o minimal_annotated.vcf.gz minimal_example.vcf.gz

The error output is:

Assertion failed: (isec > 0), function annotate, file vcfannotate.c, line 3086.
[1]    98724 abort      bcftools annotate -a minimal_annotations.tsv.gz -h annotations.hdr -c  -o

Even stranger: I tried doing my original analysis with bcftools 1.14 and it seems to have worked fine. The above was done with version 1.17.

Debugging

I've tracked the problem down to this line here. There is an assertion made that the value isec is positive, and in this case it turns out not to be. What's confusing to me is why this number ends up negative. I ended up editing the code around there to be:

//assert( isec > 0 );
if ( isec < 1 )
    error("Problem at %s, POS: %"PRId64" and RLEN: %lld\nAlso have START: %d and END: %d", bcf_seqname(args->hdr,line), line->pos, line->rlen, tmp->start, tmp->end);

Rebuilding/rerunning gave the output:

Problem at chr21, POS: 8914679 and RLEN: 2
Also have START: 8914689 and END: 8914689

Note that START and END are computed using the annotations file whereas the POS and RLEN are computed from the VCF.

Computing the expression for isec, we see that

int isec = (tmp->end < line->pos+line->rlen-1 ? tmp->end : line->pos+line->rlen-1) - (tmp->start > line->pos ? tmp->start : line->pos) + 1;

translates in math to

isec = MIN(END, POS+RLEN-1) - MAX(START, POS) + 1

So with our above values, it's negative because the annotations region is too far to the right of the variant position, making the left term POS + 1 and the right term START.

Conceptually, this expression should always be > 0 if the actual entries in annotations.tsv.gz get applied to the correct variants that correspond to that region. But here it seems that the 3rd line of annotations ends up getting applied to the second variant, causing this problem. This seems incredibly bizarre to me since I've run this exact sequence of commands on around 50 samples over the whole genome, and only a couple had this rare exception. Also, maybe I don't understand how RLEN gets computed, but it seems incorrect for this example to be 2 rather than LEN(tattccattccattcc).

Possible Fix?

Commenting out assert( isec > 0 ) lets the program run fine and seems to output the correct things. In fact, strangely enough, after removing this line it adds the correct INFO/END fields to each of the three entries. In particular, the second variant gets END=8914681 and NOT 8914691 as one might expect from the above error that seemed to match up the third annotation with the second variant. I have no idea why this would happen and would appreciate some insight.

In either case, it's probably best to make sure this assertion is handled properly so the user can have a more helpful error message to debug.

@pd3 pd3 closed this as completed in 97cfa16 Jul 12, 2023
@pd3
Copy link
Member

pd3 commented Jul 12, 2023

This was caused by forcing the END annotation. The VCF record at 8914680 overlaps two annotations, at 8914680 and 8914690. The first overlap updates INFO/END and rlen (as END supersedes rlen), so when the second overlap is tested, rlen is already different and the overlap length returns negative value.

This is now fixed, thank you for reporting the issue and providing a test case

@rickymagner
Copy link
Author

Thanks for looking into this! Do you know why it might've still worked when removing the assert statement? It seemed to produce the correct annotations still when just commenting it out.

@pd3
Copy link
Member

pd3 commented Jul 12, 2023

Once the first matching was used and INFO/END added, any subsequent matches would be ignored. Meaning the VCF record will be checked if END exists and if it does, it will continue without modifying it for the second time

@rickymagner
Copy link
Author

I see, thanks!

@abivatsa
Copy link

Its still not fixed I guess. I installed BCFTOOLS from APT-GET. I get the following error when converting BAM file to VCF File:

bcftools: bam2bcf.c:421: bcf_call_glfgen: Assertion `epos>=0 && eposnpos' failed.

@abivatsa
Copy link

Please send me the solution or a bug-free BCFTOOLS directly to abi@abioteq.net .

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants