Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

CSQ sample annotation error downstream of stop codon #1578

Closed
katiesevans opened this issue Sep 14, 2021 · 1 comment
Closed

CSQ sample annotation error downstream of stop codon #1578

katiesevans opened this issue Sep 14, 2021 · 1 comment
Labels

Comments

@katiesevans
Copy link

After annotating my VCF with CSQ (bcftools 1.12), I noticed a variant (I:4464670) that was annotated as *missense for sample JU2106 but I couldn't find a previous stop codon in this sample. It seems that a stop codon (I:4463998) in another sample (ECA2091) is causing this mis-annotation. If I remove ECA2091 from the VCF, JU2106 is now labeled missense at this position. Similarly, if I leave all samples in but remove the stop codon in ECA2091, JU2106 is again labeled properly as missense. See code below for a reproducible example.

#### download vcf, ref, and gff
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/eri6.vcf.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/chrI.fa.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/csq.gff3.gz

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6.vcf.gz > test.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou test.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > test.bcsq.samples.tsv


# look at annotation
cat test.bcsq.samples.tsv | grep 4464670 # missense variant
cat test.bcsq.samples.tsv | grep 4463998 # stop

#################################################
# remove ECA2091 with stop codon
# make vcf
bcftools view -s JU2106,CB4856,MY10 eri6.vcf.gz > eri6_noECA2091.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_noECA2091.vcf > noECA2091.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou noECA2091.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > noECA2091.bcsq.samples.tsv


# look at annotation
cat noECA2091.bcsq.samples.tsv | grep 4464670 # missense variant
cat noECA2091.bcsq.samples.tsv | grep 4463998 # stop

###########################################################
# remove stop variant but keep ECA2091
# make vcf
bcftools view -t ^I:4463998 eri6.vcf.gz > eri6_nostop.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_nostop.vcf > nostop.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou nostop.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > nostop.bcsq.samples.tsv


# look at annotation
cat nostop.bcsq.samples.tsv | grep 4464670 # missense variant
cat nostop.bcsq.samples.tsv | grep 4463998 # stop
@pd3 pd3 added the bug label Sep 17, 2021
pd3 added a commit that referenced this issue Sep 17, 2021
The upstream stop could be falsely assigned to all samples in
a multi-sample VCF even if the stop was relevant for a single sample
only.

Fixs #1578
@pd3
Copy link
Member

pd3 commented Sep 17, 2021

This is now fixed. Thank you for the bug report and the test case!

@pd3 pd3 closed this as completed Sep 21, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants