Fill-tags plugin does not compute F_MISSING correctly when a custom tag is present #1684

Closed
edg1983 opened this issue Mar 18, 2022 · 2 comments

Comments


edg1983 commented Mar 18, 2022

I'm using bcftools v1.15 with fill-tags plugin to annotate my VCF file.

When I annotate values using the following everything works fine:

bcftools +fill-tags test.vcf.gz --threads 4 -Ob -o test.tags.bcf -- -t AF,AC,AN,TYPE,F_MISSING,NS,HWE

However, when I add a custom tag, the F_MISSING values become wrong. For example, the following command produces wrong output for F_MISSING. All other tags are computed correctly, and this seems independent of the order of the tags.

bcftools +fill-tags test.vcf.gz --threads 4 -Ob -o test2.tags.bcf -- -t AF,AC,AN,TYPE,F_MISSING,NS,HWE,'F_GQ10=N_PASS(GQ >= 10)/N_SAMPLES'

Here are the outputs of the first few variants, reporting F_MISSING value and N missing genotypes. Total N samples is 33:

  1. From command1 with correct annotations
0.818182        27
0.848485        28
0.212121        7
0.121212        4
0.0909091       3
0.333333        11
0.242424        8
0.030303        1
0.575758        19
0.787879        26
  2. From command2, resulting in wrong F_MISSING
0       27
0.030303        28
0.484848        7
0.575758        4
0.757576        3
0.424242        11
0.666667        8
0.636364        1
0.212121        19
0.212121        26
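
For anyone wanting to double-check the two listings above, here is a minimal sketch of the arithmetic, assuming F_MISSING is simply the number of missing genotypes divided by the 33 samples. The numbers are copied from the listings; the names `ROWS` and `matches` are made up for this illustration and are not part of bcftools.

```python
# Values copied from the two listings in this issue:
# (F_MISSING from command 1, F_MISSING from command 2, N missing genotypes)
N_SAMPLES = 33

ROWS = [
    (0.818182, 0.0, 27),
    (0.848485, 0.030303, 28),
    (0.212121, 0.484848, 7),
    (0.121212, 0.575758, 4),
    (0.0909091, 0.757576, 3),
    (0.333333, 0.424242, 11),
    (0.242424, 0.666667, 8),
    (0.030303, 0.636364, 1),
    (0.575758, 0.212121, 19),
    (0.787879, 0.212121, 26),
]


def matches(reported: float, n_missing: int, tol: float = 1e-5) -> bool:
    """True if the reported fraction equals n_missing / N_SAMPLES within tol."""
    return abs(reported - n_missing / N_SAMPLES) < tol


# Check each column against the assumed definition F_MISSING = N missing / N samples.
cmd1_ok = [matches(c1, n) for c1, _, n in ROWS]
cmd2_ok = [matches(c2, n) for _, c2, n in ROWS]

print("command 1 rows consistent:", sum(cmd1_ok), "of", len(ROWS))
print("command 2 rows consistent:", sum(cmd2_ok), "of", len(ROWS))
```

Under that assumption, every row from command1 is consistent with its missing-genotype count and no row from command2 is.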
Author

edg1983 commented Mar 18, 2022

Update: after a quick investigation, it seems that when I specify multiple custom tags, all of them are set to the value of the last one given. F_MISSING is also treated like one of these custom tags, so the value it gets depends on the order in which the tags are listed.

For example consider these 2 commands:

bcftools +fill-tags Molisani_batch1.processed.vcf.gz --threads 4 -- -t 'F_GQ10=N_PASS(GQ >= 10)/N_SAMPLES',AF,AC,AN,TYPE,F_MISSING,'N_GQ10=N_PASS(GQ >= 10)'
bcftools +fill-tags Molisani_batch1.processed.vcf.gz --threads 4 -- -t 'F_GQ10=N_PASS(GQ >= 10)/N_SAMPLES','N_GQ10=N_PASS(GQ >= 10)',AF,AC,AN,TYPE,F_MISSING
  • The first one results in N_GQ10, F_GQ10 and F_MISSING all having the same value, equal to the one expected for N_GQ10.
  • The second one results in N_GQ10, F_GQ10 and F_MISSING all having the same value, equal to the one expected for F_MISSING.
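
To make the observed behaviour concrete, here is a toy model of it. This is not bcftools code and makes no claim about how the plugin is implemented internally; it only assumes, purely for illustration, that every expression-based tag overwrites one shared slot so that only the last expression survives. The sample data and all names (`buggy_fill`, `fixed_fill`, and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical per-sample data: (genotype_is_missing, GQ or None).
Sample = Tuple[bool, Optional[int]]
Expr = Callable[[List[Sample]], float]

SAMPLES: List[Sample] = [(False, 30), (False, 5), (True, None), (False, 12)]


def n_gq10(samples: List[Sample]) -> float:
    """Number of samples with GQ >= 10."""
    return sum(1 for _, gq in samples if gq is not None and gq >= 10)


def f_gq10(samples: List[Sample]) -> float:
    """Fraction of samples with GQ >= 10."""
    return n_gq10(samples) / len(samples)


def f_missing(samples: List[Sample]) -> float:
    """Fraction of samples with a missing genotype."""
    return sum(1 for missing, _ in samples if missing) / len(samples)


def buggy_fill(tags: List[Tuple[str, Expr]], samples: List[Sample]) -> Dict[str, float]:
    """Assumed failure mode: all tags share one expression slot, so the last wins."""
    shared: Optional[Expr] = None
    for _, expr in tags:
        shared = expr  # each tag overwrites the same slot
    return {name: shared(samples) for name, _ in tags}


def fixed_fill(tags: List[Tuple[str, Expr]], samples: List[Sample]) -> Dict[str, float]:
    """Each tag keeps and evaluates its own expression."""
    return {name: expr(samples) for name, expr in tags}


# Same tag orders as the two commands above (standard tags omitted).
order1 = [("F_GQ10", f_gq10), ("F_MISSING", f_missing), ("N_GQ10", n_gq10)]
order2 = [("F_GQ10", f_gq10), ("N_GQ10", n_gq10), ("F_MISSING", f_missing)]

print("buggy, order 1:", buggy_fill(order1, SAMPLES))
print("buggy, order 2:", buggy_fill(order2, SAMPLES))
print("fixed, order 1:", fixed_fill(order1, SAMPLES))
```

In this toy model the first order gives all three tags the N_GQ10 value and the second order gives all three the F_MISSING value, which is the same pattern as in the two bullets above.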

pd3 closed this as completed in 0159b96 Mar 21, 2022
Member

pd3 commented Mar 21, 2022

Thank you for the bug report. This should be fixed now; please check if the fix works for you.

pd3 added a commit that referenced this issue May 13, 2022
Previously the program would silently go with the last one, assigning the same values to all.

Fixes #1684