Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

setGT plugin does not behave consistently when filtering with read depth (DP) #1607

Closed
PeterHolderrieth opened this issue Nov 4, 2021 · 1 comment

Comments

@PeterHolderrieth
Copy link

PeterHolderrieth commented Nov 4, 2021

I have a vcf file where both the genotype (GT) and the read depth (DP) is given for each sample. I would like to set all genotypes where the read depth is below x (DP<x) to missing (GT-->'.'). For that, I use the bcftools +setGT pluging. However, the filtering for genotypes with a low read depth does not do what it should do if I set the limit x too high.

Here is a minimal working example:
bcftools view -H minimal_vcf_file_2_samples_gt_and_dp.vcf.gz

1 1890 . A G 0.2 . . GT:DP 1/1:12 0/0:102 0/1:10 1/1:20
2 1991 . G T 0.9 . . GT:DP 1/1:8 0/1:100 0/0:11 0/0:11

bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=12' | bcftools view -H

Filled 8 alleles
1 1890 . A G 0.2 . . GT:DP 1/1:12 0/0:102 ./.:10 1/1:20
2 1991 . G T 0.9 . . GT:DP ./.:8 0/1:100 ./.:11 ./.:11`

That is as expected: the genotypes with read depth below 12 are set to missing.

But here is the problem:
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=101' | bcftools view -H

Filled 6 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP 1/1:8 0/1:100 0/0:11 0/0:11

In the second row, no genotypes are filled.

But when I do:
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=100' | bcftools view -H
Filled 12 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP ./.:8 0/1:100 ./.:11 ./.:11

It behaves as expected. Could you help me figure out what is going on?
Maybe it ignores the command if for a specific variant/row no sample is above the threshold?

My version is
bcftools --version

bcftools 1.12
Using htslib 1.12

I would appreciate your help, thank you.

UPDATE: I think the problem is that bcftools is not setting any genotypes for a row/variant if the set of excluded samples is empty. That is a problem since instead of setting genotypes for all samples, it does it for no samples. However, by replacing exclusion by an inclusion (-e to -i) and reversing the filtering condition (>= to <), the bug can be fixed: if the set of genotypes to be set is empty, not doing anything is exactly what you want the command to do:

bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -i 'FMT/DP<101' | bcftools view -H

Filled 14 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP ./.:8 ./.:100 ./.:11 ./.:11

It might still be good to fix this bug though.

@pd3 pd3 closed this as completed in 7caa05d Nov 8, 2021
@pd3
Copy link
Member

pd3 commented Nov 8, 2021

Thank you for the bug report, this is now fixed.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants