Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

-i and -e are not complement when I use query subcommand #1783

Closed
takosa opened this issue Aug 29, 2022 · 6 comments
Closed

-i and -e are not complement when I use query subcommand #1783

takosa opened this issue Aug 29, 2022 · 6 comments

Comments

@takosa
Copy link

takosa commented Aug 29, 2022

I noticed the result of

bcftools query -i 'FMT/DP>=20' -f '[%CHROM %POS %SAMPLE\n]' input.vcf

and

bcftools query -e 'FMT/DP<20' -f '[%CHROM %POS %SAMPLE\n]' input.vcf

are not complement.

This occur if only I specified %SAMPLE but %FORMAT tags, so I can avoid it by specifying other %FORMAT tags but it is a little inconvenient and confusing.

I think this is because the process is skipped by the code shown below when I use -e and the condition is met in some samples.
If only %SAMPLE is specified, not %FMT/DP, etc., the flag BCF_UN_FMT is not set. When I use -i, this checking and skipping do not occur.

if ( !(max_convert_unpack & BCF_UN_FMT) ) continue;

The simple solutions I think is to omit the above code or to change the code below to set the flag.

case T_SAMPLE: fmt->handler = &process_sample; break;

However, I'm not sure how this change will affect the rest of the code.

Thank you,

@pd3
Copy link
Member

pd3 commented Sep 8, 2022

Can you provide a specific example to demonstrate the behavior please?

@takosa
Copy link
Author

takosa commented Sep 9, 2022

I explain the issue by using this example vcf file named input.vcf.

##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1	sample2
1	10	snp1	G	C	.	.	.	GT:DP	0/0:10	0/1:20
1	20	snp2	G	C	.	.	.	GT:DP	0/0:10	0/1:20
1	30	snp3	G	C	.	.	.	GT:DP	0/0:10	0/1:20

I ran bcftools query command and this is the outputs.

$ bcftools query -i 'FMT/DP>=20' -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf # no outputs
$ bcftools query -i 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1
$ bcftools query -e 'FMT/DP>=20'  -f '[%ID %SAMPLE\n]' input.vcf # no outputs

However, I think the outputs should be like this.

$ bcftools query -i 'FMT/DP>=20' -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -i 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1
$ bcftools query -e 'FMT/DP>=20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1

In addition, if I add DP tag in my query, the outputs is like this. The number of rows are changed even though I don't change the filtering option.

$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE %DP\n]' input.vcf
snp1 sample2 20
snp2 sample2 20
snp3 sample2 20

Thank you for your consideration.

@pd3
Copy link
Member

pd3 commented Sep 10, 2022

I see. It's probably an older version of bcftools, there were some bugs fixed over the time. The latest gives the expected output:

$ cat rmme.vcf
##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	smpl10	smpl20
1	10	.	G	C	.	.	.	DP	10	20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -i 'FMT/DP>=20' 
smpl20 20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP>=20' 
smpl10 10

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP<20' 
smpl20 20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -i 'FMT/DP<20' 
smpl10 10

@pd3 pd3 closed this as completed Sep 10, 2022
@takosa
Copy link
Author

takosa commented Sep 11, 2022

Thank you for your help.
I think I use latest version and outputs is like this.
I don't know, but something seems wrong... or Is this a specification?
(DP tag is not needed for my analysis.)

$ cat rmme.vcf
##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	smpl10	smpl20
1	10	.	G	C	.	.	.	DP	10	20

$ bcftools --version
bcftools 1.16-6-gfd9470e
Using htslib 1.16-10-g6366029
Copyright (C) 2022 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -i 'FMT/DP>=20'
smpl20

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -e 'FMT/DP>=20'

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -e 'FMT/DP<20'

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -i 'FMT/DP<20'
smpl10

@pd3 pd3 reopened this Sep 12, 2022
@pd3
Copy link
Member

pd3 commented Sep 12, 2022

I was able to reproduce the problem now

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP>=20'
smpl10 10
$ bcftools query -f '[ %SAMPLE\n]' rmme.vcf -e 'FMT/DP>=20'

@pd3 pd3 closed this as completed in f4dee4b Sep 12, 2022
@pd3
Copy link
Member

pd3 commented Sep 12, 2022

I believe the bug is now fixed by f4dee4b

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

2 participants