Hi,
I have a VCF file with the following line:
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (counting only informative reads out of the total reads) for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions for alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Samp1 Samp2
chr1 939398 . GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA G,GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA 5075.01 . . GT:AD:AF:DP 0/0:73,0,0:.:35 0/2:36,2,50:0.023,0.568:88
For the first sample, Samp1, the AF field in the FORMAT column is missing (.).
After running bcftools norm -m -any -f [reference], I got:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (counting only informative reads out of the total reads) for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions for alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##contig=<ID=chr1>
##bcftools_normVersion=1.16+htslib-1.16
##bcftools_normCommand=norm -m -any -f /media/NFS/ref/b38/Homo_sapiens_assembly38.fasta -O z -o demo1.norm.vcf.gz demo1.vcf.gz; Date=Tue Nov 15 15:58:25 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Samp1 Samp2
chr1 939398 . GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA G 5075.01 . . GT:AD:AF:DP 0/0:73,0:.:35 0/0:36,2:0.023:88
chr1 939398 . G GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA 5075.01 . . GT:AD:AF:DP 0/0:73,0::35 0/1:36,50:0.568:88
The Samp2 output is as I expected.
But for Samp1, I expected both lines to have a missing value (.) for AF
(the value was missing before the split, so it makes sense for it to be missing on both lines after the split). Instead, the second line has an empty AF field (0/0:73,0::35).
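The expectation above can be sketched in a few lines of Python. This is only an illustration of the expected per-sample behaviour when a multiallelic record is split, not how bcftools implements it; the function name `split_sample_value` and its parameters are hypothetical.

```python
def split_sample_value(value, number, alt_index):
    """Expected per-sample value on the split line for one ALT allele.

    value:     raw FORMAT subfield, e.g. "0.023,0.568", "36,2,50" or "."
    number:    the Number declared in the header ("A", "R", or anything else)
    alt_index: 0-based index of the ALT allele kept on the new line
    """
    if number not in ("A", "R"):
        return value              # e.g. Number=1 fields such as DP are copied unchanged
    if value in (".", ""):
        return "."                # missing before the split stays missing after it
    parts = value.split(",")
    if number == "A":
        return parts[alt_index]   # one value per ALT allele
    return f"{parts[0]},{parts[alt_index + 1]}"  # REF value plus the chosen ALT value


# Samp1: AF is missing, so both split lines should carry "."
print(split_sample_value(".", "A", 0))            # .
print(split_sample_value(".", "A", 1))            # .
# Samp2: AF and AD are split per allele
print(split_sample_value("0.023,0.568", "A", 1))  # 0.568
print(split_sample_value("36,2,50", "R", 1))      # 36,50
```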
The --force option didn't make any difference here.
However, when I ran the same command on a file containing only Samp1, I got the results I expected:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (counting only informative reads out of the total reads) for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions for alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##contig=<ID=chr1>
##bcftools_normVersion=1.16+htslib-1.16
##bcftools_normCommand=norm -m -any -f /media/NFS/ref/b38/Homo_sapiens_assembly38.fasta -O z -o demo2.norm.vcf.gz demo2.vcf.gz; Date=Tue Nov 15 15:58:32 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Samp1
chr1 939398 . GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA G 5075.01 . . GT:AD:AF:DP 0/0:73,0:.:35
chr1 939398 . G GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA 5075.01 . . GT:AD:AF:DP 0/0:73,0:.:35
I've experimented with various inputs and concluded that the issue happens only when the field to be split is missing for some samples but present for others. I had no problem when all samples had values or when all samples were missing.
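For anyone who wants to check their own normalized output for the same symptom, a small Python sketch like the following could flag empty FORMAT subfields. It assumes an uncompressed, tab-separated VCF data line; the function name `find_empty_subfields` is hypothetical and the check is not part of bcftools.

```python
def find_empty_subfields(record, sample_names):
    """Return (sample, FORMAT key) pairs whose value is an empty string."""
    cols = record.rstrip("\n").split("\t")
    keys = cols[8].split(":")          # column 9 holds the FORMAT keys
    hits = []
    for name, sample in zip(sample_names, cols[9:]):
        for key, value in zip(keys, sample.split(":")):
            if value == "":            # "." is a valid missing value; "" is not
                hits.append((name, key))
    return hits


# Second line of the two-sample output shown above, rebuilt with tabs
record = "\t".join([
    "chr1", "939398", ".", "G",
    "GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA",
    "5075.01", ".", ".", "GT:AD:AF:DP",
    "0/0:73,0::35", "0/1:36,50:0.568:88",
])
print(find_empty_subfields(record, ["Samp1", "Samp2"]))  # [('Samp1', 'AF')]
```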
Thank you,
In-Hee Lee