v2.2.3 - Fix to DI event collation in pindel core
Corrects read sorting during collection of DI events. The incorrect sorting caused some events to be split into many and others to be missed (thanks to @liangkaiye for the patch).
Testing details:
For passed variants:
Comparing sites in VCF files...
Found 277 sites common to both files.
Found 0 sites only in main file.
Found 0 sites only in second file.
Found 0 non-matching overlapping sites.
After filtering, kept 277 out of a possible 912840 Sites
For all variants:
Comparing sites in VCF files...
Found 907254 sites common to both files.
Found 2294 sites only in main file.
Found 3698 sites only in second file.
Found 3292 non-matching overlapping sites.
Investigating the individual variant classes in the unfiltered (all variants) output:
Deletions:
$ zgrep -c 'PC=D;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740
Insertions:
$ zgrep -c 'PC=I;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638
Complex:
$ zgrep -c 'PC=DI;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:25462
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:26866
Overall, the total number of complex events increases (26866 versus 25462, a net gain of 1404), while deletion and insertion counts are unchanged. This is not surprising: the bad read sorting could just as easily prevent an event from reaching the required threshold for reporting as split one event into many.
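The three per-class counts can also be gathered in one pass. This is a minimal sketch, not part of pindel itself: `count_class` and `class_delta` are hypothetical helper names, and it assumes the same `PC=<class>;` INFO tag and gzipped VCF files used in the `zgrep` commands above.

```shell
# Hypothetical helpers for comparing two flagged VCFs; not shipped with pindel.

# Print the number of lines in a gzipped VCF carrying the given PC= class.
# $1 = class (D, I or DI), $2 = path to .vcf.gz
count_class() {
  # zgrep -c exits non-zero when the count is 0, so ignore the exit status.
  zgrep -c "PC=$1;" "$2" || true
}

# Print pre/post counts and their difference for each class.
# $1 = pre-fix VCF, $2 = post-fix VCF
class_delta() {
  for class in D I DI; do
    pre=$(count_class "$class" "$1")
    post=$(count_class "$class" "$2")
    echo "$class pre=$pre post=$post delta=$((post - pre))"
  done
}
```

Called as `class_delta pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz`, it prints one line per class. The trailing `;` in the pattern keeps `PC=D;` from also matching `PC=DI;`.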