Switch to GATK PairedEndAndSplitReadEvidenceCollection for PESR collection #34

Merged 5 commits on Aug 17, 2020


@cwhelan cwhelan commented Jul 29, 2020

This switches the pipeline to using the GATK tool for PESR collection.

The GATK tool produces results that differ from svtk in only two ways: (1) the sort order of the discordant reads file, where read pairs are now sorted in sequence dictionary order with a secondary sort on the position of the second read in the pair; and (2) changes in the split read file on HLA and other small alt contigs, which result from fixing #24.
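
To illustrate the sort order described in (1), here is a minimal sketch, not the GATK implementation. The `DiscordantPair` record, its field names, and the exact tie-breaking order in the key are assumptions made for illustration; the only things taken from the description above are that contigs are ranked by their position in the sequence dictionary and that the second read's position acts as a secondary sort.

```python
from typing import List, NamedTuple, Sequence


class DiscordantPair(NamedTuple):
    # Hypothetical record layout; the real file has more columns.
    contig1: str
    pos1: int
    contig2: str
    pos2: int


def sort_pairs(pairs: Sequence[DiscordantPair],
               dictionary_contigs: Sequence[str]) -> List[DiscordantPair]:
    """Sort pairs by sequence dictionary order, then by the second read."""
    # Rank each contig by its index in the sequence dictionary, so that
    # e.g. chr2 sorts before chr10 (unlike plain string comparison).
    rank = {name: i for i, name in enumerate(dictionary_contigs)}
    return sorted(
        pairs,
        # Assumed key: first read's contig and position, then the
        # second read's contig and position as the secondary sort.
        key=lambda p: (rank[p.contig1], p.pos1, rank[p.contig2], p.pos2),
    )


contigs = ["chr1", "chr2", "chr10"]  # order as listed in the dictionary
pairs = [
    DiscordantPair("chr10", 500, "chr10", 900),
    DiscordantPair("chr2", 100, "chr2", 800),
    DiscordantPair("chr1", 100, "chr2", 300),
    DiscordantPair("chr1", 100, "chr1", 700),
]
sorted_pairs = sort_pairs(pairs, contigs)
```

With a key like this, two pairs that share the same first-read contig and position are ordered by where the second read falls, which is the kind of difference to expect when diffing against svtk output.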

I've tested several runs of the single-sample pipeline with this change. The number of variants changes slightly from runs that used svtk, and some variants change position, but the exact set of variants that change differs from run to run, so I'm chalking that up to non-deterministic behavior in downstream steps of the pipeline.


@mwalker174 mwalker174 left a comment


This looks good. Really glad we are getting to migrate tools to GATK now. I noted in #8 that we should eventually deprecate gatk_docker_pesr_override.

@cwhelan cwhelan merged commit c75756f into master Aug 17, 2020
@cwhelan cwhelan deleted the cw_gatk_pesr_collection branch August 17, 2020 18:14