Harmonize indel alignment between bwa-mem and STAR alignments for the purpose of variant allele counting #176
Comments
Possible options to fix this:
For option 2, suggestions from ChatGPT on how to do the BAM realignment approach:
bamleftalign (from FreeBayes)
GATK LeftAlignIndels
Unfortunately, it's likely that BOTH of these approaches may NOT work correctly with an RNA BAM because they lack support for spliced alignments (the N CIGAR operator). However, they might work for indels that lie within exons rather than across splice sites, which would cover most cases. A conservative way to test this would be to create the realigned BAM only for counting variant-supporting reads in the RNA, and not use it for any other downstream steps. We could then evaluate its impact on the RNA counting results in isolation, without worrying about other downstream impacts.
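To get a rough feel for how many reads could be affected by the splicing limitation, one could tally CIGAR strings by whether they carry an indel and whether they also carry an N operator. The following is a minimal sketch, not part of the pipeline: `classify_cigar` and its category names are hypothetical, and it works on plain CIGAR strings (e.g. column 6 of `samtools view` output). Note it is only a read-level proxy: a spliced read can still have its indel entirely inside one exon.

```python
import re

# Matches one CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def classify_cigar(cigar):
    """Classify a CIGAR string as 'no_indel', 'indel_unspliced', or 'indel_spliced'.

    Hypothetical helper for illustration only.
    """
    ops = [op for _, op in CIGAR_OP.findall(cigar)]
    has_indel = any(op in "ID" for op in ops)   # insertion or deletion present
    has_splice = "N" in ops                     # skipped region (splice junction)
    if not has_indel:
        return "no_indel"
    return "indel_spliced" if has_splice else "indel_unspliced"

# Made-up example CIGAR strings.
for cigar in ["100M", "40M1D60M", "30M2000N20M1I49M"]:
    print(cigar, classify_cigar(cigar))
```

Reads in the `indel_spliced` group are the ones where left-alignment tools without N support are most likely to misbehave.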
I haven't found any evidence that option 1 is possible. STAR does not seem to allow control of how indels are justified. Option 2 might work. Option 3 is a pain but would probably be the most performant (it avoids reprocessing the whole BAM). On the other hand, if option 2 works, it would be simple to implement.
Modification of the above but splitting the CIGAR strings first to see if that helps:
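A minimal sketch of what such a two-step run might look like, assuming standard GATK4 invocations (`-R`, `-I`, `-O`). The file names are hypothetical and these are not necessarily the exact commands used in this thread. The helper only builds and prints the commands; it does not run them.

```python
import shlex

def split_then_left_align_cmds(ref_fasta, in_bam, split_bam, out_bam):
    """Build the two GATK commands: split spliced reads, then left-align indels.

    Hypothetical helper; all paths are placeholders supplied by the caller.
    """
    split_cmd = ["gatk", "SplitNCigarReads",
                 "-R", ref_fasta, "-I", in_bam, "-O", split_bam]
    align_cmd = ["gatk", "LeftAlignIndels",
                 "-R", ref_fasta, "-I", split_bam, "-O", out_bam]
    return [split_cmd, align_cmd]

cmds = split_then_left_align_cmds(
    "ref.fa", "rna.bam", "rna.split.bam", "rna.split.leftaligned.bam"
)
for cmd in cmds:
    # Print shell-quoted commands so they can be reviewed before running.
    print(shlex.join(cmd))
```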
Note that on this example BAM, running SplitNCigarReads took 628 minutes (10.5 hours!) and the 18G BAM became 45G.
When I attempted to run the LeftAlignIndels step I got the following error:
It sounds like this could be a known bug in this version of GATK (I was using v4.1.8.1). It may work in v4.1.4.1, or perhaps in a version after 4.1.7? Next step: try a more recent version of GATK.
Further attempt with latest version of GATK:
Note that on this example BAM, running SplitNCigarReads took 437 minutes (7.3 hours!) and the 18G BAM became 38G.
Currently the immuno pipeline produces bwa-mem alignments of DNA sequence reads, where indels are left-aligned, and STAR alignments of RNA sequence reads, where indels are right-aligned.
This means that when counting variant-allele-supporting reads and calculating VAFs, the RNA VAF is often incorrect. For example, a single-base deletion is reported with a good DNA VAF but an RNA VAF of 0, even though manual review confirms that the variant is expressed in the RNA.
The following screenshot illustrates the situation.
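As a toy illustration of why the two conventions disagree, here is a minimal sketch (made-up reference sequence, hypothetical helper names) showing that the same single-base deletion in a homopolymer gets a different coordinate depending on whether it is left- or right-aligned. A counter that looks only at the left-aligned position would miss reads reporting the right-aligned one.

```python
def left_align_deletion(ref, pos, length):
    """Shift a deletion of `length` bases at 0-based `pos` as far left as possible."""
    # Moving one base left is equivalent when the base entering the deleted
    # window equals the base leaving it.
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos

def right_align_deletion(ref, pos, length):
    """Shift a deletion of `length` bases at 0-based `pos` as far right as possible."""
    while pos + length < len(ref) and ref[pos] == ref[pos + length]:
        pos += 1
    return pos

ref = "GCATTTTTGA"  # made-up reference with a run of five T's at positions 3..7
print(left_align_deletion(ref, 5, 1))   # prints 3
print(right_align_deletion(ref, 5, 1))  # prints 7

# Both placements produce the same read sequence after the deletion.
assert ref[:3] + ref[4:] == ref[:7] + ref[8:]
```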