-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Inconsistency in MergeBamAlignment #1569
Comments
This is an example of how the aligned reads look in IGV. The top image is the new behavior, with less clipping, while the bottom has more clipping. As one can see, it seems that (at least in this case) the alignment itself is suspect since the top and bottom reads disagree with each other strongly outside the middle 47 bases. |
@yfarjoun this comment is mostly a review from our offline discussion, adding here for transparency/documentation. Personally, if not for agreement with past behavior, I don't think there is a clear "better" choice between the "old" and "new" behavior of MBA. I think it is relatively clear that in both of your examples the old behavior is soft clipping bases which are not adapter readthrough. This is clear in the first example because the clipped bases are different from the bases in the mate, and in the second example because the clipped bases agree with the reference. As you have mentioned though, the old behavior leads to a more consistent agreement between TLEN and the cigar, avoiding situations where we have more matched bases in a read than the supposed length of the template. This seems like a desirable feature of the old behavior. However, I think the situation changes when we are hard clipping. While it seems acceptable to soft clip messy reads when we aren't quite sure what is going on, I think we need to be more conservative with hard clipping. Even though we store the clipped bases in a tag, that's really only helpful if someone notices a problem; I highly doubt any tool will be automatically checking those tags to see if we may have hard clipped some bases we shouldn't have. Both of the examples you provided above are situations where I wouldn't feel great about hard clipping all the bases that the old version of MBA soft clips. But even more concerning is a situation where the soft clip at the 5' end of the mate is due to a fairly large indel near the end of the read. An assembly based caller will be able to resolve this indel given evidence from other spanning reads, and these reads will provide additional evidence for the indel. I would not want to hard clip these bases off the 3' end of the read, and lose that additional evidence. So, a proposed behavior for MBA:
The result of this behavior is that:
|
A recent refactor in htsjdk and the changes in MergeBamAlignment that took advantage of said refactoring #1484, #1509, and #1513, result in occassionally inconsistent bam records:
for example:
Which has a read-length that is based on the aligned positions and clipping that is looking at the unclipped read start (note the 3S in the MateCigar)
While this change in behavior was not intentional, it does raise the question, which is the more correct approach?
Since I think that there should be only one interpretation for the size of the insert, and that interpretation should be used for both TLEN and clipping overlapping reads, and I do not wish to change the definition of TLEN, I think that the right thing to do would be to revert the behaviour of MBA back to the original, i.e. use the aligned 5' ends to determine the ends of the Insert, but I'm open to hearing other arguments and options.
Instructions
_
) as appropriate;Bug Report
Affected tool(s)
Tool name(s), special parameters?
Affected version(s)
Description
Describe the problem below. Provide screenshots , stacktrace , logs where appropriate.
Steps to reproduce
Tell us how to reproduce this issue. If possible, include command lines that reproduce the problem and provide a minimal test case.
Expected behavior
Tell us what should happen
Actual behavior
Tell us what happens instead
Feature request
Tool(s) involved
Tool name(s), special parameters?
Description
Specify whether you want a modification of an existing behavior or addition of a new capability.
Provide examples, screenshots, where appropriate.
Documentation request
Tool(s) involved
Tool name(s), parameters?
Description
Describe what needs to be added or modified.
The text was updated successfully, but these errors were encountered: