Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add option to hard clip read in CigarUtils (replacement for #1453) #1461

Merged
merged 13 commits into from
Mar 6, 2020

Conversation

kachulis
Copy link
Contributor

@kachulis kachulis commented Mar 4, 2020

@lbergelson here is the replacement PR for #1453 we discussed earlier today.

@fleharty, @lbergelson and I talked about how to get hard clipping merged ASAP, given that I do not have permissions to push directly to htsjdk, and so cannot make modifications directly to your branch. He suggested opening up a new PR to replace the old one, which is what this PR is.

@codecov-io
Copy link

codecov-io commented Mar 4, 2020

Codecov Report

Merging #1461 into master will increase coverage by 0.787%.
The diff coverage is 73.585%.

@@              Coverage Diff               @@
##             master     #1461       +/-   ##
==============================================
+ Coverage     68.41%   69.197%   +0.787%     
- Complexity     8499      8700      +201     
==============================================
  Files           583       587        +4     
  Lines         34413     34571      +158     
  Branches       5729      5776       +47     
==============================================
+ Hits          23542     23922      +380     
+ Misses         8650      8370      -280     
- Partials       2221      2279       +58
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/util/CigarUtil.java 53.571% <73.585%> (+23.268%) 4 <2> (-10) ⬇️
...n/java/htsjdk/samtools/RandomAccessFileBuffer.java 0% <0%> (-71.667%) 0% <0%> (-10%)
src/main/java/htsjdk/samtools/CRAMBAIIndexer.java 71.711% <0%> (-9.727%) 12% <0%> (-5%)
...m/encoding/core/SubexponentialIntegerEncoding.java 85.714% <0%> (-8.73%) 5% <0%> (+1%)
...c/main/java/htsjdk/samtools/cram/build/CramIO.java 67.347% <0%> (-7.996%) 8% <0%> (-12%)
src/main/java/htsjdk/samtools/CRAMCRAIIndexer.java 74.51% <0%> (-7.308%) 10% <0%> (ø)
...tools/cram/encoding/core/GammaIntegerEncoding.java 93.333% <0%> (-6.667%) 5% <0%> (+1%)
...ava/htsjdk/samtools/cram/structure/EncodingID.java 93.75% <0%> (-6.25%) 6% <0%> (+3%)
src/main/java/htsjdk/samtools/Bin.java 77.551% <0%> (-6.122%) 17% <0%> (-1%)
.../samtools/cram/encoding/readfeatures/SoftClip.java 50% <0%> (-4.545%) 6% <0%> (-1%)
... and 162 more

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kachulis a few comments. Do we want to thing about invalidating tags as part of the clipping operation? Things like NM will not be correct anymore.

final int clipFrom, final String expectedReadString, final String expectedCigar) throws IOException {

final SAMRecord rec = createTestSamRec(initialReadString, initialCigar, negativeStrand);
// Assert.assertEquals(rec.getCigarString(), initialCigar, testName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either delete this or uncomment it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was already removed in commit 696541f

* @param clippingOperator clipping operator to be merged
* @param trailingHardClippedBases number of hardClippedBases which were on the end of the original cigar
*/
static private void mergeClippingCigarElement(List<CigarElement> newCigar, CigarElement c,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename some of the parameters and variables here. They're confusing as hell.

Maybe:
c -> originalElement
op -> originalOperator
clippingOperator -> newClippingOperator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -180,11 +180,14 @@ public static void clip3PrimeEndOfRead(SAMRecord rec, final int clipFrom, final

// If hard-clipping, remove the hard-clipped bases from the read
if(clippingOperator == CigarOperator.HARD_CLIP) {
byte[] bases = rec.getReadBases();
final byte[] bases = rec.getReadBases();
final byte[] baseQualities = rec.getBaseQualities();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we miss this on the first pass?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make sure bases.length and baseQualities.length are the same, and then use the same values for the start/stop for each.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kachulis
Copy link
Contributor Author

kachulis commented Mar 5, 2020

Thanks for the quick review @lbergelson. I have added code to invalidate NM, MD, and UQ tags when clipping changes the length of the read. Note this will change behavior (to be more correct) for soft-clipping as well. Also added more validation of the things that are changes to clip3PrimeEndOfRead. Back to you!

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kachulis Two comment. I think we're good after that. Clipping always has more weird edges though...

@@ -201,6 +212,13 @@ public static void clip3PrimeEndOfRead(SAMRecord rec, final int clipFrom, final
}
}

if (rec.getReadLength() != originalReadLength) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a method comment saying that this will remove these tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -201,6 +212,13 @@ public static void clip3PrimeEndOfRead(SAMRecord rec, final int clipFrom, final
}
}

if (rec.getReadLength() != originalReadLength) {
//invalidate NM, UQ, MD tags if we have changed the length of the read.
rec.setAttribute(SAMTag.NM.name(), null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test that shows these get invalidated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kachulis
Copy link
Contributor Author

kachulis commented Mar 6, 2020

@lbergelson I added documentation and tests for the tag invalidations. Also adjusted the logic to only invalidate in the number of reference bases the read aligns to changes (no need to invalidate these tags if we are harclipping bases that were already softclipped).

I don't think I have merge permissions, so I'm happy with you merging this at this point if you approve.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems like a good improvement

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants