Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

mapping performance improvement #773

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

alexey0308
Copy link
Contributor

During mapping, stitchWindowAligns uses recursive calls

  1. one using the current alignment iA
  2. one not using.

The call 1) is performed only if stitchAlignToTranscript does not return a bad score.
Since stitchAlignToTranscript modifies the Transcript object, an additional copy is created, which is expansive.

This pull request attempts to address this performance overhead:
additional cheaper check is performed, which allows to avoid copy-creation of a new Transcript object if it will not be used in the recursion.

The changes comprise of

  • adding const modifiers
  • extracting a function for transcript finalization (no code modified)
  • new function for checking if this alignment can be skipped, used here
  • formatting, e.g. long lines of code are split.

Progress output with and without the PR code on a test subset:

           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Nov 02 13:11:28    117.1     2113669      101    94.3%    100.4     0.6%     2.5%     0.1%     0.0%     3.0%     0.0%
ALL DONE!
           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Nov 02 13:13:09     49.5     1168327      101    94.3%    100.4     0.6%     2.6%     0.1%     0.0%     3.0%     0.0%
ALL DONE!

* [stitchWindowAligns()] used to create a new [Transcript] instance
irregardless of if it will be successfully stitched to the transcript.
Early check allows to check for most common reasons to reject
the alignment.

* To ease reading, `WA[iA]` is aliased as `align`.

* minor formatting
- const char* R
- const uiWA* WA
@alexdobin
Copy link
Owner

Hi Alexey,

thanks a lot for this very interesting PR (and the previous as well).
Sorry for the belayed reply, I am extremely swamped this week and the next one with the Genome Informatics conference. The 2-fold speed-up sounds really big!
Let me do a few checks, and then I will pull it in, next week or the one after.

Cheers
Alex

@rherai
Copy link

rherai commented Nov 19, 2019

Does anybody knows whether using a gtf annotation + Star + human genome I could improve alignment speed? If yes, how many times I can speed up the alignment?

@alexdobin
Copy link
Owner

Hi @rherai

using annotations increases accuracy but not the speed.

Cheers
Alex

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants