BGZip slow performance near end of chromosomes #153
Comments
It seems that the lookup time scales with the distance from the "start" of a contig. I only quickly scanned the internals and can't say I fully understand them, but it seems that this is due to the way bgzip is implemented in biopython: https://github.com/biopython/biopython/blob/master/Bio/bgzf.py#L699 It seems to read the whole part before the contig you need...?
@Maarten-vd-Sande this is definitely not due to the Bio.bgzf implementation and is definitely due to my incomplete implementation of virtual offset calculations from the start of each contig. I started work to fully support using the Lines 766 to 776 in f878775
You can see that I was still trying to figure out how this works, and never was able to make an entire round-trip (read a |
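For readers unfamiliar with the virtual offsets mentioned above, here is a minimal sketch of the arithmetic, not the pyfaidx or Bio.bgzf code. Per the BGZF format, a virtual offset packs the compressed-file position of a block into the high 48 bits and the position inside the decompressed block into the low 16 bits. The `block_index` list and the `virtual_offset_for` helper are hypothetical names invented for illustration; they assume a precomputed table of where each block starts in both the uncompressed and compressed streams.

```python
from bisect import bisect_right


def make_virtual_offset(block_start, within_block):
    """Pack a BGZF virtual offset from a compressed block start and an
    offset inside that block's decompressed data."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset: (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


def virtual_offset_for(uncompressed_pos, block_index):
    """Map an uncompressed byte position to a virtual offset.

    block_index is a hypothetical, precomputed, sorted list of
    (uncompressed_start, compressed_start) pairs, one per BGZF block.
    A binary search finds the block containing the position, so the
    cost does not grow with the distance from the start of the file.
    """
    starts = [u_start for u_start, _ in block_index]
    i = bisect_right(starts, uncompressed_pos) - 1
    if i < 0:
        raise ValueError("position precedes the first block")
    u_start, c_start = block_index[i]
    return make_virtual_offset(c_start, uncompressed_pos - u_start)


# Illustrative values only, not taken from a real file.
example_index = [(0, 0), (65280, 18000), (130560, 36500)]
voffset = virtual_offset_for(70000, example_index)
block_start, within_block = split_virtual_offset(voffset)
```

With such an index, a reader could seek straight to `block_start` in the compressed file, decompress only that block, and skip `within_block` bytes, instead of decompressing everything before the requested position.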
@mdshw5 thanks for the reply, that makes sense! I guess I'll just load the whole fasta in memory for now 😄
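One way the in-memory workaround could look, as a rough sketch using only the standard library rather than pyfaidx itself. It relies on the fact that a bgzip file is a valid multi-member gzip file, which Python's `gzip` module can read sequentially. The function names `read_fasta` and `load_bgzipped_fasta` are made up for this example, and the approach assumes the decompressed genome fits in RAM.

```python
import gzip


def read_fasta(lines):
    """Parse FASTA text lines into a dict of {record name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def load_bgzipped_fasta(path):
    """Decompress the whole file once and keep every sequence in memory."""
    with gzip.open(path, "rt") as handle:
        return read_fasta(handle)
```

After loading, any region is a plain string slice with 0-based, end-exclusive coordinates, e.g. `records["chr1"][start:end]`, so lookups near the end of a chromosome cost the same as lookups near the start.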
It can take over a minute to retrieve a few bases:
Low coordinates are fine:
You said in a previous issue:
I can't find that issue, so am raising this one. Good luck!