Optimize time and space consumption. #85

Open
thequicksort opened this issue Feb 6, 2021 · 2 comments
Labels
enhancement New feature or request
Milestone

Comments

@thequicksort
Contributor

Brainstorming ideas for post-release optimizations. Please add more as you think of them.

  • Remove the Random Forest classifier (Katie's suggestion).

  • Use Python __slots__ in dataclasses that have no field defaults (should reduce both time and space).
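For the slots idea, here is a minimal sketch of what that could look like. The `Capture` class and its fields are hypothetical, not taken from our codebase. It declares `__slots__` by hand, which only works when no field has a default value, since a default would become a class attribute that collides with the slot name.

```python
from dataclasses import dataclass


@dataclass
class Capture:  # hypothetical example class, not from the project
    # Declaring __slots__ removes the per-instance __dict__, which is
    # where the memory savings come from.
    __slots__ = ("channel", "start", "end")
    channel: int
    start: int
    end: int


capture = Capture(channel=1, start=0, end=100)

# Slotted instances reject attributes that were not declared up front.
try:
    capture.extra = "oops"
    rejected = False
except AttributeError:
    rejected = True
```

On Python 3.10 and later, `@dataclass(slots=True)` does the same thing and also allows defaults, so the "without defaults" restriction only matters on older interpreters.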

@thequicksort thequicksort added the enhancement New feature or request label Feb 6, 2021
@thequicksort thequicksort added this to the 1.1 milestone Feb 6, 2021
@thequicksort
Contributor Author

It turns out that ONT released some documentation on many of its data types. We currently store values as Int64s that could be Int16s: https://github.com/nanoporetech/minknow_api/blob/6f2dfb66bf0ff03edd0a57d758913110f08c7f07/proto/minknow_api/data.proto#L302

Maybe there's an optimization there? But their wording seems "shifty" enough that I don't know that I would bet the bank on it. We could try the optimization, and if the data compression is above some threshold and we feel safe with ONT being consistent, keep it.
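A rough sketch of the "try it, but stay safe" approach, using only the standard library's `array` module as a stand-in for whatever NumPy/h5py types we actually store. The function name and the sample values are made up for illustration. The idea is to downcast only when every value fits in 16 bits and to fall back to the 64-bit data otherwise.

```python
from array import array

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def downcast_to_int16(samples):
    """Return samples as a 16-bit array, or None if any value would overflow."""
    if len(samples) and (min(samples) < INT16_MIN or max(samples) > INT16_MAX):
        return None  # caller keeps the original 64-bit data
    return array("h", samples)


# Hypothetical signal values stored as 64-bit integers.
raw = array("q", [-120, 0, 512, 2047])
small = downcast_to_int16(raw)

# Bytes used before and after, to compare against a savings threshold.
bytes_before = raw.itemsize * len(raw)
bytes_after = small.itemsize * len(small)
```

The explicit range check is the safety net in case ONT's values ever fall outside what the documentation suggests.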

@thequicksort
Contributor Author

Cool info on optimizing h5py:
https://www.nersc.gov/assets/Uploads/H5py-2017-Feb23.pdf
