
Refactor DataFrameNormalizer to improve performance #1964

Open
wants to merge 6 commits into
base: master


G-D-Petrov
Collaborator

Reference Issues/PRs

Fixes #1963

What does this implement or fix?

This change reduces the number of calls to make_block, which speeds up the post-processing step when multiple adjacent columns share the same type.

Note: there is no improvement when adjacent columns are of different types.
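
To illustrate the idea (this is a minimal sketch, not the code in this PR): if consecutive columns with the same dtype are grouped into runs, one block can be built per run instead of one per column. The helper name group_adjacent_by_dtype and the dtype lists below are hypothetical, and the sketch only counts runs; it does not call make_block.

```python
from itertools import groupby


def group_adjacent_by_dtype(dtypes):
    """Return (dtype, start, stop) runs of consecutive columns sharing a dtype.

    Hypothetical helper for illustration only; each run stands in for one
    block that would otherwise be created once per column.
    """
    runs = []
    position = 0
    for dtype, members in groupby(dtypes):
        length = sum(1 for _ in members)
        runs.append((dtype, position, position + length))
        position += length
    return runs


# Six columns where equal dtypes sit next to each other: few runs.
same = ["float64"] * 4 + ["int64"] * 2
# Six columns with alternating dtypes: one run per column, so no saving.
mixed = ["float64", "int64"] * 3

same_runs = group_adjacent_by_dtype(same)
mixed_runs = group_adjacent_by_dtype(mixed)
print(len(same_runs), len(mixed_runs))
```

The alternating case shows why the note above holds: when no two neighbouring columns share a type, the number of runs equals the number of columns.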

Any other comments?

Before the fix, the code from the repro took:

8050444 function calls (7700439 primitive calls) in 4.323 seconds
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    2.822    2.822 _store.py:1831(_post_process_dataframe)

After the fix it took:

1679935 function calls (1503062 primitive calls) in 2.043 seconds
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.594    0.594 _store.py:1831(_post_process_dataframe)
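
For reference, the ratios implied by the two profiles can be worked out directly from the figures quoted above. The variable names are only for illustration; the numbers are copied from the profiler output.

```python
# Figures copied from the two profiler outputs above.
before_total_s, after_total_s = 4.323, 2.043          # whole repro run
before_post_s, after_post_s = 2.822, 0.594            # cumtime of _post_process_dataframe
before_calls, after_calls = 8050444, 1679935          # total function calls

total_speedup = before_total_s / after_total_s
post_process_speedup = before_post_s / after_post_s
call_reduction = before_calls / after_calls

print(f"total run: {total_speedup:.2f}x faster")
print(f"_post_process_dataframe: {post_process_speedup:.2f}x faster")
print(f"function calls: {call_reduction:.2f}x fewer")
```

On these numbers the post-processing step is roughly 4.75x faster, and the whole repro roughly 2.1x faster.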

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Development

Successfully merging this pull request may close these issues.

Arcticdb reads can be slow when reading many columns