-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Dense reader: improve memory consumption for tile structures. #5046
Conversation
/* ********************************* */ | ||
/* STATIC FUNCTIONS */ | ||
/* ********************************* */ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this just brings the functions that used to be in reader base into the legacy reader. The implementations between this and the dense reader diverged too much and only the code in the new ReaderBase::compute_tile_domains is still common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. I will add more after our meeting tomorrow
This change significantly reduces the memory usage for tile structures (result space tile and tile subarray) by doing three things: - Only create the structures for the tiles that will be processed in an internal loop. - Using a structure to store the tile subarray with a smaller memory footprint. - Adding estimation for the tile structure size when determining the tiles to process for an iteration. For the first point, a new class was added: `IterationTileData` to store the tile structurres for an iteration. It is held in a shared pointer so that the compute task can take tile data to finish processing tiles in a thread safe/efficient manner. A problem that arose here is that the tile subarrays were used to get the number of cells when computing the tiles that it can process in an iteration. Since the tile subarrays are heavy to compute, and should be computed in a parallel manner, this posed a problem. To solve this, a new method was added to Subarray to return the cell count of a particular tile (`Subarray::tile_cell_num``). Since the method doesn't allocate any memory, it can run significantly fast and there are no noticeable performance degradation, even in large queries. With this method, we can compute the tiles to process and then compute the tile subarrays later once we know the number of tiles for an iteration. For the second point, a new class was added: `DenseTileSubarray` to store the tile subarrays. This class takes much less memory than `Subarray` and is also much fater to create in memory. Since this is for dense only, the ranges can be types, which also uses less memory as we don't need to store a size. Finally, the computed ranges for a tile can be used in `TileCellSlabIter` as a const reference, which surely improves performance and reduce small memory allocations. For the third bullet point, `compute_result_space_tiles` now tries to estimate the memory for the tile structures as it can be significant when the tile size is small (ie: 1000 cells). I've also changed the code to take into account the filtered data size when computing memory against the memory budget to improve our accuracy. Note that this PR extracted `ReaderBase::compute_tile_domains`, which is common code for the new dense reader and the legacy reader to compute the result space tiles. The rest of the code was split between the legacy reader class and the new dense reader class as their implementations diverged significantly. Finally, I've made a small change to release the filtered data as soon as unfiltering is done to limit the memory footprint even further. [sc-38387] [sc-48388] --- TYPE: IMPROVEMENT DESC: Dense reader: improve memory consumption for tile structures.
389e905
to
cd8b7aa
Compare
This change significantly reduces the memory usage for tile structures (result space tile and tile subarray) by doing three things:
For the first point, a new class was added:
IterationTileData
to store the tile structurres for an iteration. It is held in a shared pointer so that the compute task can take tile data to finish processing tiles in a thread safe/efficient manner. A problem that arose here is that the tile subarrays were used to get the number of cells when computing the tiles that it can process in an iteration. Since the tile subarrays are heavy to compute, and should be computed in a parallel manner, this posed a problem. To solve this, a new method was added to Subarray to return the cell count of a particular tile (`Subarray::tile_cell_num``). Since the method doesn't allocate any memory, it can run significantly fast and there are no noticeable performance degradation, even in large queries. With this method, we can compute the tiles to process and then compute the tile subarrays later once we know the number of tiles for an iteration.For the second point, a new class was added:
DenseTileSubarray
to store the tile subarrays. This class takes much less memory thanSubarray
and is also much fater to create in memory. Since this is for dense only, the ranges can be types, which also uses less memory as we don't need to store a size. Finally, the computed ranges for a tile can be used inTileCellSlabIter
as a const reference, which surely improves performance and reduce small memory allocations.For the third bullet point,
compute_result_space_tiles
now tries to estimate the memory for the tile structures as it can be significant when the tile size is small (ie: 1000 cells). I've also changed the code to take into account the filtered data size when computing memory against the memory budget to improve our accuracy.Note that this PR extracted
ReaderBase::compute_tile_domains
, which is common code for the new dense reader and the legacy reader to compute the result space tiles. The rest of the code was split between the legacy reader class and the new dense reader class as their implementations diverged significantly.Finally, I've made a small change to release the filtered data as soon as unfiltering is done to limit the memory footprint even further.
[sc-38387]
[sc-48388]
TYPE: IMPROVEMENT
DESC: Dense reader: improve memory consumption for tile structures.