Fix segfaults in WebP queries run in parallel. #5065
@@ -644,6 +645,8 @@ bool FilterPipeline::use_tile_chunking(
  } else if (version >= 13 && has_filter(FilterType::FILTER_DICTIONARY)) {
    return false;
  }
} else if (has_filter(FilterType::FILTER_WEBP)) {
If an array has already been written using `FILTER_WEBP` with chunking enabled, will it be possible to read it after chunking is disabled? Or do we need to add a version check as well, to make sure this only applies from now on?
If so, is there a way to fix this problem for older arrays just by adapting the reader, or is it a writer bug that we are stuck with?
Yep, good call. I added a test that writes a tile with chunking enabled, then disables chunking before reading it back. It looks like we load the tile chunk data from the result tiles, though, so a tile written with chunking will always be read with chunking. That makes the test I added not very interesting for this case, but it's still a stake in the ground for moving the WebP unit tests out of `tiledb_unit`, so I left it in this PR.
IIUC this means we are OK as long as subsequent writes do not use chunking, but @KiterLuc, correct me if I missed something. If you have any ideas for better testing of this specific edge case, I can add them.
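The observation above, that chunking on read follows what was stored with the tile rather than the current chunking setting, can be illustrated with a minimal sketch. `StoredTile`, `write_tile`, and `read_tile` are hypothetical simplifications, not TileDB's actual tile classes; the only assumption is that each written tile records its own chunk layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification: a tile records the sizes of the chunks it was
// written with, so the reader never consults the current chunking setting.
struct StoredTile {
  std::vector<std::size_t> chunk_sizes;  // layout chosen at write time
  std::vector<std::uint8_t> data;        // concatenated chunk payloads
};

// Splits `data` into chunks of at most `max_chunk` bytes when chunking is on;
// otherwise stores the whole tile as a single chunk.
StoredTile write_tile(const std::vector<std::uint8_t>& data, bool use_chunking,
                      std::size_t max_chunk) {
  StoredTile tile;
  tile.data = data;
  if (!use_chunking || max_chunk == 0) {
    tile.chunk_sizes.push_back(data.size());
    return tile;
  }
  for (std::size_t off = 0; off < data.size(); off += max_chunk) {
    tile.chunk_sizes.push_back(std::min(max_chunk, data.size() - off));
  }
  return tile;
}

// Reads the tile back chunk by chunk using the stored layout. There is no
// `use_chunking` parameter: the layout comes from the tile itself.
std::vector<std::vector<std::uint8_t>> read_tile(const StoredTile& tile) {
  std::vector<std::vector<std::uint8_t>> chunks;
  std::size_t off = 0;
  for (std::size_t size : tile.chunk_sizes) {
    chunks.emplace_back(tile.data.begin() + off,
                        tile.data.begin() + off + size);
    off += size;
  }
  return chunks;
}
```

Under this model, a 10-byte tile written with a 4-byte chunk limit is read back as chunks of 4, 4, and 2 bytes no matter what the writer configuration is at read time, which matches why the added test cannot exercise a "chunked write, unchunked read" path.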
Thanks for trying that! This means the issue cannot be fixed for already-created arrays that use the WebP filter, right? Let's highlight that limitation in the bug and PR description for future reference.
Done, thanks!
Force-pushed from 65728c0 to 20636c0.
Small change, everything else looks good.
@@ -43,6 +43,13 @@ using namespace tiledb::common;

namespace tiledb::sm {
This should be implemented in a test library and not in dev code.
Done. I created `whitebox_helpers.h` to keep it in a header and avoid linking with the `tiledb_test_support_lib` library.
Force-pushed from d8a3e94 to b881843.
Only one small comment on documenting the limitations of the fix. Otherwise LGTM, great work!
The `WebpFilter` calls `WriterTile::set_max_tile_chunk_size` to override the maximum tile chunk size, in an attempt to avoid further chunking the data within each tile. Because `WriterTile::max_tile_chunk_size_` is static, this does not work as intended when multiple WebP queries run across threads: one thread can override the value while another running query is still using it. The result is incorrect chunking of WebP data, which can cause segfaults when libwebp reads outside memory bounds.

libwebp was reading out of bounds because, after another thread overrides `max_tile_chunk_size_`, we don't account for the smaller chunk's dimensions when calling the WebP APIs: we still pass the same height, width, and row stride to the [WebP API calls](https://github.com/TileDB-Inc/TileDB/blob/dev/tiledb/sm/filter/webp_filter.cc#L144).

This PR removes the use of `WriterTile::set_max_tile_chunk_size` and updates `FilterPipeline::use_tile_chunking` to return `false` for WebP, which IIUC should give the behavior that was originally intended and fixes the segfaults seen on REST.

[SC-48697](https://app.shortcut.com/tiledb-inc/story/48697/rest-segfaults-running-webp-multiple-queries)

Limitations: If an existing WebP array was affected by this during ingestion but did not produce a segfault, it will still be read back using chunking. AFAICT there is no risk of a segfault in this case, since the WebP decoding APIs do not depend on the caller to specify buffer dimensions; instead, WebP decodes the data and reports the width and height of the decoded image to the caller. The next write to the array will raise no error and carries no risk of a segfault, and the new tiles will not use chunking for reads or writes going forward.
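To make the failure mode and the fix concrete, here is a minimal single-threaded sketch. `GlobalTileConfig`, `Pipeline`, `bytes_touched`, and `would_overrun` are hypothetical simplifications, not TileDB's actual `WriterTile` or `FilterPipeline` code, and the enum names only mirror the ones in the diff. It shows (1) that a value kept in a static member and overridden by one query is what every other query sees, (2) the arithmetic of passing full-tile height and stride for a smaller chunk, and (3) the shape of the fix: each pipeline decides from its own filter list whether to chunk at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Names mirror the diff; this is a reduced, hypothetical enum.
enum class FilterType { FILTER_NONE, FILTER_WEBP };

// Hypothetical stand-in for a class-wide (static) setting such as
// WriterTile::max_tile_chunk_size_: every query in the process shares it.
struct GlobalTileConfig {
  static inline std::size_t max_tile_chunk_size = 64 * 1024;
};

// Bytes an image routine touches if told the data is `height` rows of
// `stride` bytes each.
constexpr std::size_t bytes_touched(std::size_t height, std::size_t stride) {
  return height * stride;
}

// True when the rows the caller describes exceed the chunk actually passed,
// i.e. the routine would access memory past the end of the buffer.
constexpr bool would_overrun(std::size_t chunk_bytes, std::size_t height,
                             std::size_t stride) {
  return bytes_touched(height, stride) > chunk_bytes;
}

// Hypothetical simplified pipeline: the chunking decision is derived from the
// pipeline's own filters, so no other query can change it.
struct Pipeline {
  std::vector<FilterType> filters;

  bool has_filter(FilterType type) const {
    return std::find(filters.begin(), filters.end(), type) != filters.end();
  }

  bool use_tile_chunking() const {
    // Mirrors the fix: tiles filtered with WebP are never split into chunks.
    return !has_filter(FilterType::FILTER_WEBP);
  }
};
```

As a worked example, assuming an RGB layout of 3 bytes per pixel: a 100x100 tile has a 300-byte row stride and occupies 30,000 bytes. If a concurrent query lowers the shared limit so that this tile is cut into 4,096-byte chunks, any call that still describes 100 rows of 300 bytes touches 30,000 bytes of a 4,096-byte buffer. Making the decision per pipeline removes the shared mutable state entirely instead of trying to synchronize it.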
---
TYPE: BUG
DESC: Fix segfaults in WebP queries run in parallel.