Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix segfaults in WebP queries ran in parallel. #5065

Merged
merged 6 commits into from
Jun 13, 2024

Conversation

shaunrd0
Copy link
Contributor

@shaunrd0 shaunrd0 commented Jun 10, 2024

The WebpFilter calls WriterTile::set_max_tile_chunk_size to override the maximum tile chunk size in an attempt to avoid further chunking data within each tile, but because WriterTile::max_tile_chunk_size_ is static this doesn't work as intended when we're running multiple WebP queries across threads. It's possible for one thread to override this value while it's being used by another running query and this results in incorrect chunking for WebP data, potentially causing segfaults from libwebp reaching out of memory bounds.

The WebP library was reaching out of bounds because after another thread overrides max_tile_chunk_size_ we don't account for the smaller chunk's dimensions when calling WebP APIs and still pass in the same height, width, and row stride to WebP API calls.

This PR removes the use of WriterTile::set_max_tile_chunk_size and updates FilterPipeline::use_tile_chunking to return false for WebP, which IIUC should have the behavior that was initially intended and fixes segfaults seen on REST.

SC-48697

Limitations: If a WebP array exists that was impacted by this during ingestion and it did not produce a segfault, that array will still be read back using chunking. AFAICT there is no risk of segfault for this case since the decoding APIs in WebP do not depend on the caller to specify buffer dimensions, instead WebP will decode and provides to the caller the width and height of the decoded image. On the next write to the array there will be no error or risk of segfault and the new tiles will not use chunking moving forward for reads or writes.


TYPE: BUG
DESC: Fix segfaults in WebP queries ran in parallel.

@shaunrd0 shaunrd0 requested a review from KiterLuc June 10, 2024 22:04
@@ -644,6 +645,8 @@ bool FilterPipeline::use_tile_chunking(
} else if (version >= 13 && has_filter(FilterType::FILTER_DICTIONARY)) {
return false;
}
} else if (has_filter(FilterType::FILTER_WEBP)) {
Copy link
Member

@ypatia ypatia Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an array is already written using FILTER_WEBP with chunking enabled, will it be possible to read it after disabling it? Or we need to add a check on the version as well to make sure this applies from now on?
If yes, is there a way to fix this problem for older arrays by just adapting the reader, or it's a Writer bug that we are stuck with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good call - I added a test that writes a tile with chunking enabled, then disables chunking prior to reading it back. It looks like we load tile chunk data from the result tiles though, so if we write a tile using chunking it will always be read using chunking. So turns out the test I added is not very interesting for this case, but it's still a stake in the ground for moving webp unit tests out of tiledb_unit so I left it in this PR.

https://github.com/tiledb-inc/tiledb/blob/f6809293d55406ca20703b54a8cfcc48e316ad68/tiledb/sm/query/readers/reader_base.cc#L753

IIUC this means we are ok if a subsequent write will not use chunking, but @KiterLuc correct me if I missed something. If you have any ideas on how I could add better testing for this specific edge case I can add them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying that! This means the issue cannot be fixed for arrays already created that use WebP filter, right? Let's highlight that limitation in the bug and PR description for future reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@shaunrd0 shaunrd0 force-pushed the smr/sc-48697/fix-webp-segfaults branch from 65728c0 to 20636c0 Compare June 11, 2024 17:25
@ypatia ypatia requested a review from KiterLuc June 12, 2024 07:21
Copy link
Contributor

@KiterLuc KiterLuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small change, everything else looks good.

@@ -43,6 +43,13 @@ using namespace tiledb::common;

namespace tiledb::sm {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be implemented in a test library and not in dev code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I created whitebox_helpers.h to keep it in a header and avoid linking with the tiledb_test_support_lib library.

@shaunrd0 shaunrd0 force-pushed the smr/sc-48697/fix-webp-segfaults branch from d8a3e94 to b881843 Compare June 12, 2024 14:53
Copy link
Member

@ypatia ypatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one small comment on documenting the limitations of the fix. Otherwise LGTM, great work!

@KiterLuc KiterLuc merged commit 47dde51 into dev Jun 13, 2024
62 checks passed
@KiterLuc KiterLuc deleted the smr/sc-48697/fix-webp-segfaults branch June 13, 2024 10:06
KiterLuc pushed a commit that referenced this pull request Jun 14, 2024
The WebpFilter calls `WriterTile::set_max_tile_chunk_size` to override
the maximum tile chunk size in an attempt to avoid further chunking data
within each tile, but because `WriterTile::max_tile_chunk_size_` is
static this doesn't work as intended when we're running multiple WebP
queries across threads. It's possible for one thread to override this
value while it's being used by another running query and this results in
incorrect chunking for WebP data, potentially causing segfaults from
libwebp reaching out of memory bounds.

The WebP library was reaching out of bounds because after another thread
overrides `max_tile_chunk_size_` we don't account for the smaller
chunk's dimensions when calling WebP APIs and still pass in the same
height, width, and row stride to [WebP API
calls](https://github.com/TileDB-Inc/TileDB/blob/dev/tiledb/sm/filter/webp_filter.cc#L144).

This PR removes the use of `WriterTile::set_max_tile_chunk_size` and
updates `FilterPipeline::use_tile_chunking` to return `false` for WebP,
which IIUC should have the behavior that was initially intended and
fixes segfaults seen on REST.


[SC-48697](https://app.shortcut.com/tiledb-inc/story/48697/rest-segfaults-running-webp-multiple-queries)

Limitations: If a WebP array exists that was impacted by this during
ingestion and it did not produce a segfault, that array will still be
read back using chunking. AFAICT there is no risk of segfault for this
case since the decoding APIs in WebP do not depend on the caller to
specify buffer dimensions, instead WebP will decode and provides to the
caller the width and height of the decoded image. On the next write to
the array there will be no error or risk of segfault and the new tiles
will not use chunking moving forward for reads or writes.

---
TYPE: BUG
DESC: Fix segfaults in WebP queries ran in parallel.
KiterLuc added a commit that referenced this pull request Jun 17, 2024
#5065) (#5091)

Backport 47dde51 from #5065.

---
TYPE: BUG
DESC: Fix segfaults in WebP queries ran in parallel.

Co-authored-by: Shaun M Reed <shaunrd0@gmail.com>
KiterLuc pushed a commit that referenced this pull request Jun 19, 2024
The WebpFilter calls `WriterTile::set_max_tile_chunk_size` to override
the maximum tile chunk size in an attempt to avoid further chunking data
within each tile, but because `WriterTile::max_tile_chunk_size_` is
static this doesn't work as intended when we're running multiple WebP
queries across threads. It's possible for one thread to override this
value while it's being used by another running query and this results in
incorrect chunking for WebP data, potentially causing segfaults from
libwebp reaching out of memory bounds.

The WebP library was reaching out of bounds because after another thread
overrides `max_tile_chunk_size_` we don't account for the smaller
chunk's dimensions when calling WebP APIs and still pass in the same
height, width, and row stride to [WebP API
calls](https://github.com/TileDB-Inc/TileDB/blob/dev/tiledb/sm/filter/webp_filter.cc#L144).

This PR removes the use of `WriterTile::set_max_tile_chunk_size` and
updates `FilterPipeline::use_tile_chunking` to return `false` for WebP,
which IIUC should have the behavior that was initially intended and
fixes segfaults seen on REST.


[SC-48697](https://app.shortcut.com/tiledb-inc/story/48697/rest-segfaults-running-webp-multiple-queries)

Limitations: If a WebP array exists that was impacted by this during
ingestion and it did not produce a segfault, that array will still be
read back using chunking. AFAICT there is no risk of segfault for this
case since the decoding APIs in WebP do not depend on the caller to
specify buffer dimensions, instead WebP will decode and provides to the
caller the width and height of the decoded image. On the next write to
the array there will be no error or risk of segfault and the new tiles
will not use chunking moving forward for reads or writes.

---
TYPE: BUG
DESC: Fix segfaults in WebP queries ran in parallel.
KiterLuc added a commit that referenced this pull request Jun 19, 2024
#5065) (#5121)

Backport 47dde51 from #5065.

---
TYPE: BUG
DESC: Fix segfaults in WebP queries ran in parallel.

Co-authored-by: Shaun M Reed <shaunrd0@gmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants