Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add simplified C++23 chunk_view #5035

Merged
merged 23 commits into from
Jun 6, 2024
Merged

Add simplified C++23 chunk_view #5035

merged 23 commits into from
Jun 6, 2024

Conversation

lums658
Copy link
Contributor

@lums658 lums658 commented Jun 3, 2024

This PR adds a class to provide a subset of the functionality of the C++23 std::ranges::chunk_view. This view divides an underlying view into a range of equal-sized chunks (the last one may be shorter than the others). A complete description of chunk_view can be found on cppreference.com, which also includes "exposition only" aspects that point to particular implementation approaches. We did not adopt those approaches in most cases, though the public API of the class matches the spec, for the functions that we did implement. This view will be used to partition the original input data from the user query into, well, chunks that will be individually sorted and then written to a TileDB array as a data block. Data blocks are of fixed and uniform byte size that will contain a chunk smaller than or equal to its byte limit.

  • The class assumes a random_access_range, which is what we have in our buffers that we are chunking, so I didn't try to generalize any further than that.

  • The class provides a complete iterator interface via iterator_facade as well as a complete view interface via view_interface. Aspects of those interfaces that rely on C++23 are pended.

  • Unit tests validate that chunk_view and its iterator meet the expected std::ranges concepts.

  • The unit tests do not yet test composition of this view with other views (our custom views as well as other C++20 views). This will be fairly extensive and the subject of its own PR.

  • One todo would be to create a variable chunk class view that chunks up the underlying view into variable-sized pieces, the better to fit into the data blocks. However, the data blocks will be fairly large, so there should not be too much internal fragmentation. We can always manage internal fragmentation by making the chunks some fraction of the block size, so even if there is some variation in chunk size, they can be well-packed into the data blocks.


TYPE: IMPROVEMENT
DESC: Implementation of a chunk_view class to provide a subset of C++23 chunk_view, suitable for supporting external sort.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not reviewed the code itself; absence of comments about it is not endorsement.

What I was looking for is code structure, build/test issues, and naming.

tiledb/common/test/CMakeLists.txt Outdated Show resolved Hide resolved
tiledb/common/util/CMakeLists.txt Outdated Show resolved Hide resolved
tiledb/common/util/chunk_view.h Outdated Show resolved Hide resolved
include(unit_test)

commence(unit_test util)
this_target_sources(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our indent style has been (1) two spaces inside commence/conclude (which CLion apparently can't be configured to do automatically) and (2) four spaces for function arguments (which CLion will do).

tiledb/common/util/test/unit_block_view.cc Show resolved Hide resolved
tiledb/common/util/chunk_view.h Outdated Show resolved Hide resolved
tiledb/common/util/print_types.h Outdated Show resolved Hide resolved
tiledb/common/util/test/unit_permutation_sort.cc Outdated Show resolved Hide resolved
@KiterLuc KiterLuc force-pushed the lums/sc-48621/block-view branch 2 times, most recently from c6fd503 to d6a5f35 Compare June 3, 2024 21:46
Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran the new tests on my local Windows machine and got no checked iterator errors.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things I noticed. One is from last time.

tiledb/stdx/__ranges/chunk_view.h Outdated Show resolved Hide resolved
@@ -0,0 +1,30 @@
#
# tiledb/stdx/__ranges/CMakeLists.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory should be just ranges; no underscores

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard library header file is called , which, if we mimic, requires a different name for the subdirectory -- LLVM does it this way (with the underscores for the directory name). Alternatively, we can use "ranges.h" for our own ranges and use ranges for the subdirectory -- but we've mimicked thread and stop_token by having our own files named thread and stop_token, so there is precedent for range, notably in the stdx directory. I think we should be consistent either way (either all without .h or all with .h). GCC solves this by putting everything in the ranges file, which I would prefer not to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but it looks weird, and it's not something we're doing anywhere else in our own code. Thus it needs documentation.

The appropriate place is DIRECTORY.md, which we've used elsewhere.

tiledb/stdx/__ranges/zip_view.h Outdated Show resolved Hide resolved
@KiterLuc KiterLuc dismissed eric-hughes-tiledb’s stale review June 6, 2024 07:01

I think all comments are addressed now. If not let me know and I'll make sure we follow up.

@KiterLuc KiterLuc merged commit 0fafe92 into dev Jun 6, 2024
63 checks passed
@KiterLuc KiterLuc deleted the lums/sc-48621/block-view branch June 6, 2024 07:02
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants