-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
refactor: streamline column-aware row encoding impl #20583
base: main
Are you sure you want to change the base?
Conversation
const OFFSET8 = 0b01; | ||
const OFFSET16 = 0b10; | ||
const OFFSET32 = 0b11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OFFSET32
here overlaps with OFFSET8
and OFFSET16
, for which the bitflag doesn't seem to provide clear semantics. That's why I decide to re-implement with bitfield.
offsets: Vec<u8>, | ||
buf: Vec<u8>, | ||
} | ||
|
||
/// A trait unifying [`ToDatumRef`] and already encoded bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unify encode
and encode_slice
.
// Use 0 if there's no data. | ||
let max_offset = usize_offsets.last().copied().unwrap_or(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we support ser/de for empty rows.
.reserve_exact(usize_offsets.len() * offset_width.width()); | ||
for &offset in usize_offsets { | ||
self.offsets | ||
.put_uint_le(offset as u64, offset_width.width()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unifies put_u8
, put_u16_le
, and put_u32_le
.
/// A view of the encoded bytes, which can be iterated over to get the column id and data. | ||
/// Used for deserialization. | ||
// TODO: can we unify this with `RowEncoding`, which is for serialization? | ||
#[derive(Clone)] | ||
struct EncodedBytes<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unify Deserializer
and try_drop_invalid_columns
.
04f2c7b
to
8acd50f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR refactors the column-aware row encoding implementation to improve clarity, conciseness, and reduce potential errors.
- Replaces the bitflags-based header with a bitfield struct and an enum representing offset width.
- Consolidates encoding logic by introducing a generic Encode trait and refines the serializer/deserializer implementations.
- Updates tests and configuration to match the new encoding scheme.
Reviewed Changes
File | Description |
---|---|
src/common/src/util/value_encoding/column_aware_row_encoding.rs | Refactored row encoding/decoding using a bitfield-based header and generic encoding trait. |
src/storage/src/row_serde/value_serde.rs | Adjusted tests to validate new behavior for handling dropped columns. |
src/common/Cargo.toml | Added the bitfield-struct dependency to support new header implementation. |
src/common/src/util/value_encoding/error.rs | Changed the error message format for invalid flags to use binary representation. |
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
src/common/src/util/value_encoding/error.rs:45
- [nitpick] Using binary formatting for the invalid flag may obscure its numeric value. Consider providing additional context or including a decimal representation for improved clarity.
#[error("Invalid flag: {0:b}")]
5bc558d
to
55c1865
Compare
55c1865
to
4d6f827
Compare
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
4d6f827
to
fe5cdd0
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Reimplement column-aware row encoding to make the code more clear, concise and less error-prone. See review comments for explanation of changes.
Checklist
Documentation
Release note