
Reduced memory overhead of preparing LZ4-compressed data for server. #110

Conversation

@Enmk Enmk commented Oct 30, 2021

Do not compress the whole serialized block at once; instead, compress it in reasonably sized chunks.
This removes some temporary buffers and reduces memory pressure (a rough sketch of the chunking idea follows the list below).

Also minor refactoring:

  • moved all serialization-format code to WireFormat class.
  • removed CodedOutputStream and CodedInputStream classes.
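
A minimal sketch of the chunking idea, not the PR's actual code: CHUNK_SIZE, CompressInChunks, and the write callback are illustrative assumptions, while LZ4_compressBound and LZ4_compress_default are the real LZ4 API. Compressing chunk by chunk keeps only one small temporary buffer alive, instead of one sized for the whole serialized block.

#include <lz4.h>

#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative chunk size; the value used by the library may differ.
constexpr std::size_t CHUNK_SIZE = 64 * 1024;

// Compresses `size` bytes starting at `data` one chunk at a time and hands
// each compressed piece to `write` (e.g. a socket/output-stream callback).
// Only one chunk-sized temporary buffer is ever alive, so the extra memory
// is O(CHUNK_SIZE) rather than O(size).
template <typename WriteFn>
void CompressInChunks(const char* data, std::size_t size, WriteFn write) {
    std::vector<char> compressed(LZ4_compressBound(static_cast<int>(CHUNK_SIZE)));

    for (std::size_t offset = 0; offset < size; offset += CHUNK_SIZE) {
        const std::size_t chunk = std::min(CHUNK_SIZE, size - offset);
        const int written = LZ4_compress_default(
            data + offset, compressed.data(),
            static_cast<int>(chunk), static_cast<int>(compressed.size()));
        if (written <= 0)
            throw std::runtime_error("LZ4 compression failed");
        write(compressed.data(), static_cast<std::size_t>(written));
    }
}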

Memory usage

Test executed against a single block of 5 columns by INSERTing it and then SELECTing the rows back from the server. Both binaries were built in RelWithDebInfo mode. RSS value and peak were recorded at the following points (one way of sampling these numbers is sketched after the list):

  • initial - almost immediately after program start
  • before INSERTing - the moment when the block has been prepared in memory, but before the actual call to Client::Insert
  • after INSERTing - right after Client::Insert. NOTE: the original Block is still in memory for validation.
  • after SELECTing - right after client.Select("SELECT * FROM ..."). NOTE: the original Block is still in memory for validation.
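
As a rough sketch (an assumption about the measurement, not necessarily how the PR's test harness does it), on Linux the "value" and "peak" columns below correspond to the VmRSS and VmHWM fields of /proc/self/status; the helper ReadProcStatusBytes is an illustrative name.

#include <fstream>
#include <sstream>
#include <string>

// Reads a field such as "VmRSS" or "VmHWM" from /proc/self/status and
// returns its value in bytes (the file reports values in kB), or 0 if
// the field is not present.
long ReadProcStatusBytes(const std::string& field) {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, field.size(), field) == 0) {
            std::istringstream values(line.substr(field.size() + 1));
            long kb = 0;
            values >> kb;
            return kb * 1024;
        }
    }
    return 0;
}

// ReadProcStatusBytes("VmRSS") -> current RSS ("value" column),
// ReadProcStatusBytes("VmHWM") -> peak RSS ("peak" column).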

Original implementation

// 10M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 10000000 rows
RSS
                     initial    value :    4550656      peak:    4550656
            before INSERTing    value :  476459008      peak:  503341056
             after INSERTing    value :  476602368      peak: 1117761536
             after SELECTing    value :  545083392      peak: 1277018112

// 100M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 100000000 rows
unknown file: Failure
C++ exception with description "DB::Exception: Unexpected packet Data received from client" thrown in the test body.

Version in this PR

// 10M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 10000000 rows
RSS
                     initial    value :    5767168      peak:    5767168
            before INSERTing    value :  477839360      peak:  504721408
             after INSERTing    value :  477839360      peak:  504721408
             after SELECTing    value :  546471936      peak: 1278402560

// 100M Rows
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 100000000 rows
RSS
                     initial    value :    5857280      peak:    5857280
            before INSERTing    value : 4714172416      peak: 4850671616
             after INSERTing    value : 4714172416      peak: 4850671616
             after SELECTing    value : 5696409600      peak: 12135694336

Comparison

Since the original version failed to insert 100M rows, we compare memory usage on 10M rows.
As you can see, the original implementation peaks at 1117761536 bytes upon insertion, 503341056 of which are attributable to the original Block residing in memory; that is 1117761536 - 503341056 = 614420480 bytes (~0.57 GiB) of memory used only to prepare and send the data to the server.

The modified implementation uses an insignificant amount of extra memory, too small to trace with the current approach. Moreover, it remains undetectable even when inserting 100M rows.

Conclusion

The modified version (presented in this PR) uses O(1) extra memory, vs. O(n) for the original version.

@traceon traceon self-assigned this Nov 3, 2021
@traceon traceon commented Nov 3, 2021

@Enmk I see some seemingly conflicting changes between this and #109. Let's merge #109 before fully reviewing this one.

@Enmk Enmk force-pushed the memory_optimization_on_send_and_receive_lz4_blocks branch from 009b8af to b148348 on November 12, 2021 14:56

@traceon traceon left a comment

Left inline comments.

Files with inline comments (all since resolved):

  • ut/stream_ut.cpp
  • clickhouse/base/coded.cpp
  • clickhouse/base/coded.h
  • clickhouse/base/compressed.cpp (two comments)
  • clickhouse/base/wire_format.h
  • clickhouse/base/wire_format.cpp (four comments)
@traceon traceon commented Nov 12, 2021

Could you please provide some numbers showing the improvements (and the absence of degradation)? E.g. "before vs. after" perf test results.

if (estimated_compressed_buffer_size <= 0)
    throw std::runtime_error("Failed to estimate compressed buffer size, LZ4 error: " + std::to_string(estimated_compressed_buffer_size));

compressed_buffer_.resize(estimated_compressed_buffer_size + HEADER_SIZE + EXTRA_COMPRESS_BUFFER_SIZE);

This is low-hanging fruit for optimization: resize without initialization.
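
For reference, a minimal sketch (not code from this PR, and not necessarily what the library ended up doing) of one common standard-C++ way to get resize-without-initialization: wrap the byte type so its default constructor does nothing, and the extra zero-fill on resize disappears. UninitializedByte is an illustrative name.

#include <cstdint>
#include <vector>

// A 1-byte wrapper whose default constructor deliberately does nothing,
// so vector::resize() allocates new elements without zero-filling them.
struct UninitializedByte {
    std::uint8_t value;
    UninitializedByte() {}  // intentionally empty: no zero-initialization
};
static_assert(sizeof(UninitializedByte) == 1, "wrapper must stay 1 byte");

int main() {
    std::vector<UninitializedByte> buffer;
    // Grows the buffer without spending time writing zeros; the compressor
    // overwrites the contents anyway.
    buffer.resize(64 * 1024);
    return 0;
}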

@traceon traceon merged commit b10d71e into ClickHouse:master Nov 15, 2021