Improved performance when decoding the entire set of rows with streamable JSON formats #253

slvrtrn · 2024-03-29T03:11:25Z

Summary

Improved performance when decoding the entire set of rows with streamable JSON formats (such as `JSONEachRow` or `JSONCompactEachRow`) via the `ResultSet.json()` method. Depending on the dataset, execution time drops by 10-15% (e.g., `cell_towers`) up to about 40% (large rows with very long strings).

NB: The actual streaming performance when consuming `ResultSet.stream()` (which was already fast) has not changed. Only `ResultSet.json()` used suboptimal stream processing in some cases; it now consumes the same stream transformer that `ResultSet.stream()` provides.
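The approach described above can be sketched as follows, under simplifying assumptions: the response body is modeled as a synchronous iterable of string chunks (the real client works with Node.js and Web streams), and `decodeJSONEachRow` and `collectAll` are hypothetical names, not the client's internals. A single incremental decoder serves both row-by-row streaming and an all-rows-at-once `json()`-style call, so the full result is never materialized as one string.

```typescript
// Hypothetical sketch, NOT the clickhouse-js implementation.
// Decodes newline-delimited JSON rows (as in JSONEachRow) incrementally:
// complete lines are parsed as soon as they arrive, and an incomplete
// trailing line is carried over to the next chunk.
function* decodeJSONEachRow(chunks: Iterable<string>): Generator<unknown> {
  let incomplete = ''
  for (const chunk of chunks) {
    let start = 0
    let idx = chunk.indexOf('\n')
    while (idx !== -1) {
      const line = incomplete + chunk.slice(start, idx)
      incomplete = ''
      if (line.length > 0) yield JSON.parse(line)
      start = idx + 1
      idx = chunk.indexOf('\n', start)
    }
    // Keep the tail (a partial row) until the next chunk completes it.
    incomplete += chunk.slice(start)
  }
  // The last row may not be followed by a newline.
  if (incomplete.length > 0) yield JSON.parse(incomplete)
}

// json()-style helper: drains the same decoder and collects every row.
function collectAll<T>(chunks: Iterable<string>): T[] {
  return Array.from(decodeJSONEachRow(chunks)) as T[]
}

// Usage: a row split across two chunks is still decoded correctly.
const chunks = ['{"id":1,"s":"a"}\n{"id"', ':2,"s":"b"}\n']
for (const row of decodeJSONEachRow(chunks)) {
  console.log(row) // streaming: one row at a time
}
const all = collectAll<{ id: number; s: string }>(chunks)
console.log(all.length) // 2
```

Because the collecting helper simply drains the streaming decoder, any speedup in the decoder benefits both code paths at once.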

Before:
```
[1277 ms   ][JSONEachRow       ][SELECT * FROM large_strings ORDER BY id ASC LIMIT 100000                        ]
[668 ms    ][JSONEachRow       ][SELECT * FROM cell_towers ORDER BY (radio, mcc, net, created) ASC LIMIT 500000  ]
```
After:
```
[737 ms    ][JSONEachRow       ][SELECT * FROM large_strings ORDER BY id ASC LIMIT 100000                        ]
[578 ms    ][JSONEachRow       ][SELECT * FROM cell_towers ORDER BY (radio, mcc, net, created) ASC LIMIT 500000  ]
```
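The percentages quoted in the summary can be checked against these numbers. A small helper (hypothetical, for illustration only) computes the relative reduction in execution time as (before - after) / before:

```typescript
// Relative reduction in execution time, in percent: (before - after) / before.
function reductionPercent(beforeMs: number, afterMs: number): number {
  return ((beforeMs - afterMs) / beforeMs) * 100
}

console.log(reductionPercent(1277, 737).toFixed(1)) // 42.3 (large_strings)
console.log(reductionPercent(668, 578).toFixed(1)) // 13.5 (cell_towers)
```

Both results are consistent with the 10-15% and roughly 40% figures stated in the summary.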
- Removed the outdated `decode` function.
- Updated the exported types and doc entries for `DataFormat`.
- Fixed flaky expect-type assertions.

mshustov · 2024-03-29T11:27:42Z

packages/client-web/__tests__/integration/web_select_streaming.test.ts

```diff
@@ -40,9 +40,9 @@ describe('[Web] SELECT streaming', () => {
   it('should consume a text response only once', async () => {
     const rs = await client.query({
       query: 'SELECT * FROM system.numbers LIMIT 1',
-      format: 'TabSeparated',
+      format: 'JSONEachRow',
```

Why do we need this change?

We can't call `json()` on `TabSeparated`.

mshustov

What change gave the performance boost? I see changes in `result_set` and `stream`.

slvrtrn · 2024-03-29T13:01:28Z

@mshustov

> What change gave the performance boost?

The `decode` function made a few unnecessary passes over one huge string. It is far less efficient than the stream transformer we already had in the `ResultSet`.

slvrtrn · 2024-03-29T13:03:06Z

This: https://github.com/ClickHouse/clickhouse-js/pull/253/files#diff-46e9eda9880b41c04095d911536817d628992924b2bdd3a96f66529dd3628a5bL101-L104

Unless the JIT optimizes it away (it does not), that is at least one extra full pass over the data, and two in the worst case.
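For contrast, here is a hypothetical approximation of a buffer-everything decoder. It is not the removed `decode` function's actual code, and `decodeWholeText` is an illustrative name. The chunks are first joined into one string, that string is scanned again by `split`, and the resulting array of lines is walked once more before parsing, so each step is a separate pass over the full data set.

```typescript
// Hypothetical approximation of a "buffer everything, then split" decoder.
// It is NOT the removed decode() function, only an illustration of the cost.
function decodeWholeText(chunks: string[]): unknown[] {
  const text = chunks.join('') // pass 1: build one huge string in memory
  return text
    .split('\n') // pass 2: scan the whole string again for newlines
    .filter((line) => line.length > 0) // pass 3: walk the array of lines
    .map((line) => JSON.parse(line)) // pass 4: parse each line
}

const rows = decodeWholeText(['{"id":1}\n{"id"', ':2}\n'])
console.log(rows.length) // 2
```

The output is the same as with an incremental decoder; the difference is the extra copies and passes, which grow with the size of the result and are most visible with very long strings.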

1.0.0

- Add support for URL parameters parsing (#232)
- Infer ResultSet type hints based on DataFormat (#238)
- Add SharedMergeTree Cloud tests, remove Node 16 from the CI, and add Node 21 (#234)
- Add pathname config option, revert read-only switch/default settings (#251)
- Improved performance when decoding the entire set of rows with streamable JSON formats (#253)
- Bump dev dependencies, update internal module resolution (#248)

slvrtrn added 3 commits March 29, 2024 02:31

- a77b39b: Improve ResultSet.json() performance (getting all rows at once)
- 33eaadb: Update web impl, remove the old decode method, fix expect-type flakiness
- cb93d9b: Fix Web ResultSet

slvrtrn requested a review from mshustov March 29, 2024 03:11

mshustov reviewed Mar 29, 2024


mshustov approved these changes Mar 29, 2024


slvrtrn merged commit 392bd82 into 1.0.0 Mar 29, 2024
54 checks passed

slvrtrn deleted the json-decoding-performance branch March 29, 2024 13:08
