Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improved performance when decoding the entire set of rows with streamable JSON formats #253

Merged
merged 3 commits into from
Mar 29, 2024

Conversation

slvrtrn
Copy link
Contributor

@slvrtrn slvrtrn commented Mar 29, 2024

Summary

  • Improved performance when decoding the entire set of rows with streamable JSON formats (such as JSONEachRow or JSONCompactEachRow) by calling the ResultSet.json() method. Depending on the dataset, it's between 10-15% (like cell_towers) and 40% (with large rows and very long strings) less execution time. NB: The actual streaming performance when consuming the ResultSet.stream() (which was fast) hasn't changed. Only the ResultSet.json() method used a suboptimal stream processing in some instances, and now ResultSet.json() just consumes the same stream transformer provided by the ResultSet.stream() method.

    Before:

    [1277 ms   ][JSONEachRow       ][SELECT * FROM large_strings ORDER BY id ASC LIMIT 100000                        ]
    [668 ms    ][JSONEachRow       ][SELECT * FROM cell_towers ORDER BY (radio, mcc, net, created) ASC LIMIT 500000  ]
    

    After:

    [737 ms    ][JSONEachRow       ][SELECT * FROM large_strings ORDER BY id ASC LIMIT 100000                        ]
    [578 ms    ][JSONEachRow       ][SELECT * FROM cell_towers ORDER BY (radio, mcc, net, created) ASC LIMIT 500000  ]
    
  • Removed the outdated decode function.

  • Updated exported types and doc entries for DataFormat.

  • Fixed weird flakiness in expect-type assertions

@slvrtrn slvrtrn requested a review from mshustov March 29, 2024 03:11
@@ -40,9 +40,9 @@ describe('[Web] SELECT streaming', () => {
it('should consume a text response only once', async () => {
const rs = await client.query({
query: 'SELECT * FROM system.numbers LIMIT 1',
format: 'TabSeparated',
format: 'JSONEachRow',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't call JSON on TabSeparated.

Copy link
Member

@mshustov mshustov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What change gave the performance boost? I see changes in restul_set and stream

@slvrtrn
Copy link
Contributor Author

slvrtrn commented Mar 29, 2024

@mshustov

What change gave the performance boost?

The decode function did a few unneeded passes on a huge string. It is way less efficient than the stream transformer we already had in the ResultSet.

@slvrtrn
Copy link
Contributor Author

slvrtrn commented Mar 29, 2024

This: https://github.com/ClickHouse/clickhouse-js/pull/253/files#diff-46e9eda9880b41c04095d911536817d628992924b2bdd3a96f66529dd3628a5bL101-L104

Unless JIT optimizes it out (it does not), it's at least one extra full pass, worst case two.

@slvrtrn slvrtrn merged commit 392bd82 into 1.0.0 Mar 29, 2024
54 checks passed
@slvrtrn slvrtrn deleted the json-decoding-performance branch March 29, 2024 13:08
slvrtrn added a commit that referenced this pull request Mar 29, 2024
1.0.0

* Add support for URL parameters parsing (#232)
* Infer ResultSet type hints based on DataFormat (#238)
* Add SharedMergeTree Cloud tests, remove Node 16 from the CI, and add Node 21 (#234)
* Add pathname config option, revert read-only switch/default settings (#251)
* Improved performance when decoding the entire set of rows with streamable JSON formats (#253)
* Bump dev dependencies, update internal module resolution (#248)
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants