Improved performance when decoding the entire set of rows with streamable JSON formats #253
@@ -40,9 +40,9 @@ describe('[Web] SELECT streaming', () => {
   it('should consume a text response only once', async () => {
     const rs = await client.query({
       query: 'SELECT * FROM system.numbers LIMIT 1',
-      format: 'TabSeparated',
+      format: 'JSONEachRow',
Why do we need this change?
Can't call json() on a TabSeparated result.
Which change gave the performance boost? I see changes in result_set and stream.
The …
Unless the JIT optimizes it out (it does not), it's at least one extra full pass, worst case two.
1.0.0
* Add support for URL parameters parsing (#232)
* Infer ResultSet type hints based on DataFormat (#238)
* Add SharedMergeTree Cloud tests, remove Node 16 from the CI, and add Node 21 (#234)
* Add pathname config option, revert read-only switch/default settings (#251)
* Improved performance when decoding the entire set of rows with streamable JSON formats (#253)
* Bump dev dependencies, update internal module resolution (#248)
Summary
Improved performance when decoding the entire set of rows with streamable JSON formats (such as `JSONEachRow` or `JSONCompactEachRow`) by calling the `ResultSet.json()` method. Depending on the dataset, execution time drops by 10-15% (for example, cell_towers) to as much as 40% (large rows with very long strings).
NB: The actual streaming performance when consuming `ResultSet.stream()` (which was already fast) hasn't changed. Only `ResultSet.json()` used suboptimal stream processing in some instances; it now consumes the same stream transformer that `ResultSet.stream()` provides.
Before:
After:
* Removed the outdated `decode` function.
* Updated exported types and doc entries for `DataFormat`.
* Fixed weird flakiness in expect-type assertions.
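
To illustrate why avoiding an extra pass over the data matters for newline-delimited formats such as `JSONEachRow`, here is a minimal sketch that decodes rows directly from incoming chunks in a single pass, next to a generic two-pass variant. This is not the library's code and not a reconstruction of the previous implementation: `decodeRowsTwoPass` and `decodeRowsSinglePass` are hypothetical names, and the sketch assumes the chunks are already-decoded strings rather than raw bytes.

```typescript
// Hypothetical helpers for illustration only; not part of @clickhouse/client.
type Row = Record<string, unknown>

// Two-pass variant: first build the full payload, then scan it again for rows.
function decodeRowsTwoPass(chunks: string[]): Row[] {
  const text = chunks.join('') // pass 1: concatenate everything
  return text
    .split('\n') // pass 2: scan the whole payload for line breaks
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Row)
}

// Single-pass variant: parse each row as soon as its terminating newline is seen.
function decodeRowsSinglePass(chunks: string[]): Row[] {
  const rows: Row[] = []
  let incomplete = '' // tail of a row that was split across chunks
  for (const chunk of chunks) {
    let start = 0
    let idx = chunk.indexOf('\n', start)
    while (idx !== -1) {
      const line = incomplete + chunk.slice(start, idx)
      incomplete = ''
      if (line.length > 0) rows.push(JSON.parse(line) as Row)
      start = idx + 1
      idx = chunk.indexOf('\n', start)
    }
    incomplete += chunk.slice(start)
  }
  // Handle a final row that has no trailing newline.
  if (incomplete.length > 0) rows.push(JSON.parse(incomplete) as Row)
  return rows
}

// Usage: the second row is split across two chunks.
const exampleChunks = ['{"number":"0"}\n{"num', 'ber":"1"}\n', '{"number":"2"}\n']
console.log(decodeRowsSinglePass(exampleChunks).length) // prints 3
```

Both functions return the same rows; the single-pass variant simply never materializes the full payload as one string before parsing, which is the kind of saving the summary attributes to reusing the stream transformer.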