Speed up \uXXXX parsing and improve WTF-8 handling #1175

Merged
merged 5 commits into from
Aug 15, 2024

Conversation

purplesyringa
Contributor

Altogether, this speeds up parsing of \u-encoded War and Peace by 20%. Performance on json-benchmark is slightly affected: some cases improve by about 5% and one regresses by 1%, but I'm willing to write that off as noise from an imperfect benchmark setup.

This PR should be easier to read commit by commit, with whitespace changes hidden. In addition to the above, it includes a variation on #877, since that is easier to implement with this design.
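
For readers unfamiliar with the hot path being discussed, here is a minimal sketch of decoding the four hex digits of a `\uXXXX` escape with a lookup table. It is an illustration only and not the code from this PR: the names `HEX`, `build_hex_table`, and `decode_four_hex_digits` are hypothetical, and the table layout is an assumption chosen for clarity rather than speed.

```rust
/// Maps an ASCII byte to its hex value, or 0xFF if the byte is not a hex
/// digit. Built once at compile time. (Hypothetical helper, not from the PR.)
const HEX: [u8; 256] = build_hex_table();

const fn build_hex_table() -> [u8; 256] {
    let mut table = [0xFF_u8; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'f' => b - b'a' + 10,
            b'A'..=b'F' => b - b'A' + 10,
            _ => 0xFF,
        };
        i += 1;
    }
    table
}

/// Decodes the first four bytes of `input` (the `XXXX` of `\uXXXX`) into a
/// UTF-16 code unit. Returns `None` if fewer than four bytes are available
/// or any of them is not a hex digit.
fn decode_four_hex_digits(input: &[u8]) -> Option<u16> {
    let digits = input.get(..4)?;
    let mut value: u16 = 0;
    for &b in digits {
        let nibble = HEX[b as usize];
        if nibble == 0xFF {
            return None;
        }
        value = (value << 4) | nibble as u16;
    }
    Some(value)
}
```

A table lookup replaces the per-digit range comparisons with a single indexed load, which is one common way to make this kind of loop cheaper.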

purplesyringa and others added 5 commits August 12, 2024 21:10
This counterintuitively speeds up War and Peace 275 -> 290 MB/s (+5%) by
enabling inlining of encode_utf8 and extend_from_slice.
This speeds up War and Peace 290 MB/s -> 330 MB/s (+15%).
This does not affect performance.
This does not affect performance.
Closes serde-rs#877.

This is a good time to make ByteBuf parsing more consistent as I'm
rewriting it anyway. This commit integrates the changes from serde-rs#877 and
also handles a leading surrogate followed by a surrogate pair correctly.

This does not affect performance significantly.

Co-authored-by: Luca Casonato <hello@lcas.dev>
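
The surrogate case described in the last commit can be sketched as follows. This is not the PR's implementation: `push_char` and `push_code_units` are hypothetical names, and the sketch assumes that a lenient byte-buffer parser keeps lone surrogates in their three-byte WTF-8 form while combining valid pairs into a single scalar value. `push_char` also shows the `encode_utf8` plus `extend_from_slice` pattern mentioned in the first commit.

```rust
/// Appends the UTF-8 encoding of `c` to `out`.
fn push_char(c: char, out: &mut Vec<u8>) {
    let mut buf = [0u8; 4];
    out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}

/// Appends a sequence of UTF-16 code units (as produced by `\uXXXX` escapes)
/// to `out`. Valid surrogate pairs are combined; lone surrogates are written
/// as three-byte generalized UTF-8 (WTF-8) sequences.
fn push_code_units(units: &[u16], out: &mut Vec<u8>) {
    let mut i = 0;
    while i < units.len() {
        let unit = units[i];
        let next = units.get(i + 1).copied();
        match (unit, next) {
            // Leading surrogate immediately followed by a trailing one.
            (0xD800..=0xDBFF, Some(low @ 0xDC00..=0xDFFF)) => {
                let code = 0x10000
                    + (((unit as u32 - 0xD800) << 10) | (low as u32 - 0xDC00));
                // A combined pair is always a valid scalar value.
                push_char(char::from_u32(code).unwrap(), out);
                i += 2;
            }
            // Lone surrogate (leading or trailing): keep it as WTF-8.
            (0xD800..=0xDFFF, _) => {
                out.extend_from_slice(&[
                    0xE0 | (unit >> 12) as u8,
                    0x80 | ((unit >> 6) & 0x3F) as u8,
                    0x80 | (unit & 0x3F) as u8,
                ]);
                i += 1;
            }
            // Any other code unit is a scalar value in the BMP.
            _ => {
                push_char(char::from_u32(unit as u32).unwrap(), out);
                i += 1;
            }
        }
    }
}
```

With this logic, the input `\ud83d\ud83d\ude00` yields a lone leading surrogate followed by U+1F600, because the first `\ud83d` is not followed by a trailing surrogate and so does not consume the pair after it.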
Comment on lines +908 to +909
// XXX: This is actually a trailing surrogate.
return error(read, ErrorCode::LoneLeadingSurrogateInHexEscape);
Contributor Author

@purplesyringa purplesyringa Aug 12, 2024

Should I do anything about this? This typo was present before the PR.

Member

I would accept a followup PR to change the ErrorCode enum and fix the error message.

@purplesyringa purplesyringa changed the title from "Speed up \uXXXX parsing and other improvements" to "Speed up \uXXXX parsing and improve WTF-8 handling" Aug 12, 2024
Member

@dtolnay dtolnay left a comment

Thank you!

@dtolnay dtolnay merged commit 0f942e5 into serde-rs:master Aug 15, 2024
13 checks passed
@purplesyringa purplesyringa deleted the faster-backslash-u branch August 18, 2024 20:55
2 participants