Optimize surrogate decoding. #894
Use `char ^ 0xD800 <= 0x3FF` to check if a char code is a lead surrogate. That avoids doing a later `& 0x3FF` to get rid of the top bits. Similar for tail surrogate. This ensures that the `high` function gets values without high bits. Also optimize that function to reduce dependency depth and try to hit `base + (something < small)` expressions that can be optimized into a single x64 address computation. Gives a ~7% increase on backwards traversal and 38% increase for forward traversal, based on tool/benchmark.dart compiled with `dart compile exe`.
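For readers unfamiliar with the trick, here is a minimal sketch of the XOR-based surrogate check described above. The helper names (`leadPayload`, `tailPayload`, `codePointFromPayloads`) are hypothetical and are not taken from the package; the sketch assumes `char` is a UTF-16 code unit in the range 0..0xFFFF.

```dart
// Illustrative helpers only; these names do not exist in package:characters.

/// Returns the 10-bit payload of a lead surrogate, or -1 if [char] is not one.
/// Assumes [char] is a UTF-16 code unit (0..0xFFFF).
int leadPayload(int char) {
  // XOR with 0xD800 clears the tag bits of a lead surrogate (0xD800..0xDBFF),
  // leaving 0..0x3FF. Every other code unit yields a value above 0x3FF.
  var bits = char ^ 0xD800;
  return bits <= 0x3FF ? bits : -1;
}

/// Returns the 10-bit payload of a tail surrogate, or -1 if [char] is not one.
int tailPayload(int char) {
  // Same idea with the tail surrogate tag (0xDC00..0xDFFF).
  var bits = char ^ 0xDC00;
  return bits <= 0x3FF ? bits : -1;
}

/// Combines two already-extracted payloads into a supplementary code point.
/// No further `& 0x3FF` is needed because the payloads carry no high bits.
int codePointFromPayloads(int lead, int tail) => 0x10000 + (lead << 10) + tail;
```

The point of the sketch is that one XOR both classifies the code unit and yields the masked value, so the later `& 0x3FF` becomes unnecessary.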
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.
PR Health
Breaking changes ✔️
Changelog Entry ✔️
Changes to files need to be accounted for in their respective changelogs.
Coverage
File | Coverage |
---|---|
pkgs/characters/lib/src/characters_impl.dart | 💚 90 % ⬆️ 0 % |
pkgs/characters/lib/src/grapheme_clusters/breaks.dart | 💚 97 % ⬆️ 0 % |
pkgs/characters/lib/src/grapheme_clusters/table.dart | 💚 100 % |
pkgs/characters/tool/bin/generate_tables.dart | 💔 Not covered |
pkgs/characters/tool/src/string_literal_writer.dart | 💔 Not covered |
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with skip-coverage-check.
API leaks ✔️
The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.
Package | Leaked API symbols |
---|
License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files |
---|
no missing headers |
All source files should start with a license header.
LGTM, although I don't have much knowledge of this package and don't understand its intricacies. What I do understand makes sense to me.
var index = chunkStart + (tail & 255);
return _data.codeUnitAt(index);
var offset = (tail >> 8) + (lead << 2);
tail &= 255;
Why do the assignment instead of the original `chunkStart + (tail & 255)`?
var chunkStart = _start.codeUnitAt(offset >> 8);
var index = chunkStart + (tail & 255);
return _data.codeUnitAt(index);
var offset = (tail >> 8) + (lead << 2);
IIUC, this assumes that `tail` and `lead` don't need to be masked with `0x3ff`. Should this be `assert`ed here?
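One way the suggested assertions could look is sketched below. `chunkOffset` is a hypothetical stand-in for the offset computation in the snippet under review, not the package's actual function, and the table lookups that follow it in the real code are left out because their layout is not shown here.

```dart
/// Hypothetical stand-in for the offset computation in the reviewed snippet.
/// Assumes [lead] and [tail] are pre-masked 10-bit surrogate payloads.
int chunkOffset(int lead, int tail) {
  // Document the precondition; these checks only run when asserts are enabled.
  assert(lead >= 0 && lead <= 0x3FF, 'lead has bits outside the low 10');
  assert(tail >= 0 && tail <= 0x3FF, 'tail has bits outside the low 10');
  // With 10-bit inputs, tail >> 8 is 0..3 and lead << 2 is a multiple of 4,
  // so the addition cannot carry and the result stays within 0..0xFFF.
  return (tail >> 8) + (lead << 2);
}
```

Because Dart strips `assert` in production builds, checks like these would not affect the numbers measured with `dart compile exe`.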
Use `char ^ 0xD800 <= 0x3FF` to check if a char code is a lead surrogate. That avoids doing a later `& 0x3FF` to get rid of the top bits. Similar for tail surrogate.
This ensures that the `high` function gets values without high bits, which makes it smaller (it tries to get inlined, so a little smaller counts).
Also optimize that function to reduce dependency depth and try to hit `base + (something < small)` expressions that can be optimized into a single x64 address computation.
Gives a ~7% increase on backwards traversal and 30% increase for forward traversal, based on tool/benchmark.dart compiled with `dart compile exe`.
Actually a small decrease in performance on web for forward iteration, and a small increase for backwards iteration, and Wasm follows Web in performance here.
(Also found a bug in the generator, which hasn't worked since it was last committed.)
Interestingly, the change makes little-to-no difference on the `benchmark/benchmark.dart` benchmark. (Maybe even makes it a little slower.)