proposal: unicode/utf8: improve EncodeRune performance by requiring a fixed 4 bytes width #68129
Comments
There is no function … I'm confident that changing …

Why is your new code faster? Does it just have fewer bounds checks? But the old code should also only have one bounds check. Note that you don't need to use unsafe to convert, you are permitted to convert from a slice to a pointer to an array.
Fixed!
Thank you! Fixed as well. Interestingly, it appears that
Yes, it's possible. I'll try to make more tangible tests and see what would be the result outside of micro-benchmarks.
Yes, both perform a single bounds check. My intention was that, by having the bounds check in an inlined function, the check would be hoisted to the caller, where I hoped the compiler might have already proven that the bounds hold, essentially removing redundant bounds checks.
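For illustration, here is a minimal sketch (not from the issue) of the slice-to-array-pointer conversion mentioned above, and of why the single check it performs can become free once the function is inlined:

```go
package main

import "fmt"

// put4 writes four bytes into p. The slice-to-array-pointer conversion
// (valid since Go 1.17) is the single bounds check: it panics if len(p) < 4.
// Because put4 is small enough to be inlined, that check ends up in the
// caller, where the compiler may already know len(p) >= 4 and drop it.
func put4(p []byte, a, b, c, d byte) {
	q := (*[4]byte)(p)
	q[0], q[1], q[2], q[3] = a, b, c, d
}

func main() {
	buf := make([]byte, 4)
	put4(buf, 0xF0, 0x9F, 0x98, 0x80) // UTF-8 encoding of U+1F600
	fmt.Printf("% x\n", buf)
}
```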
I made another synthetic benchmark, this time writing 1024*1024 runes of different widths (so, 1MiB for ASCII). There is no further data processing, just passing through runes to a slice of bytes.
In the case of 4-byte runes, which is one of the most notable, there is a ~0.3 ms improvement on 4 MiB of test data (yes, for slower machines it will be greater, but I don't think significantly so). Adding any interesting processing would probably cause this improvement to be lost in the noise. To your point, I think it's clear that, while there is a reduction in run time, it becomes negligible compared to any meaningful work that could be performed on such a volume of data, and the risk of breaking anyone's code would not be worth it, not to mention the increase in API surface. I will be closing this issue as a consequence, but feel free to reopen if you think anything else needs to be investigated; I would be happy to help. Thank you for your time and patience!

Benchmarking code (for reference):
P.S.: Verified that the bounds check for …

```go
var runeCases = []struct {
r rune
name string
n int // number of bytes of what it should encode to
}{
{-1, "invalid, negative", 3},
{10, "ASCII", 1},
{209, "2 bytes", 2},
{surrogateMin, "invalid, surrogate", 3},
{65535, "3 bytes", 3},
{maxRune, "4 bytes", 4},
{maxRune + 11, "invalid, too long", 3},
}
func BenchmarkEncodeRuneText(b *testing.B) {
const runesToWrite = 1024 * 1024
for _, c := range runeCases {
buf := make([]byte, runesToWrite*c.n+4)
b.Run(c.name, func(b *testing.B) {
b.Run("implem=stdlib", func(b *testing.B) {
for i := 0; i < b.N; i++ {
dst := buf
for j := 0; j < runesToWrite; j++ {
n := utf8.EncodeRune(dst, c.r)
dst = dst[n:]
}
}
})
b.Run("implem=EncodeRune4", func(b *testing.B) {
for i := 0; i < b.N; i++ {
dst := buf
for j := 0; j < runesToWrite; j++ {
n := EncodeRune4(dst, c.r)
dst = dst[n:]
}
}
})
})
}
}
```
Proposal Details
Introduction
The function EncodeRune from package unicode/utf8 requires from the provided byte slice only the exact amount of space needed to write the provided rune. This is great for space efficiency, but an alternative approach requiring a fixed 4 spare bytes, regardless of whether all of them are used, enables greater runtime optimization. The following benchstat output shows the results of such optimizations:

The benchmarks currently only have an ASCII case (a valid 1-byte rune) and a Japanese case (a valid 3-byte rune). Here, the benchmarks were extended to test every possible rune case that needs to be encoded:
- InvalidRuneMaxPlusOne is an erroneous rune overflowing MaxRune, likely the product of rune arithmetic. This is written as RuneError, which is a 3-byte rune.
- InvalidRuneSurrogate is an erroneous rune in the surrogate range; same treatment as above.
- InvalidRuneNegative is an erroneous rune with a negative value; same treatment as above.
- SpanishRune is a valid 2-byte rune.
- MaxRune is a valid 4-byte rune.

Note that the cases for erroneous runes are added for completeness, and improving any implementation for those cases need not be a priority for the standard library (a short sketch of how these cases encode today follows below).
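This sketch is not part of the proposal; it simply shows that the current utf8.EncodeRune writes the 3-byte RuneError encoding for all of the erroneous cases above, and 1 to 4 bytes for the valid ones:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	buf := make([]byte, utf8.UTFMax) // 4 bytes is always enough

	// Erroneous runes (negative, surrogate, > MaxRune) are all encoded
	// as RuneError (U+FFFD), which takes 3 bytes.
	for _, r := range []rune{-1, 0xD800, utf8.MaxRune + 11} {
		n := utf8.EncodeRune(buf, r)
		fmt.Printf("%#x -> % x (%d bytes)\n", r, buf[:n], n)
	}

	// Valid runes take 1 to 4 bytes.
	for _, r := range []rune{'A', 'Ñ', '日', utf8.MaxRune} {
		n := utf8.EncodeRune(buf, r)
		fmt.Printf("%q -> % x (%d bytes)\n", r, buf[:n], n)
	}
}
```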
Proposal alternatives
Add a new function to implement the proposed behavior
This is obviously not ideal, and may even confuse users. Having two functions that do mostly the same thing but with slightly different requirements may prove counterproductive. It also increases the API surface only to provide a performance improvement, which should be avoided as much as possible, even if it has happened in the past.
Change EncodeRune to require a fixed length

The doc comment in EncodeRune is:
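As it appears in recent Go releases (quoted here for reference):

```go
// EncodeRune writes into p (which must be large enough) the UTF-8 encoding of the rune.
// If the rune is out of range, it writes the encoding of RuneError.
// It returns the number of bytes written.
```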
Here, "large enough" is never specified to mean "the least amount needed to encode the rune", which is what the current implementation requires, so it is arguable that 4 available bytes are also "large enough", since that is enough to encode any UTF-8 rune. Users shouldn't rely on the stricter reading, but some may already do so: users have relied on implementation details as deep as calling runtime.morestack_noctxt, so anything is possible.

But why would someone aware of utf8.EncodeRune not reserve utf8.UTFMax bytes in the first place? The only reasonable explanation the author can think of is micromanaging the tail of a buffer to save, say, between 1 and 3 bytes. As a corollary, it should be noted that if this approach were taken, then all buffers written to exclusively with utf8.EncodeRune should have a length that is a multiple of 4 bytes. Potentially wasting 3 bytes in the worst case doesn't sound that terrible, though, compared to the benefit of the improved performance.

In the standard library, for example, all buffers are already (sanely) allocated with utf8.UTFMax spare bytes before calling utf8.EncodeRune, and so should everyone be doing. If the text to be written is always ASCII, then there is no reason to call utf8.EncodeRune in the first place; just use copy for that.

But varying implementations exist, so if the approach of changing the EncodeRune function is taken, an extra cautious rollout would be to first ship the change with a notice, enabled only by an environment variable and defaulting to the old behavior; in a later release the environment variable could default to the new behavior; and finally the variable would be removed, leaving the new behavior as the only one. Ultimately, if those 1 to 3 bytes at the end of a buffer need to be saved, the following approach can do it:
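One possible shape of such an approach (a sketch, not necessarily the original snippet from the issue; the helper name is made up): encode into a 4-byte scratch array, which always satisfies a fixed-width requirement, and copy only the bytes actually produced into the tail of the real buffer.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRuneTail writes the UTF-8 encoding of r into dst without requiring
// dst to have 4 spare bytes: it encodes into a 4-byte scratch array (which
// always satisfies a fixed-width requirement) and copies only the bytes
// actually produced. It returns 0 if dst is too short for the exact encoding.
func encodeRuneTail(dst []byte, r rune) int {
	var scratch [utf8.UTFMax]byte
	n := utf8.EncodeRune(scratch[:], r)
	if n > len(dst) {
		return 0
	}
	return copy(dst, scratch[:n])
}

func main() {
	buf := make([]byte, 3)        // only 3 bytes left at the tail of some buffer
	n := encodeRuneTail(buf, '日') // a 3-byte rune fits exactly
	fmt.Printf("% x (%d bytes)\n", buf[:n], n)
}
```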
Appendix: example implementation
Changes to src/unicode/utf8/utf8.go
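As a rough illustration of what such a fixed-width variant could look like (a sketch, not the author's actual patch; the package name is made up), built on the slice-to-array-pointer conversion discussed in the comments:

```go
package utf8fixed

import "unicode/utf8"

// EncodeRune4 is a sketch of a fixed-width variant of utf8.EncodeRune.
// It always requires len(p) >= 4 (it panics otherwise), writes the UTF-8
// encoding of r (or of utf8.RuneError if r is not a valid rune), and
// returns the number of bytes that encode the rune.
func EncodeRune4(p []byte, r rune) int {
	buf := (*[4]byte)(p) // the single bounds check: panics if len(p) < 4

	switch i := uint32(r); {
	case i < 0x80: // ASCII, 1 byte
		buf[0] = byte(r)
		return 1
	case i < 0x800: // 2 bytes
		buf[0] = 0xC0 | byte(r>>6)
		buf[1] = 0x80 | byte(r)&0x3F
		return 2
	case i > utf8.MaxRune, 0xD800 <= i && i <= 0xDFFF:
		// Out-of-range and surrogate runes (negative values wrap to a large
		// uint32 and land here too) are written as RuneError, 3 bytes.
		r = utf8.RuneError
		fallthrough
	case i < 0x10000: // 3 bytes
		buf[0] = 0xE0 | byte(r>>12)
		buf[1] = 0x80 | byte(r>>6)&0x3F
		buf[2] = 0x80 | byte(r)&0x3F
		return 3
	default: // 4 bytes
		buf[0] = 0xF0 | byte(r>>18)
		buf[1] = 0x80 | byte(r>>12)&0x3F
		buf[2] = 0x80 | byte(r>>6)&0x3F
		buf[3] = 0x80 | byte(r)&0x3F
		return 4
	}
}
```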
The name EncodeRune4 is definitely not great; something like EncodeRuneFixed could probably be used instead (with Fixed meaning "fixed length"), if a distinction is going to be made.

Benchmarking
```
go test -timeout=30m -run=- -count=20 -bench=BenchmarkEncode > results-encode.txt
benchstat -col /implem results-encode.txt
```
Changes made to src/unicode/utf8/utf8_test.go for the benchmarks
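A minimal sketch of what such an extended benchmark table could look like (the case names follow those listed in the introduction; the benchmark name, code points, and structure are illustrative, not the author's actual patch):

```go
package utf8_test

import (
	"testing"
	"unicode/utf8"
)

// A sketch of extending the per-call EncodeRune benchmark with the rune
// cases named in the introduction.
var encodeRuneBenchCases = []struct {
	name string
	r    rune
}{
	{"ASCIIRune", 'a'},
	{"SpanishRune", 'Ñ'},
	{"JapaneseRune", '本'},
	{"MaxRune", utf8.MaxRune},
	{"InvalidRuneMaxPlusOne", utf8.MaxRune + 1},
	{"InvalidRuneSurrogate", 0xD800},
	{"InvalidRuneNegative", -1},
}

func BenchmarkEncodeRuneCases(b *testing.B) {
	buf := make([]byte, utf8.UTFMax)
	for _, c := range encodeRuneBenchCases {
		b.Run(c.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				utf8.EncodeRune(buf, c.r)
			}
		})
	}
}
```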