Beginning of "tinystr" optimization #8
This commit adds a compact string representation, but doesn't wire it up. Part of the plan for projectfluent#7
Changes the Locale and parser to use TinyStr for language.
I'm not asking for this to be merged just yet. There are some unused-code warnings that will go away once the rest is wired up. A few questions:
I see that the full generality of BCP 47 is not supported. For example, language can only be 2-3 alpha characters, so "i-enochian" would be rejected, as would registered language subtags. Supporting these uncommon values is the reason it's TinyStr8, but I can change this.
This implementation uses "fake SIMD". I was able to get that to work well, but @SimonSapin has had success getting LLVM to auto-vectorize in rust-lang/rust#59283. That code would probably be clearer, and a hair more efficient, but I wasn't able to get auto-vectorization to work for bytewise operations in a u64.
If this approach generally looks good, I'll keep going. |
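For readers unfamiliar with the term, "fake SIMD" here means treating the eight bytes of a u64 as independent lanes and operating on all of them with ordinary integer arithmetic. The following is only a minimal sketch of that idea, not the code from this PR: the function names and the little-endian, zero-padded packing are assumptions made for illustration, and every byte is assumed to be ASCII (below 0x80) so that per-byte additions never carry into the neighbouring lane.

```rust
/// Hypothetical helper: pack 1..=8 ASCII bytes into a u64,
/// little-endian and zero-padded. Returns None otherwise.
fn pack_ascii(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.is_empty() || bytes.len() > 8 || !s.is_ascii() {
        return None;
    }
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}

/// Lowercase all eight lanes at once.
/// Precondition: every byte of `word` is below 0x80.
fn to_ascii_lowercase(word: u64) -> u64 {
    // High bit of each lane is set iff the byte is >= b'A' (0x41).
    let ge_upper_a = word + 0x3f3f_3f3f_3f3f_3f3f;
    // High bit of each lane is set iff the byte is > b'Z' (0x5a).
    let gt_upper_z = word + 0x2525_2525_2525_2525;
    // 0x80 in every lane that holds an uppercase letter.
    let is_upper = ge_upper_a & !gt_upper_z & 0x8080_8080_8080_8080;
    // 0x80 >> 2 == 0x20, the ASCII case bit.
    word | (is_upper >> 2)
}

/// True iff every non-padding byte is an ASCII letter.
/// Precondition: every byte of `word` is below 0x80.
fn is_ascii_alphabetic(word: u64) -> bool {
    // High bit set in every lane that is not zero padding.
    let occupied = (word + 0x7f7f_7f7f_7f7f_7f7f) & 0x8080_8080_8080_8080;
    // Fold uppercase onto lowercase so one range check suffices.
    let lower = word | 0x2020_2020_2020_2020;
    // High bit set where the folded byte is < b'a' or > b'z'.
    let not_alpha = !(lower + 0x1f1f_1f1f_1f1f_1f1f) | (lower + 0x0505_0505_0505_0505);
    (not_alpha & occupied) == 0
}
```

With this packing, lowercasing "EN" yields the same word as packing "en" directly, and a value such as "en1" fails the alphabetic check, all without a per-byte loop.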
Yes, we're aiming for Unicode Locale Identifier, not Language Tag. Please consult http://unicode.org/reports/tr35/#BCP_47_Conformance for the differences. |
Ah, that's super helpful, thanks. I was using BCP 47 as my normative spec. I'll adjust. |
As of this commit, this PR moves |
@raphlinus - can we move this to |
Sure, I'm not attached. Feel free to take my draft whichever direction you like, and if you need my attention on something, just let me know exactly what you'd like. |
closing in favor of zbraniecki/unic-locale#7 |
This patch implements a "tiny string" datatype, very efficient but limited to bounded lengths, and starts with the integration into the Locale datatype. As of this PR, it's just the language subtag, but I wanted to upload it to get feedback before proceeding further.
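To make the "bounded length" trade-off concrete, here is a minimal sketch of how such a type could be laid out. It is not the implementation in this patch: the name `TinyStr8Sketch`, the use of `NonZeroU64`, and the little-endian zero-padded packing are all assumptions chosen for illustration.

```rust
use std::num::NonZeroU64;

/// Hypothetical tiny string: 1..=8 ASCII bytes packed into one word.
/// Copying, equality and hashing are all single-integer operations.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TinyStr8Sketch(NonZeroU64);

impl TinyStr8Sketch {
    /// Returns None for empty, over-long, non-ASCII, or NUL-containing input.
    fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() > 8 || !s.is_ascii() || bytes.contains(&0) {
            return None;
        }
        let mut buf = [0u8; 8];
        buf[..bytes.len()].copy_from_slice(bytes);
        // The empty string packs to 0, which NonZeroU64 rejects.
        NonZeroU64::new(u64::from_le_bytes(buf)).map(TinyStr8Sketch)
    }

    /// Length in bytes, recovered from the zero padding in the high lanes.
    fn len(self) -> usize {
        8 - (self.0.get().leading_zeros() as usize) / 8
    }

    /// Unpack into an owned String (allocates; for display only).
    fn to_owned_string(self) -> String {
        let bytes = self.0.get().to_le_bytes();
        bytes[..self.len()].iter().map(|&b| b as char).collect()
    }
}
```

In this sketch a language subtag such as "en" occupies a single machine word with no heap allocation, while any input longer than eight bytes is rejected at construction time rather than truncated.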
Progress towards #7