-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Implement Utf8View for lpad scalar function #11941
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very impressive @Omega359 👏 I especially like the testing improvements of this PR
I mentioned some places this function could be made faster, but that can totally be done as a follow on (maybe waiting for someone for whom the performance of lpad
is important). I can file a follow on ticket
s.push_str(string); | ||
Ok(Some(s)) | ||
} | ||
let mut s: String = " ".repeat(length - graphemes.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR just follows the existing pattern so I don't think any changes are needed, but I think using a GenericStringBuilder
would be much faster here (as it wouldn't allocate a string for each output array element)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a test locally with using GenericStringBuilder
vs what is in this PR and the differences are within +/- 10% for 1024 and 2048 element arrays for each of the 3 string types. Worth noting though is that timing between runs on my laptop is not really stable at all (+/- 10% or so between runs of the exact same code).
Interestingly the utf8view seems to be consistently slightly slower than just plain utf8 at least in this function.
lpad function/utf8 type/2048
time: [715.78 µs 718.43 µs 721.89 µs]
lpad function/largeutf8 type/2048
time: [745.44 µs 759.62 µs 779.84 µs]
lpad function/stringview type/2048
time: [734.25 µs 751.13 µs 773.67 µs]
let length = if length < 0 { 0 } else { length as usize }; | ||
if length == 0 { | ||
Ok(Some("".to_string())) | ||
if length < graphemes.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here I think GenericStringBuilder would likely be faster
I was looking at @Lordworms implementation in #11942 and I think it would make sense to align the two implementations. In some aspects I like his approach much more than the one I took in this PR. |
Do you think we should merge this PR and work on improvements in a follow on PR? |
I have no objections to that. |
Thanks @Omega359 -- let's merge this to get the initial implementation and tests in and then we can work on improvements as a follow on PR. Let me know if you would like me to file a ticket for
|
I've got the work done locally in a branch along with the GenericStringBuilder work and the benchmark to go with it. I'll create a ticket tomorrow for it thanks @alamb |
Which issue does this PR close?
Closes #11857
Rationale for this change
Support utf8view for lpad udf.
What changes are included in this PR?
Code, tests.
Are these changes tested?
Yes.
Are there any user-facing changes?
No