[Epic] Native StringView support for string functions #11790
Comments
One thing I have noticed during implementation is that, for some functions, it is probably a good idea to always generate StringView as output (rather than StringArray), as this could avoid a copy. For example, see #11920 (comment) from @Kev1n8. I am thinking that once the string functions support StringView as input, we can do a second pass and optimize some functions so they produce StringView as output.
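The output-side optimization described in this comment can be sketched as follows. This is a hypothetical, std-only model rather than arrow-rs or DataFusion code: `ViewArray`, `View`, `view_for`, and `ltrim_to_view` are illustrative names, and the example `ltrim` only strips leading ASCII spaces. The assumption is that a function whose result is a substring of its input can describe each result as a new (offset, length) into the input's data buffers, which the output shares via `Rc`. Only results short enough to be inlined (12 bytes or fewer, as in the Arrow view layout) are copied. A `Utf8` output would instead need every result's bytes copied into a fresh values buffer.

```rust
use std::rc::Rc;

/// Results of at most this many bytes are stored inside the view itself.
const INLINE_LIMIT: usize = 12;

/// Simplified, hypothetical view: either inline bytes or a reference into a shared buffer.
#[derive(Clone, Debug)]
enum View {
    Inline { len: u8, data: [u8; INLINE_LIMIT] },
    // `prefix` mirrors Arrow's 4-byte prefix; this sketch does not use it for comparisons.
    Ref { prefix: [u8; 4], buffer: usize, offset: usize, len: usize },
}

/// Hypothetical array: views plus reference-counted data buffers.
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<Rc<Vec<u8>>>,
}

/// Build a view for `buf[offset..offset + len]`, copying bytes only if it fits inline.
fn view_for(buf: &[u8], buffer: usize, offset: usize, len: usize) -> View {
    let bytes = &buf[offset..offset + len];
    if len <= INLINE_LIMIT {
        let mut data = [0u8; INLINE_LIMIT];
        data[..len].copy_from_slice(bytes);
        View::Inline { len: len as u8, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { prefix, buffer, offset, len }
    }
}

impl ViewArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut spans = Vec::new();
        for s in values {
            spans.push((data.len(), s.len()));
            data.extend_from_slice(s.as_bytes());
        }
        let views: Vec<View> = spans.iter().map(|&(off, len)| view_for(&data, 0, off, len)).collect();
        ViewArray { views, buffers: vec![Rc::new(data)] }
    }

    /// Borrow the bytes of row `i` without copying.
    fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline { len, data } => &data[..*len as usize],
            View::Ref { buffer, offset, len, .. } => &self.buffers[*buffer][*offset..*offset + *len],
        }
    }
}

/// Hypothetical `ltrim` (leading ASCII spaces only) that produces views as output.
fn ltrim_to_view(input: &ViewArray) -> ViewArray {
    let count_spaces = |s: &[u8]| s.iter().take_while(|&&b| b == b' ').count();
    let views: Vec<View> = input
        .views
        .iter()
        .map(|v| match v {
            View::Inline { len, data } => {
                let s = &data[..*len as usize];
                let skip = count_spaces(s);
                view_for(s, 0, skip, s.len() - skip)
            }
            View::Ref { buffer, offset, len, .. } => {
                let buf: &[u8] = &input.buffers[*buffer];
                let skip = count_spaces(&buf[*offset..*offset + *len]);
                // Long result: a new offset/length into the same buffer, no string bytes copied.
                view_for(buf, *buffer, *offset + skip, *len - skip)
            }
        })
        .collect();
    // Rc::clone only bumps a reference count; the data buffers themselves are shared.
    ViewArray { views, buffers: input.buffers.iter().map(Rc::clone).collect() }
}
```

In this sketch, trimming `"  hi"` copies two bytes into an inline view, while the two long results keep pointing into the input's buffer (which `Rc::ptr_eq` can confirm), so their bytes are never copied.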
Inspired by @Omega359's great PR #11941, I have a suggestion on testing. Although most of the implementations are adapted from existing ones, execution takes a different path, so I think comprehensive end-to-end tests are still needed. Here are examples of how to adapt existing test cases for
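The concrete test examples in this comment are cut off, but the underlying idea (run the same function over the same inputs in both layouts and require identical results) can be shown as a small differential check. This is a hypothetical, std-only Rust sketch, not DataFusion's actual test setup: `Utf8Array`, `ViewArray`, `octet_length_utf8`, `octet_length_view`, and `same_result` are illustrative stand-ins for an existing `Utf8` implementation and its `Utf8View` port. The inputs are chosen around the 12-byte inline limit of the view layout, since that is where a view kernel's two code paths diverge.

```rust
/// Strings up to this many bytes are stored inline in a view.
const INLINE_LIMIT: usize = 12;

/// Hypothetical Utf8-style layout: one contiguous values buffer plus offsets.
struct Utf8Array {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl Utf8Array {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for s in strs {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len());
        }
        Utf8Array { offsets, values }
    }
}

/// Hypothetical view layout: short strings inline, longer ones point into a buffer.
enum View {
    Inline(Vec<u8>),
    Ref { offset: usize, len: usize },
}

struct ViewArray {
    views: Vec<View>,
    buffer: Vec<u8>,
}

impl ViewArray {
    fn from_strs(strs: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let views: Vec<View> = strs
            .iter()
            .map(|s| {
                let b = s.as_bytes();
                if b.len() <= INLINE_LIMIT {
                    View::Inline(b.to_vec())
                } else {
                    let offset = buffer.len();
                    buffer.extend_from_slice(b);
                    View::Ref { offset, len: b.len() }
                }
            })
            .collect();
        ViewArray { views, buffer }
    }
}

/// Byte length per row computed from Utf8 offsets.
fn octet_length_utf8(a: &Utf8Array) -> Vec<usize> {
    a.offsets.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Byte length per row computed from views: two branches, inline and referenced.
fn octet_length_view(a: &ViewArray) -> Vec<usize> {
    a.views
        .iter()
        .map(|v| match v {
            View::Inline(bytes) => bytes.len(),
            View::Ref { len, .. } => *len,
        })
        .collect()
}

/// Differential check: both layouts must give identical results for the same input.
fn same_result(inputs: &[&str]) -> bool {
    octet_length_utf8(&Utf8Array::from_strs(inputs)) == octet_length_view(&ViewArray::from_strs(inputs))
}
```

A useful input set includes the empty string, a string of exactly 12 bytes, one of 13 bytes, and a multi-byte UTF-8 string such as `"héllo"` (6 bytes).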
We are making pretty good progress here -- just a few more functions left 🚀
I think we can claim this is completed. Follow-on work is tracked in
🎉
Is your feature request related to a problem or challenge?
We are working to add complete `StringView` support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.

Today, most DataFusion string functions support `DataType::Utf8` and `DataType::LargeUtf8`, and when called with a `StringView` argument, DataFusion will cast the argument back to `DataType::Utf8`, which is expensive.

To realize the full speed of `StringView`, we need to ensure that all string functions support `DataType::Utf8View` directly.

Describe the solution you'd like
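As a rough picture of what supporting `DataType::Utf8View` directly means, here is a minimal, std-only Rust sketch. It does not use arrow-rs or DataFusion: `SimpleViewArray`, `View`, `starts_with_view`, and `cast_to_utf8` are hypothetical names for a simplified model. It assumes the view layout from the Arrow specification (each 16-byte view holds a 4-byte length followed either by up to 12 inline bytes, or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset), but stores those fields loosely instead of packing them into 16 bytes. The contrast it shows: a direct kernel can often reject a row using only the inline prefix, while a cast to `Utf8` must first copy every string's bytes into a new contiguous buffer.

```rust
/// Strings up to this many bytes are stored entirely inside the view.
const INLINE_LIMIT: usize = 12;

/// Simplified, hypothetical string view (not bit-compatible with Arrow's 16-byte view).
#[derive(Clone, Copy, Debug)]
struct View {
    len: u32,
    /// Whole string if `len <= 12`; otherwise only the first 4 bytes (the prefix) are set.
    inline: [u8; INLINE_LIMIT],
    buffer_index: u32,
    offset: u32,
}

/// Hypothetical array of views plus the data buffers that long strings point into.
struct SimpleViewArray {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

impl SimpleViewArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for s in values {
            let bytes = s.as_bytes();
            let mut inline = [0u8; INLINE_LIMIT];
            let mut offset = 0;
            if bytes.len() <= INLINE_LIMIT {
                inline[..bytes.len()].copy_from_slice(bytes);
            } else {
                // Long string: keep a 4-byte prefix inline, store the bytes in buffer 0.
                inline[..4].copy_from_slice(&bytes[..4]);
                offset = data.len() as u32;
                data.extend_from_slice(bytes);
            }
            views.push(View { len: bytes.len() as u32, inline, buffer_index: 0, offset });
        }
        SimpleViewArray { views, buffers: vec![data] }
    }

    /// Borrow the bytes of row `i` without copying.
    fn value(&self, i: usize) -> &[u8] {
        let v = &self.views[i];
        let len = v.len as usize;
        if len <= INLINE_LIMIT {
            &v.inline[..len]
        } else {
            let start = v.offset as usize;
            &self.buffers[v.buffer_index as usize][start..start + len]
        }
    }
}

/// `starts_with` evaluated directly on views: compare against the inline prefix first,
/// and only read the data buffer when that prefix matches.
fn starts_with_view(array: &SimpleViewArray, prefix: &str) -> Vec<bool> {
    let p = prefix.as_bytes();
    (0..array.views.len())
        .map(|i| {
            let v = &array.views[i];
            if (v.len as usize) < p.len() {
                return false;
            }
            let k = p.len().min(4);
            if v.inline[..k] != p[..k] {
                return false; // rejected without touching the data buffer
            }
            array.value(i).starts_with(p)
        })
        .collect()
}

/// Roughly what a cast to `Utf8` involves: copy every value into one new buffer plus offsets.
fn cast_to_utf8(array: &SimpleViewArray) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for i in 0..array.views.len() {
        values.extend_from_slice(array.value(i));
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```

For the input `["apple", "application/json", "banana", "", "app"]`, `cast_to_utf8` copies all 30 bytes before any comparison can happen, whereas `starts_with_view(&arr, "app")` reads the data buffer only for `"application/json"`, the one long row whose inline prefix matches.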
Port all string functions:
- `StringViewArray` #11556
- `starts_with` for `Utf8View` #11786
- `ASCII` scalar function to support `Utf8View` #11834
- `BTRIM` scalar function to support `Utf8View` #11835
- `CONCAT` scalar function to support `Utf8View` #11836
- `concat_ws` scalar function to support `Utf8View` #11837
- `CONTAINS` scalar function to support `Utf8View` #11838
- `ENDS_WITH` scalar function to support `Utf8View` #11852
- `INITCAP` scalar function to support `Utf8View` #11853
- `levenshtein` scalar function to support `Utf8View` #11854
- `LOWER` scalar function to support `Utf8View` #11855
- `LTRIM` scalar function to support `Utf8View` #11856
- `LPAD` scalar function to support `Utf8View` #11857
- `OCTET_LENGTH` scalar function to support `Utf8View` #11858
- `SPLIT_PART` scalar function to support `Utf8View` #11950
- `STRPOS` scalar function to support `Utf8View` #11951
- `SUBSTR` scalar function to support `Utf8View` #11952
- `TRANSLATE` scalar function to support `Utf8View` #11953
- `FIND_IN_SET` scalar function to support `Utf8View` #11954
- `REPEAT` #11962
- `bit_length` #13195

Describe alternatives you've considered
No response
Additional context
See coordination plan with @tshauck and myself here: #11787 (comment)