[Epic] Complete Initial StringView in DataFusion #11752
Comments
An update here: @XiangpengHao has a PR with various changes in #11862. We still need to review that PR and figure out what else in it is needed to enable StringView "for real" (with tests, etc.).
My ideal resolution is that we end up in a state where the only change needed to enable StringView by default is switching the config setting. I will do some more ticket triage later today to outline other items I know of.
Do we have tickets for the regexp binary operators? I noticed StringView is not supported by them yet, and they have a separate implementation from the regexp functions.
Not that I know of; it would be great to add them.
Filed #12180.
I am going to try to polish up the PR that enables StringView by default (#12092), with the arrow upgrade and various recent improvements, and see how close we are.
StringView by default is finally merged into DataFusion (#13101), so I am claiming success and completion of this issue.
Is your feature request related to a problem or challenge?
This ticket is a follow-on to #10918, where we implemented enough initial support for `StringView`/`BinaryView` to show some pretty sweet ClickBench results.
Describe the solution you'd like
This epic tracks the remaining work to complete the "initial" support, which I would like to define as "enable using StringView by default when reading strings from Parquet".
I am sure there will be additional work to add StringView support to various other DataFusion features, which we can track in another follow-on ticket.
Required for enabling StringView by default:
- … (`schema_force_string_view`) by default #11682
- … for `StringViewArray` #11766
- unreachable code: Utf8/Binary should use `ArrowBytesSet` #11767
- … `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12117
- … `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12118
- … `Utf8View`/`BinaryView` --> `Utf8`/`Binary` at output #12119
- … `Utf8` as `Utf8View` #12123
- … `~`, `!~`, etc. #12180
- … `LIKE` #12500
- `LIKE` slows down some ClickBench queries #12509

Could work around, but really should be fixed upstream:
- … `BinaryView` --> `Utf8` and `LargeUtf8` arrow-rs#6162
- … `StringView` and `BinaryView` statistics in `StatisticsConverter` arrow-rs#6164
- … `StringViewArray::slice()` and `BinaryViewArray::slice()` faster / non-allocating arrow-rs#6408

Additional "Nice to have" Features
- `StringView` support for string functions #11790
- … `CoalesceBatchesExec` for StringViews #11628