Cast Utf8 -> Utf8View (not the other way around) for binary operators #11881

Closed
Tracked by #11752
alamb opened this issue Aug 7, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Aug 7, 2024

Is your feature request related to a problem or challenge?

In #11796 @dharanad added a rule for binary operators such that if Utf8View is on either side, we coerce to Utf8.

I think it would be better to coerce to Utf8View, as that coercion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around).

@XiangpengHao notes: #11796 (comment)

Agree, similar to this policy:

fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;
    match (lhs_type, rhs_type) {
        // If Utf8View is in any side, we coerce to Utf8View.
        (Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
            Some(Utf8View)
        }
        // Then, if LargeUtf8 is in any side, we coerce to LargeUtf8.
        (LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        (Utf8, Utf8) => Some(Utf8),
        _ => None,
    }
}

Describe the solution you'd like

Coerce to Utf8View rather than Utf8 in the binary operator rule added in #11796.
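
As a rough sketch of the requested behavior (not the actual DataFusion code), the precedence can be modeled with a local stand-in enum so that it compiles without the arrow crate. The names StringType and coerce_strings are hypothetical and exist only for this illustration; the real rule operates on arrow's DataType and returns an Option.

```rust
// Stand-in for the three Arrow string types. This is a local enum used only
// for illustration; it is NOT arrow::datatypes::DataType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StringType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Hypothetical mirror of the requested precedence:
// Utf8View wins over everything, then LargeUtf8, then Utf8.
fn coerce_strings(lhs: StringType, rhs: StringType) -> StringType {
    use StringType::*;
    match (lhs, rhs) {
        // If Utf8View is on either side, the result is Utf8View.
        (Utf8View, _) | (_, Utf8View) => Utf8View,
        // Otherwise, if LargeUtf8 is on either side, the result is LargeUtf8.
        (LargeUtf8, _) | (_, LargeUtf8) => LargeUtf8,
        // Only plain Utf8 on both sides stays Utf8.
        (Utf8, Utf8) => Utf8,
    }
}
```

Under this sketch, coerce_strings(StringType::Utf8, StringType::Utf8View) and coerce_strings(StringType::Utf8View, StringType::Utf8) both yield StringType::Utf8View, so the result does not depend on which side of the operator the view column appears.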

Describe alternatives you've considered

No response

Additional context

No response

@dharanad
Contributor

dharanad commented Aug 8, 2024

take

@dharanad
Contributor

I think we can close this issue

@alamb
Contributor Author

alamb commented Aug 25, 2024

Thank you @dharanad - I just double-checked and indeed this is fixed now:

> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.014 seconds.

> explain select column1 || 'bar' from foo;
+---------------+--------------------------------------------------------------------------+
| plan_type     | plan                                                                     |
+---------------+--------------------------------------------------------------------------+
| logical_plan  | Projection: foo.column1 || Utf8View("bar") AS foo.column1 || Utf8("bar") |
|               |   TableScan: foo projection=[column1]                                    |
| physical_plan | ProjectionExec: expr=[column1@0 || bar as foo.column1 || Utf8("bar")]    |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                          |
|               |                                                                          |
+---------------+--------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.006 seconds.

> explain select 'bar' || column1 from foo;
+---------------+--------------------------------------------------------------------------+
| plan_type     | plan                                                                     |
+---------------+--------------------------------------------------------------------------+
| logical_plan  | Projection: Utf8View("bar") || foo.column1 AS Utf8("bar") || foo.column1 |
|               |   TableScan: foo projection=[column1]                                    |
| physical_plan | ProjectionExec: expr=[bar || column1@0 as Utf8("bar") || foo.column1]    |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                          |
|               |                                                                          |
+---------------+--------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.001 seconds.

>

@alamb alamb closed this as completed Aug 25, 2024
2 participants