In InfluxDB we use Dictionary(Int32, Utf8) columns a lot.
Queries like this (with string constants) work great and are very fast:
SELECT ... WHERE column = '1'
Queries like this, however, are very slow (note that 1 is an integer, not the string '1'):
SELECT ... WHERE column = 1
@erratic-pattern and I tracked this down to an issue/limitation in type coercion:
DataFusion CLI v37.1.0
> create table test as values (arrow_cast('1', 'Dictionary(Int32, Utf8)'));
0 row(s) fetched.
Elapsed 0.010 seconds.

> select arrow_typeof(column1) from test;
+----------------------------+
| arrow_typeof(test.column1) |
+----------------------------+
| Dictionary(Int32, Utf8)    |
+----------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

> explain SELECT * from test where column1 = 1;
+---------------+---------------------------------------------------+
| plan_type     | plan                                              |
+---------------+---------------------------------------------------+
| logical_plan  | Filter: CAST(test.column1 AS Utf8) = Utf8("1")    |
|               |   TableScan: test projection=[column1]            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192       |
|               |   FilterExec: CAST(column1@0 AS Utf8) = 1         |
|               |     MemoryExec: partitions=1, partition_sizes=[1] |
|               |                                                   |
+---------------+---------------------------------------------------+
2 row(s) fetched.
Elapsed 0.003 seconds.
I think this shows the core problem:
| logical_plan | Filter: CAST(test.column1 AS Utf8) = Utf8("1") |
It basically shows the column is being converted to a string, rather than the constant being converted to the correct type.
Not only does this mean the column is being decoded for the comparison, it also means that PruningPredicate doesn't work either.
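To make the cost of decoding concrete, here is a toy Python model of a dictionary-encoded string column. It is only a sketch: the names (`values`, `keys`, and both filter functions) are hypothetical and this is not how DataFusion or Arrow implement the comparison. It contrasts decoding every row before comparing (the shape of `CAST(column AS Utf8) = literal`) with comparing each distinct value once and reusing the result per row.

```python
# Toy model of a dictionary-encoded string column: a small list of distinct
# values plus one integer key per row. All names here are hypothetical.
values = ["1", "2", "3"]      # distinct strings, stored once
keys = [0, 1, 0, 2, 0, 1]     # one key per row, indexing into `values`


def filter_with_column_cast(keys, values, literal):
    """Mimics CAST(column AS Utf8) = literal: decode every row, then compare."""
    decoded = [values[k] for k in keys]   # materializes one string per row
    return [s == literal for s in decoded]


def filter_on_dictionary(keys, values, literal):
    """Mimics column = literal on the encoded data: compare each distinct
    value once, then look up the answer per row by key."""
    matches = [v == literal for v in values]   # one comparison per distinct value
    return [matches[k] for k in keys]


mask_cast = filter_with_column_cast(keys, values, "1")
mask_dict = filter_on_dictionary(keys, values, "1")
print(mask_cast == mask_dict)  # True: both strategies select the same rows
```

Both functions return the same mask, but the second one does string work proportional to the number of distinct values rather than the number of rows.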
I would like the query to go fast lol
Specifically, I think the filter should look like this (no cast on the column, and instead the constant type matches):
| logical_plan | Filter: test.column1 = Dictionary(Int32, Utf8("1")) |
Note this is what happens if you compare the dictionary column to a string literal:
> explain SELECT * from test where column1 = '1';
+---------------+-----------------------------------------------------+
| plan_type     | plan                                                |
+---------------+-----------------------------------------------------+
| logical_plan  | Filter: test.column1 = Dictionary(Int32, Utf8("1")) |
|               |   TableScan: test projection=[column1]              |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192         |
|               |   FilterExec: column1@0 = 1                         |
|               |     MemoryExec: partitions=1, partition_sizes=[1]   |
|               |                                                     |
+---------------+-----------------------------------------------------+
2 row(s) fetched.
Elapsed 0.002 seconds.
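The cast-free filter shape also matters for the pruning concern raised above. The following is a rough Python sketch of min/max pruning, assuming a deliberately simple pruner that only understands predicates of the form `column = literal`. The statistics, the predicate encoding, and the function names are all made up for illustration and are not DataFusion's PruningPredicate API.

```python
# Hypothetical per-container (e.g. per-file) min/max statistics for column1.
stats = [
    {"min": "0", "max": "5"},
    {"min": "a", "max": "f"},
    {"min": "1", "max": "1"},
]


def may_contain(predicate, stat):
    """Return False only when the statistics prove no row can match."""
    kind, literal = predicate
    if kind == "column_eq":  # column1 = literal
        return stat["min"] <= literal <= stat["max"]
    # Any other shape (e.g. CAST(column1 AS Utf8) = literal) is not understood
    # by this toy pruner, so it must conservatively keep the container.
    return True


def containers_to_scan(predicate):
    """Indexes of the containers that cannot be skipped for this predicate."""
    return [i for i, s in enumerate(stats) if may_contain(predicate, s)]


print(containers_to_scan(("column_eq", "1")))       # [0, 2]
print(containers_to_scan(("cast_column_eq", "1")))  # [0, 1, 2]
```

With the bare column reference, the second container is skipped. With the cast wrapped around the column, this toy pruner has to scan everything.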
We could potentially update the coercion logic to coerce 1 to Dictionary(.. "1"), or maybe update the unwrap_comparison logic.
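As a rough illustration of the first alternative, the sketch below coerces the literal to the column's dictionary type instead of casting the column. It is a toy in Python, not DataFusion code: `Literal` and `coerce_literal_for_dictionary` are hypothetical names, types are modelled as plain strings, and only Utf8-valued dictionaries are handled. The integer is turned into the same string the existing plan already produces (`Utf8("1")`), so the comparison semantics stay the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    data_type: str   # e.g. "Int64", "Utf8", or "Dictionary(Int32, Utf8)"
    value: object


def coerce_literal_for_dictionary(column_type, literal):
    """Return a literal typed to match a Dictionary(<key>, Utf8) column,
    or None when this toy rule does not apply."""
    is_utf8_dictionary = (
        column_type.startswith("Dictionary(") and column_type.endswith(", Utf8)")
    )
    if not is_utf8_dictionary:
        return None
    if literal.data_type == "Int64":
        # Same string the current plan produces when it casts 1 to Utf8("1").
        return Literal(column_type, str(literal.value))
    if literal.data_type == "Utf8":
        return Literal(column_type, literal.value)
    return None


coerced = coerce_literal_for_dictionary("Dictionary(Int32, Utf8)", Literal("Int64", 1))
print(coerced)  # Literal(data_type='Dictionary(Int32, Utf8)', value='1')
```

The design point is that the rewrite happens once, at planning time, on the constant, so the column side of the comparison stays a bare column reference.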
I have a PR that fixes this: #10221. Here is the explain output after making the change:
> explain SELECT * from test where column1 = 1;
+---------------+-----------------------------------------------------+
| plan_type     | plan                                                |
+---------------+-----------------------------------------------------+
| logical_plan  | Filter: test.column1 = Dictionary(Int32, Utf8("1")) |
|               |   TableScan: test projection=[column1]              |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192         |
|               |   FilterExec: column1@0 = 1                         |
|               |     MemoryExec: partitions=1, partition_sizes=[1]   |
|               |                                                     |
+---------------+-----------------------------------------------------+
2 row(s) fetched.
Elapsed 0.008 seconds.
However, it looks like some tests are failing, so I am still looking into it.
#10323 is ready for review and avoids the previously discussed issues with #10221.
Thanks @erratic-pattern -- I hope to look at this tomorrow morning