Skip to content

Filter crashes when the input schema contains a map #8262

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Tracked by #11429
helgikrs opened this issue Nov 18, 2023 · 4 comments · Fixed by #8501
Closed
Tracked by #11429

Filter crashes when the input schema contains a map #8262

helgikrs opened this issue Nov 18, 2023 · 4 comments · Fixed by #8501
Labels
bug Something isn't working

Comments

@helgikrs
Copy link

helgikrs commented Nov 18, 2023

Describe the bug

DataFusion v33 panics & returns an error in filter when the input schema contains a map because ScalarValue doesn't support maps.

Just making a simple memory table with a map field and querying it with a simple where clause produces this error when creating a physical plan:

Context("EnforceDistribution", NotImplemented("Can't create a scalar from data_type \"Map(Field { name: \"entries\", data_type: Struct([Field { name: \"key\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"value\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)\""))

Creating an explain plan panics in the Display implementation

thread 'main' panicked at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/string.rs:2460:14:
a Display implementation returned an error unexpectedly: Error
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
   3: core::result::Result<T,E>::expect
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1034:23
   4: <T as alloc::string::ToString>::to_string
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/string.rs:2459:9
   5: datafusion_physical_plan::display::DisplayableExecutionPlan::to_stringified
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-physical-plan-33.0.0/src/display.rs:206:41
   6: datafusion::physical_planner::DefaultPhysicalPlanner::handle_explain::{{closure}}
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-33.0.0/src/physical_planner.rs:1897:29
   7: <datafusion::physical_planner::DefaultPhysicalPlanner as datafusion::physical_planner::PhysicalPlanner>::create_physical_plan::{{closure}}
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-33.0.0/src/physical_planner.rs:445:64

To Reproduce

Here's a simple repro

use std::sync::Arc;

use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    datasource::MemTable,
    prelude::SessionContext,
};

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();

    let key = Field::new("key", DataType::Int64, false);
    let value = Field::new("value", DataType::Int64, true);
    let map_field = Field::new("entries", DataType::Struct(vec![key, value].into()), false);
    let fields = vec![
        Field::new("a", DataType::Int64, true),
        Field::new("map", DataType::Map(map_field.into(), false), true),
    ];
    let schema = Schema::new(fields);

    let memory_table = MemTable::try_new(schema.into(), vec![vec![]]).unwrap();

    ctx.register_table("tbl", Arc::new(memory_table)).unwrap();

    // This panics in the display implementation, removing the explain causes
    // this to return the not implemented error described above instead.
    ctx.sql("explain select * from tbl where a > 0;")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
}

I have a more complicated query within a more complex system that panics
in the equivalence_properties function. I don't have a small repro. I'll work on making a small repro for it and then I'll update the issue.

Expected behavior

This seems to be a regression as this worked in earlier releases (specifically v32).

Additional context

No response

@razeghi71
Copy link
Contributor

If this is still relevant, I would like to take a look at this?

@alamb
Copy link
Contributor

alamb commented Dec 8, 2023

If this is still relevant, I would like to take a look at this?

THanks @razeghi71 -- the first step is probably to see if the reproducer on this ticket still panics

@razeghi71
Copy link
Contributor

razeghi71 commented Dec 11, 2023

This problem has been introduced in c9330bc which changed the default empty value for lower and upper bound from ScalarValue::Null to ScalarValue::try_from(field.data_type()). which is not implemented for map, to fix it we have to either implement it for Map or fallback on Null value. I have done these two in two draft PRs I have above.

In the first one I tried to implement the empty value for map as described in the description of Map which returns a List of key values as a struct called entries. And in the second one I fall back to Null. Would be great to get some feedback to see what you folks think? And what is the good approach here?

@alamb
Copy link
Contributor

alamb commented Dec 12, 2023

This problem has been introduced in c9330bc which changed the default empty value for lower and upper bound from ScalarValue::Null to ScalarValue::try_from(field.data_type()). which is not implemented for map, to fix it we have to either implement it for Map or fallback on Null value. I have done these two in two draft PRs I have above.

I think implementing it for Map is what makes the most sense to me

However, I think this may be a non trivial change as it will require implementing a ScalarValue::Map variant which does not see to exist

https://github.com/apache/arrow-datafusion/blob/7f312c8a1d82b19f03c640e4a5d7d6437b2ee835/datafusion/common/src/scalar.rs#L102

l I think ScalarValue::Map can follow the model of ScalarValue::List (as Maps are just lists of structs, as I undertand):

enum ScalarValue {
  ...
  /// one element array
  Map(ArrayRef)
}

In the first one I tried to implement the empty value for map as described in the description of Map which returns a List of key values as a struct called entries.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
3 participants