
SQL query with window function PARTITION BY caused panic in 'tokio-runtime-worker' (SQLancer) #12057

Closed
2010YOUY01 opened this issue Aug 19, 2024 · 3 comments · Fixed by #12297
Labels
bug Something isn't working

Comments

@2010YOUY01
Contributor

Describe the bug

The query below caused a panic:

SELECT
  sum(1) OVER (
    PARTITION BY false=false
  )
FROM
  t1
WHERE
  ((false > (v1 = v1)) IS DISTINCT FROM true);

The fuzzer triggers this bug quite often now that window functions are included. I think it is related to the repartition execution used by window functions (shared execution logic, not specific to any particular function); see the stack trace below.

To Reproduce

Run datafusion-cli on the latest main (commit 574dfeb):

DataFusion CLI v41.0.0
> create table t1(v1 int);
0 row(s) fetched.
Elapsed 0.065 seconds.

> insert into t1 values (42);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched.
Elapsed 0.049 seconds.

> SELECT
  sum(1) OVER (
    PARTITION BY false=false
  )
FROM
  t1
WHERE
  ((false > (v1 = v1)) IS DISTINCT FROM true);

thread 'tokio-runtime-worker' panicked at /Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/datafusion/physical-plan/src/repartition/mod.rs:313:79:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("must either specify a row count or at least one column")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Join Error
caused by
External error: task 29 panicked

stack backtrace:
   0: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/result.rs:1679:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/result.rs:1102:23
   4: datafusion_physical_plan::repartition::BatchPartitioner::partition_iter::{{closure}}
             at /Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/datafusion/physical-plan/src/repartition/mod.rs:313:33
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:305:13
   6: core::option::Option<T>::map
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/option.rs:1075:29
   7: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/iter/adapters/map.rs:108:26
   8: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:1997:9
   9: datafusion_physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}
             at /Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/datafusion/physical-plan/src/repartition/mod.rs:799:24
   ...
   tokio stuff
   ...
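
The unwrap is on an arrow InvalidArgumentError rather than anything window-function specific. For reference, the same arrow error can be reproduced outside DataFusion with a minimal standalone snippet (a sketch, assuming a recent arrow-rs; the snippet and its imports are mine, not taken from the DataFusion source):

use std::sync::Arc;
use arrow::datatypes::Schema;
use arrow::record_batch::RecordBatch;

fn main() {
    // A batch with an empty schema and no columns is rejected unless a row count
    // is supplied explicitly, producing the same InvalidArgumentError as above:
    // "must either specify a row count or at least one column".
    let err = RecordBatch::try_new(Arc::new(Schema::empty()), vec![]).unwrap_err();
    println!("{err}");
}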

Expected behavior

No response

Additional context

Found by SQLancer #11030

2010YOUY01 added the bug (Something isn't working) label on Aug 19, 2024
@thinh2
Contributor

thinh2 commented Aug 19, 2024

take

@thinh2
Contributor

thinh2 commented Sep 1, 2024

Hi @2010YOUY01 ,

I have been stuck on this bug for several days without making progress. Do you have any recommendations for debugging query execution issues? I am able to reproduce the issue, and after turning on RUST_LOG=trace, here is the information I gathered, along with some guesses and questions:

  • Query's physical plan:
      CoalesceBatchesExec: target_batch_size=8192
        RepartitionExec: partitioning=Hash([true], 4), input_partitions=4
          RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
            ProjectionExec: expr=[]
              CoalesceBatchesExec: target_batch_size=8192
                FilterExec: (false > (v1@0 = v1@0)) IS DISTINCT FROM true
                  MemoryExec: partitions=1, partition_sizes=[1]
  • Debug log with error:
    [2024-08-31T01:31:23Z DEBUG datafusion_physical_plan::stream] Stopping execution: plan returned error: WindowAggExec: wdw=[sum(Int64(1)) PARTITION BY [Boolean(false) = Boolean(false)] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Ok(Field { name: "sum(Int64(1)) PARTITION BY [Boolean(false) = Boolean(false)] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }]

  • From my understanding, I suspect the issue comes from the hash repartition, RepartitionExec: partitioning=Hash([true], 4), input_partitions=4. My guess is that this RepartitionExec receives an empty RecordBatch because of the empty ProjectionExec: expr=[], and processing that empty RecordBatch leads to the panic (see the sketch after this list). Is my guess correct, and how can I verify it? It seems to contradict your assumption that the bug is related to repartition execution in window functions. May I know which part of the physical query plan corresponds to the repartition execution in window functions?
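
For illustration, here is a sketch of the suspected scenario (my assumption, not verified against the DataFusion internals): an empty projection would yield batches that have zero columns but still carry a row count via arrow's RecordBatchOptions, and rebuilding such a batch with a plain RecordBatch::try_new loses that row count and fails with the error from the stack trace:

use std::sync::Arc;
use arrow::datatypes::Schema;
use arrow::record_batch::{RecordBatch, RecordBatchOptions};

fn main() -> Result<(), arrow::error::ArrowError> {
    // A zero-column batch that still reports one row, similar to what
    // ProjectionExec: expr=[] presumably produces for the single matching row.
    let opts = RecordBatchOptions::new().with_row_count(Some(1));
    let batch = RecordBatch::try_new_with_options(Arc::new(Schema::empty()), vec![], &opts)?;
    assert_eq!(batch.num_rows(), 1);

    // Rebuilding the batch without passing the explicit row count fails with
    // "must either specify a row count or at least one column".
    let err = RecordBatch::try_new(batch.schema(), batch.columns().to_vec()).unwrap_err();
    println!("{err}");
    Ok(())
}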

@2010YOUY01
Contributor Author

@thinh2 I think you're correct; my wording was ambiguous. The bug is likely related to RepartitionExec itself.
