Closed
Describe the bug
While working on SortMergeJoin, I hit several TPC-DS query failures with errors like:
- q38 *** FAILED ***
  java.lang.Exception: Expected "struct<[count(1):bigint]>", but got "struct<[]>" Schema did not match
  SELECT count(*)
  FROM (
    SELECT DISTINCT
      c_last_name,
      c_first_name,
      d_date
    FROM store_sales, date_dim, customer
    WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk
      AND store_sales.ss_customer_sk = customer.c_customer_sk
      AND d_month_seq BETWEEN 1200 AND 1200 + 11
    INTERSECT
    SELECT DISTINCT
      c_last_name,
      c_first_name,
      d_date
    FROM catalog_sales, date_dim, customer
    WHERE catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
      AND catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
      AND d_month_seq BETWEEN 1200 AND 1200 + 11
    INTERSECT
    SELECT DISTINCT
      c_last_name,
      c_first_name,
      d_date
    FROM web_sales, date_dim, customer
    WHERE web_sales.ws_sold_date_sk = date_dim.d_date_sk
      AND web_sales.ws_bill_customer_sk = customer.c_customer_sk
      AND d_month_seq BETWEEN 1200 AND 1200 + 11
  ) hot_cust
  LIMIT 100
  ...
org.apache.comet.CometNativeException
Arrow error: Invalid argument error: RowConverter column schema mismatch, expected Utf8 got Date32
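This is because DataFusion's coalesce function returns a Date32 array for Date32 inputs (which is correct), but reports its return type as Utf8. A RowConverter built from that declared Utf8 type then receives the actual Date32 column and rejects it. The details are in apache/datafusion#9458. The fix is at apache/datafusion#9459.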
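To make the failure mode concrete, here is a minimal Scala sketch under simplifying assumptions. Column, ToyRowConverter, ToyCoalesce and the Utf8/Date32 objects are hypothetical stand-ins, not DataFusion or Arrow APIs. The toy converter is built from the type the planner declares and rejects any column whose actual type differs, which is the same invariant the RowConverter error above enforces.

```scala
// Hypothetical, simplified model: none of these types are DataFusion or Arrow APIs.
sealed trait DataType
case object Utf8 extends DataType
case object Date32 extends DataType

// A column carries the type of the values it actually holds.
final case class Column(dataType: DataType, values: Vector[Option[Any]])

// Toy version of the check behind "RowConverter column schema mismatch":
// a converter built from a declared schema rejects columns of any other type.
final class ToyRowConverter(schema: Vector[DataType]) {
  def convert(columns: Vector[Column]): Either[String, Int] = {
    val mismatch = schema.zip(columns).collectFirst {
      case (expected, col) if expected != col.dataType =>
        s"column schema mismatch, expected $expected got ${col.dataType}"
    }
    // On success, return the number of rows converted.
    mismatch.toLeft(columns.headOption.map(_.values.size).getOrElse(0))
  }
}

object ToyCoalesce {
  // Buggy planning-time answer: always reports Utf8, whatever the inputs are.
  def buggyReturnType(argTypes: Seq[DataType]): DataType = Utf8

  // Consistent answer: the output type follows the (already coerced) argument type.
  // Assumes at least one argument, as coalesce requires.
  def fixedReturnType(argTypes: Seq[DataType]): DataType = argTypes.head

  // Execution: per row, take the first non-null value; the output keeps the input type.
  def evaluate(args: Seq[Column]): Column = {
    val n = args.head.values.size
    val out = Vector.tabulate(n) { i =>
      args.iterator.map(_.values(i)).collectFirst { case Some(v) => v }
    }
    Column(args.head.dataType, out)
  }
}

// Two Date32 columns (days since epoch); the first has a null in row 0.
val a = Column(Date32, Vector(None, Some(19000)))
val b = Column(Date32, Vector(Some(18000), Some(18500)))
val result = ToyCoalesce.evaluate(Seq(a, b))

val buggy = new ToyRowConverter(Vector(ToyCoalesce.buggyReturnType(Seq(Date32, Date32))))
  .convert(Vector(result))
val fixed = new ToyRowConverter(Vector(ToyCoalesce.fixedReturnType(Seq(Date32, Date32))))
  .convert(Vector(result))

println(buggy)
println(fixed)
```

With the buggy declared type the toy converter fails with a message shaped like the one in the stack trace; with the return type derived from the arguments it succeeds. This only models the invariant (declared type must match the produced array type); the actual change in apache/datafusion#9459 may differ in detail.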
Steps to reproduce
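import org.apache.hadoop.fs.Path

test("coalesce should return correct datatype") {
  // Run against both dictionary-encoded and plain Parquet data
  Seq(true, false).foreach { dictionaryEnabled =>
    withTempDir { dir =>
      val path = new Path(dir.toURI.toString, "test.parquet")
      makeParquetFileAllTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
      withParquetTable(path.toString, "tbl") {
        // The first two arguments are cast to DATE (Date32 in Arrow); check that the
        // result matches Spark and that the query runs through Comet operators
        checkSparkAnswerAndOperator(
          "SELECT coalesce(cast(_18 as date), cast(_19 as date), _20) FROM tbl")
      }
    }
  }
}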
Expected behavior
No response
Additional context
No response