Skip to content

CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres #8475

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
tv42 opened this issue Dec 8, 2023 · 0 comments · Fixed by #8477
Closed

CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres #8475

tv42 opened this issue Dec 8, 2023 · 0 comments · Fixed by #8477
Labels
bug Something isn't working

Comments

@tv42
Copy link
Contributor

tv42 commented Dec 8, 2023

Describe the bug

I found this in the sqlite sqllogictest repository, random/expr/slt_good_3.slt:

query I rowsort label-676
SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59
----
-337914

Just to confirm with sqlite3:

$ sqlite3
SQLite version 3.43.2 2023-10-10 12:14:04
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
-337914
sqlite>

Datafusion disagrees:

$ datafusion-cli
DataFusion CLI v33.0.0
❯ SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (- CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END) + Int64(93) + (- Int64(-44)) * Int64(91) + CASE Int64(44) WHEN (- Int64(-21)) * Int64(69) - Int64(12) THEN Int64(58) ELSE Int64(-3) END * Int64(23) * Int64(84) * (- Int64(-59)) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -337868                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.017 seconds.

❯

With SQLite, the expression simplifies to

SELECT - 47 + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;

and then

SELECT - 47 + 93 + - - 44 * 91 + - 3 * + + 23 * + 84 * - - 59;

which Datafusion agrees is -337914. Hence, Datafusion must have evaluated one of the cases differently!

sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
47
❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
+----------------------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
+----------------------------------------------------------------------------------------------+
| 1                                                                                            |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.015 seconds.

Trying to isolate that further, SQLite thinks NOT - AVG ( - 41 ) IS NULL is 1, Datafusion thinks it's true which is the same with types, and Datafusion handles that CASE in isolation just fine:

sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
47
sqlite> select NOT - AVG ( - 41 ) IS NULL;
1
sqlite> select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
yes
❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
+----------------------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
+----------------------------------------------------------------------------------------------+
| 1                                                                                            |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.014 seconds.

❯ select NOT - AVG ( - 41 ) IS NULL;
+---------------------------------+
| NOT (- AVG(Int64(-41))) IS NULL |
+---------------------------------+
| true                            |
+---------------------------------+
1 row in set. Query took 0.011 seconds.

❯ select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
+--------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Utf8("yes") ELSE Utf8("no") END |
+--------------------------------------------------------------------------------+
| yes                                                                            |
+--------------------------------------------------------------------------------+
1 row in set. Query took 0.014 seconds.

But when it's part of that larger statement something goes wrong.

And this is it:

SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
sqlite and postgres
❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
+---------------------------------------------------------------------------------------------------------+
| CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN Boolean(true) THEN Utf8("datafusion") END |
+---------------------------------------------------------------------------------------------------------+
| datafusion                                                                                              |
+---------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.006 seconds.

To Reproduce

❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
+---------------------------------------------------------------------------------------------------------+
| CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN Boolean(true) THEN Utf8("datafusion") END |
+---------------------------------------------------------------------------------------------------------+
| datafusion                                                                                              |
+---------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.006 seconds.

Expected behavior

sqlite> SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
sqlite and postgres

Additional context

No response

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant