Skip to content

throw Divide by zero error when use case when #8814

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
liukun4515 opened this issue Jan 10, 2024 · 9 comments · Fixed by #8833
Closed

throw Divide by zero error when use case when #8814

liukun4515 opened this issue Jan 10, 2024 · 9 comments · Fixed by #8833
Labels
bug Something isn't working

Comments

@liukun4515
Copy link
Contributor

Describe the bug

When i use the

case value1 > 0 then value/value1 else 0 end as result

and got the divide by zero

To Reproduce

create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);

When run

select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;

there is no issue.

But when run

select 
case when B.column1 > 0 then A.column1/B.column1 else 0 end as value1,
case when B.column1 > 0 and B.column2 > 0 then A.column1/B.column1 else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;

there is the issue: divide by zero

Expected behavior

no exception and get the result

Additional context

I am try to find the root cause, maybe it is caused by optimizer or the execution of the physical expr.

@liukun4515 liukun4515 added the bug Something isn't working label Jan 10, 2024
@liukun4515
Copy link
Contributor Author

get the plan

❯ explain select
case when B.column1 > 0 then A.column1/B.column1 else 0 end as value1,
case when B.column1 > 0 and B.column2 > 0 then A.column1/B.column1 else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3 |
|               |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, b.column1 > Int64(0) AS b.column1 > Int64(0)Int64(0)b.column1, b.column2                                                                                                                                                                                                                                  |
|               |     CrossJoin:                                                                                                                                                                                                                                                                                                                                                                            |
|               |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                    |
|               |         TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                             |
|               |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                    |
|               |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                  |
|               |           TableScan: users projection=[column1, column2]                                                                                                                                                                                                                                                                                                                                  |
| physical_plan | ProjectionExec: expr=[CASE WHEN b.column1 > Int64(0)Int64(0)b.column1@1 THEN a.column1 / b.column1b.column1a.column1@0 ELSE 0 END as value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1@1 AND column2@2 > 0 THEN a.column1 / b.column1b.column1a.column1@0 ELSE 0 END as value3]                                                                                                     |
|               |   ProjectionExec: expr=[column1@0 / column1@1 as a.column1 / b.column1b.column1a.column1, column1@1 > 0 as b.column1 > Int64(0)Int64(0)b.column1, column2@2 as column2]                                                                                                                                                                                                                   |
|               |     CrossJoinExec                                                                                                                                                                                                                                                                                                                                                                         |
|               |       MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                                                                                                                                                                                                       |
|               |       RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                                                                                                                                                                                                                                                               |
|               |         CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                       |
|               |           FilterExec: column1@0 = 0                                                                                                                                                                                                                                                                                                                                                       |
|               |             MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                                                                                                                                                                                                 |
|               |                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@haohuaijin
Copy link
Contributor

I use explain verbose find that common_sub_expression_eliminate rule extract a.column1 / b.column1 as a common sub expr , result in divide by zero.

| initial_logical_plan                                       | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                        |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                    |
|                                                            |     SubqueryAlias: a                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         TableScan: users                                                                                                                                                                                                                                                                                                                        |
|                                                            |     SubqueryAlias: b                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                        |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                  |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, a.column1, a.column2, b.column1, b.column2                                                                                                                                                                                                                      |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                  |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                      |
|                                                            |             TableScan: users 

@dhamotharan-ps
Copy link

dhamotharan-ps commented Jan 10, 2024

I have tried to reproduce the problem with latest code datafusion 34.0.0. But it is working fine. What is the version of your datafusion?

DataFusion CLI v34.0.0
❯ create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);
0 rows in set. Query took 0.016 seconds.

❯ select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+--------+
| value3 |
+--------+
| 0      |
| 0      |
| 0      |
| 0      |
| 0      |
+--------+
5 rows in set. Query took 0.040 seconds.
❯ explain select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                               |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 - a.column2 / b.column2 ELSE Int64(0) END AS value3 |
|               |   CrossJoin:                                                                                                                                       |
|               |     SubqueryAlias: a                                                                                                                               |
|               |       TableScan: users projection=[column1, column2]                                                                                               |
|               |     SubqueryAlias: b                                                                                                                               |
|               |       Filter: users.column1 = Int64(0)                                                                                                             |
|               |         TableScan: users projection=[column1, column2]                                                                                             |
| physical_plan | ProjectionExec: expr=[CASE WHEN column1@2 > 0 AND column2@3 > 0 THEN column1@0 / column1@2 - column2@1 / column2@3 ELSE 0 END as value3]           |
|               |   CrossJoinExec                                                                                                                                    |
|               |     MemoryExec: partitions=1, partition_sizes=[1]                                                                                                  |
|               |     RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                          |
|               |       CoalesceBatchesExec: target_batch_size=8192                                                                                                  |
|               |         FilterExec: column1@0 = 0                                                                                                                  |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                                                                                            |
|               |                                                                                                                                                    |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.029 seconds.

@liukun4515
Copy link
Contributor Author

I have tried to reproduce the problem with latest code datafusion 34.0.0. But it is working fine. What is the version of your datafusion?

DataFusion CLI v34.0.0
❯ create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);
0 rows in set. Query took 0.016 seconds.

❯ select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+--------+
| value3 |
+--------+
| 0      |
| 0      |
| 0      |
| 0      |
| 0      |
+--------+
5 rows in set. Query took 0.040 seconds.
❯ explain select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                               |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 - a.column2 / b.column2 ELSE Int64(0) END AS value3 |
|               |   CrossJoin:                                                                                                                                       |
|               |     SubqueryAlias: a                                                                                                                               |
|               |       TableScan: users projection=[column1, column2]                                                                                               |
|               |     SubqueryAlias: b                                                                                                                               |
|               |       Filter: users.column1 = Int64(0)                                                                                                             |
|               |         TableScan: users projection=[column1, column2]                                                                                             |
| physical_plan | ProjectionExec: expr=[CASE WHEN column1@2 > 0 AND column2@3 > 0 THEN column1@0 / column1@2 - column2@1 / column2@3 ELSE 0 END as value3]           |
|               |   CrossJoinExec                                                                                                                                    |
|               |     MemoryExec: partitions=1, partition_sizes=[1]                                                                                                  |
|               |     RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                          |
|               |       CoalesceBatchesExec: target_batch_size=8192                                                                                                  |
|               |         FilterExec: column1@0 = 0                                                                                                                  |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                                                                                            |
|               |                                                                                                                                                    |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.029 seconds.

Can I get your commit id?

I use

commit f8d860365ef17d05cb033c1f39928cac6620e2f2 (HEAD -> upstream_main, upstream/main)
Author: Randall Spears <randall@rspears.me>
Date:   Tue Jan 9 13:35:38 2024 -0700

@liukun4515
Copy link
Contributor Author

I use explain verbose find that common_sub_expression_eliminate rule extract a.column1 / b.column1 as a common sub expr , result in divide by zero.

| initial_logical_plan                                       | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                        |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                    |
|                                                            |     SubqueryAlias: a                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         TableScan: users                                                                                                                                                                                                                                                                                                                        |
|                                                            |     SubqueryAlias: b                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                        |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                  |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, a.column1, a.column2, b.column1, b.column2                                                                                                                                                                                                                      |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                  |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                      |
|                                                            |             TableScan: users 

I also find this reason, do you have the time to fix this issue?

I think this issue may be cause to other issues.

I want to find which commit caused this issue.

@haohuaijin

@haohuaijin
Copy link
Contributor

haohuaijin commented Jan 11, 2024

@liukun4515

I also find this reason, do you have the time to fix this

yes, I would like to fix this.

I think this is related to #8340 and #8454(and the discussion in #8296), that before above pr the projection pushdown undo the common_subexpr_eliminate , and after #8340 and #8454, the projection pushdown don't undo this. So result in the common sub expr a.column1 / b.column1 evaluate.

seem like we should not extract common sub_expr in CASE WHEN THEN(which some expr never run)? what do you think?

the below is the plan in datafusion-cli v33.0.0,

| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AS b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                   |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, b.column1 > Int64(0) AS b.column1 > Int64(0)Int64(0)b.column1, a.column1, b.column1, b.column2                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                                            |           TableScan: users projection=[column1, column2]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| logical_plan after push_down_projection                    | Projection: CASE WHEN b.column1 > Int64(0) AS b.column1 > Int64(0) AS b.column1 > Int64(0) THEN a.column1 / b.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0) AS b.column1 > Int64(0) AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                                         |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                                                            |     Projection: a.column1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Projection: users.column1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                                                            |           TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                                                            |     Projection: b.column1, b.column2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                                            |           Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                                                            |             TableScan: users projection=[column1, column2]  

@dhamotharan-ps
Copy link

@liukun4515 my codebase is upto below commit.

dhamo@prannav-pc:~/repos/arrow-datafusion$ git log
commit 0e53c6d816f3a9d3d27c6ebb6d25b1699e5553e7 (HEAD -> main, origin/main, origin/HEAD)
Author: comphead <comphead@users.noreply.github.com>
Date:   Mon Jan 8 15:15:55 2024 -0800

    Move tests from `expr.rs` to sqllogictests. Part1 (#8773)
    
    * move tests from  to sqllogictests part1
    
    * Update datafusion/sqllogictest/test_files/expr.slt
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * Update datafusion/sqllogictest/test_files/expr.slt
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@dhamotharan-ps
Copy link

@liukun4515 When I update my codebase to latest, I am also getting 'Divide by zero' error. So this is the regression which is caused by changes after

commit 0e53c6d816f3a9d3d27c6ebb6d25b1699e5553e7 (HEAD -> main, origin/main, origin/HEAD)
Author: comphead <comphead@users.noreply.github.com>
Date:   Mon Jan 8 15:15:55 2024 -0800

@liukun4515
Copy link
Contributor Author

thanks for @dhamotharan-ps @haohuaijin I will help to review your pr this week.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants