Skip to content

approx_percentile_cont_with_weight result type changed from datafusion v39 to v40 #11874

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Michael-J-Ward opened this issue Aug 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Michael-J-Ward
Copy link
Contributor

Describe the bug

In datafusion:v39.0.0, calling approx_percentile_cont_with_weight on an integer column produced an integer result.

In datafusion:v40.0.0, calling approx_percentile_cont_with_weight on an integer column produces a float result.

To Reproduce

v39.0.0 behavior

Setup

git checkout 40.0.0
git log --oneline -n 1
cargo run
6a4a280e3 (HEAD, tag: 39.0.0-rc1, tag: 39.0.0, upstream/branch-39) remove unused file
DataFusion CLI v39.0.0
create table foo(x int);
insert into foo values (1), (2), (3);
select arrow_typeof(approx_percentile_cont_with_weight("x", 0.6, 0.5)), approx_percentile_cont_with_weight("x", 0.6, 0.5) from foo;

Output

+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| arrow_typeof(APPROX_PERCENTILE_CONT_WITH_WEIGHT(foo.x,Float64(0.6),Float64(0.5))) | APPROX_PERCENTILE_CONT_WITH_WEIGHT(foo.x,Float64(0.6),Float64(0.5)) |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| Int32                                                                             | 3                                                                   |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

v40.0.0 behavior

Setup

git checkout 40.0.0
git log --oneline -n 1
cargo run
4cae81363 (HEAD, tag: 40.0.0-rc1, tag: 40.0.0, upstream/branch-40) manually remove a reverted PR from the breaking change section
DataFusion CLI v40.0.0
create table foo(x int);
insert into foo values (1), (2), (3);
select arrow_typeof(approx_percentile_cont_with_weight("x", 0.6, 0.5)), approx_percentile_cont_with_weight("x", 0.6, 0.5) from foo;

Output

+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| arrow_typeof(approx_percentile_cont_with_weight(foo.x,Float64(0.6),Float64(0.5))) | approx_percentile_cont_with_weight(foo.x,Float64(0.6),Float64(0.5)) |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| Float64                                                                           | 3.0                                                                 |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.011 seconds.

Expected behavior

I would have expected the return type to remain consistent, though I am unsure what datafusion uses as the canonical implementation. SQL Server does look like it should return a float

Additional context

approx_percentile_cont_with_weight was converted to a UDAF between v39 and v40.

#10917

@Michael-J-Ward Michael-J-Ward added the bug Something isn't working label Aug 7, 2024
@jayzhan211
Copy link
Contributor

SQL Server does look like it should #10917

It seems like float is the correct type? If that, I don't think there is any further change required

@Michael-J-Ward
Copy link
Contributor Author

I would have expected the return type to remain consistent EDIT between datafusion v39 and v40

I only filed the issue because the behavior changed between datafusion releases as it was migrated to an UDAF.

If datafusion uses SQL Server's implementation as canonical, then I agree that v40 is correct and no action required.

@alamb
Copy link
Contributor

alamb commented Aug 8, 2024

I agree we can close this issue as "working as designed" -- thanks @Michael-J-Ward and @jayzhan211

@alamb alamb closed this as completed Aug 8, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants