Schema mismatch caused by fold #20424

Open
2 tasks done
coastalwhite opened this issue Dec 23, 2024 · 1 comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@coastalwhite
Collaborator

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Build the query once so both calls inspect the same plan.
lf = pl.Series([1]).to_frame().lazy().select(
    x=~pl.fold(True, lambda acc, s: acc & s.is_null(), pl.all())
)
print(lf.collect_schema())
print(lf.collect())

Log output

Schema([('x', Int64)])

shape: (1, 1)
┌──────┐
│ x    │
│ ---  │
│ bool │
╞══════╡
│ true │
└──────┘

Issue description

collect_schema() and collect() disagree on the dtype of x: the planned schema reports Int64, while the collected frame contains a Boolean column.

Expected behavior

collect_schema() and collect() should report the same dtype for x.
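The expectation above can be checked mechanically. The following is a minimal sketch, not part of Polars: `schema_mismatches` is a hypothetical helper name, and it assumes only that its argument behaves like a LazyFrame, i.e. `collect_schema()` returns a name-to-dtype mapping and `collect()` returns a frame with a `.schema` mapping.

```python
def schema_mismatches(lf):
    """Compare the planned schema with the schema of the collected result.

    `lf` is assumed to behave like a polars LazyFrame (hypothetical helper,
    not a Polars API). Returns a list of
    (column name, planned dtype, actual dtype) tuples, one per disagreement.
    """
    planned = dict(lf.collect_schema())
    actual = dict(lf.collect().schema)
    # Visit planned columns first, then any column that only appears after collect.
    names = list(planned) + [name for name in actual if name not in planned]
    return [
        (name, planned.get(name), actual.get(name))
        for name in names
        if planned.get(name) != actual.get(name)
    ]
```

Applied to the lazy query from the reproducible example, a non-empty list would indicate the mismatch described in this issue; an empty list means both calls agree.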

Installed versions

Replace this line with the output of pl.show_versions(). Leave the backticks in place.
@coastalwhite coastalwhite added bug Something isn't working python Related to Python Polars needs triage Awaiting prioritization by a maintainer labels Dec 23, 2024
@coastalwhite
Collaborator Author

coastalwhite commented Dec 23, 2024

A similar case that is more difficult to solve:

lf = pl.Series([1], dtype=pl.Int32).to_frame().lazy().select(
    x=pl.fold(pl.Series([2], dtype=pl.Int8), lambda acc, s: acc + s, pl.all())
)
print(lf.collect_schema())  # Int8
print(lf.collect())  # Int32

coastalwhite added a commit to coastalwhite/polars that referenced this issue Dec 23, 2024
This PR fixes pola-rs#17391 by properly adding a `TypeCheckRule` to the
`ConversionOptimizer` that will verify that `filters` have a `Boolean`
datatype.

This already caught pola-rs#20424.

In the future, this can be expanded to type check additional parts of the IR
such as arithmetic operators.