-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Document/test unstable decimal dtype #1033
base: main
Are you sure you want to change the base?
Conversation
These act as canaries to help spot changes in the decimal implementation.
test "mean returns decimal instead of float", %{df: df} do | ||
assert_raise( | ||
RuntimeError, | ||
""" | ||
DataFrame mismatch. | ||
|
||
expected: | ||
|
||
names: ["a"] | ||
dtypes: %{"a" => {:f, 64}} | ||
|
||
got: | ||
|
||
names: ["a"] | ||
dtypes: %{"a" => {:decimal, 38, 1}} | ||
""", | ||
fn -> DF.summarise(df, a: mean(a)) end | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a result of our expectations:
explorer/lib/explorer/backend/lazy_series.ex
Lines 772 to 783 in 020fb23
defp dtype_for_agg_operation(op, _) | |
when op in [:count, :nil_count, :size, :n_distinct, :argmin, :argmax], | |
do: {:u, 32} | |
defp dtype_for_agg_operation(op, series) | |
when op in [:first, :last, :sum, :min, :max, :product], | |
do: series.dtype | |
defp dtype_for_agg_operation(op, _) when op in [:all?, :any?], do: :boolean | |
defp dtype_for_agg_operation(:mode, series), do: {:list, series.dtype} | |
defp dtype_for_agg_operation(_, _), do: {:f, 64} |
I'm not entirely sure it's correct. But it seems to be?
>>> df = pl.DataFrame({"a": [1.1, 2.2]}, schema={"a": pl.datatypes.Decimal(18, 2)})
>>> df.with_columns(b=pl.col("a").mean())
shape: (2, 2)
┌───────────────┬──────┐
│ a ┆ b │
│ --- ┆ --- │
│ decimal[18,2] ┆ f64 │
╞═══════════════╪══════╡
│ 1.10 ┆ 1.65 │
│ 2.20 ┆ 1.65 │
└───────────────┴──────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine to default our decimal operations to float and fix them as Polars start returning decimals for them too. I would prefer that than raising (because returning float still "works").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josevalim Sorry forgot about this. Are you saying we should remove this test and force our decimal operations to return float instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return whatever Polars return for now and we will fix it naturally as Polars fixes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then the test can stay. When Polars fixes their end, we will learn about it because the test will break. Then we can convert it to a test of what the behavior should actually be.
Discussed here: #1018 (comment)
Adds a test module and a warning to the moduledoc: