Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Document/test unstable decimal dtype #1033

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

billylanchantin
Copy link
Member

@billylanchantin billylanchantin commented Nov 30, 2024

Discussed here: #1018 (comment)

Adds a test module and a warning to the moduledoc:

Screen Shot 2024-11-30 at 11 29 17 AM

These act as canaries to help spot changes
in the decimal implementation.
Comment on lines +25 to +43
test "mean returns decimal instead of float", %{df: df} do
assert_raise(
RuntimeError,
"""
DataFrame mismatch.

expected:

names: ["a"]
dtypes: %{"a" => {:f, 64}}

got:

names: ["a"]
dtypes: %{"a" => {:decimal, 38, 1}}
""",
fn -> DF.summarise(df, a: mean(a)) end
)
end
Copy link
Member Author

@billylanchantin billylanchantin Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a result of our expectations:

defp dtype_for_agg_operation(op, _)
when op in [:count, :nil_count, :size, :n_distinct, :argmin, :argmax],
do: {:u, 32}
defp dtype_for_agg_operation(op, series)
when op in [:first, :last, :sum, :min, :max, :product],
do: series.dtype
defp dtype_for_agg_operation(op, _) when op in [:all?, :any?], do: :boolean
defp dtype_for_agg_operation(:mode, series), do: {:list, series.dtype}
defp dtype_for_agg_operation(_, _), do: {:f, 64}

I'm not entirely sure it's correct. But it seems to be?

>>> df = pl.DataFrame({"a": [1.1, 2.2]}, schema={"a": pl.datatypes.Decimal(18, 2)})
>>> df.with_columns(b=pl.col("a").mean())
shape: (2, 2)
┌───────────────┬──────┐
│ ab    │
│ ------  │
│ decimal[18,2] ┆ f64  │
╞═══════════════╪══════╡
│ 1.101.65 │
│ 2.201.65 │
└───────────────┴──────┘

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine to default our decimal operations to float and fix them as Polars start returning decimals for them too. I would prefer that than raising (because returning float still "works").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim Sorry forgot about this. Are you saying we should remove this test and force our decimal operations to return float instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return whatever Polars return for now and we will fix it naturally as Polars fixes it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then the test can stay. When Polars fixes their end, we will learn about it because the test will break. Then we can convert it to a test of what the behavior should actually be.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants