Document/test unstable decimal dtype #1033

billylanchantin · 2024-11-30T16:11:14Z

Adds a test module and a warning to the moduledoc. The tests act as canaries to help spot changes in the decimal implementation.

billylanchantin · 2024-11-30T16:17:09Z

test/explorer/polars_backend/decimal_unstable_test.exs

+  test "mean returns decimal instead of float", %{df: df} do
+    assert_raise(
+      RuntimeError,
+      """
+      DataFrame mismatch.
+
+      expected:
+
+          names: ["a"]
+          dtypes: %{"a" => {:f, 64}}
+
+      got:
+
+          names: ["a"]
+          dtypes: %{"a" => {:decimal, 38, 1}}
+      """,
+      fn -> DF.summarise(df, a: mean(a)) end
+    )
+  end


This is actually a result of our expectations:

explorer/lib/explorer/backend/lazy_series.ex

Lines 772 to 783 in 020fb23

defp dtype_for_agg_operation(op, _)
     when op in [:count, :nil_count, :size, :n_distinct, :argmin, :argmax],
     do: {:u, 32}

defp dtype_for_agg_operation(op, series)
     when op in [:first, :last, :sum, :min, :max, :product],
     do: series.dtype

defp dtype_for_agg_operation(op, _) when op in [:all?, :any?], do: :boolean
defp dtype_for_agg_operation(:mode, series), do: {:list, series.dtype}
defp dtype_for_agg_operation(_, _), do: {:f, 64}

I'm not entirely sure it's correct. But it seems to be?

>>> df = pl.DataFrame({"a": [1.1, 2.2]}, schema={"a": pl.datatypes.Decimal(18, 2)})
>>> df.with_columns(b=pl.col("a").mean())
shape: (2, 2)
┌───────────────┬──────┐
│ a             ┆ b    │
│ ---           ┆ ---  │
│ decimal[18,2] ┆ f64  │
╞═══════════════╪══════╡
│ 1.10          ┆ 1.65 │
│ 2.20          ┆ 1.65 │
└───────────────┴──────┘
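
For reference, the fallback in the quoted clauses can be exercised in isolation. The sketch below copies them into a hypothetical `DtypeSketch` module to show which dtype each aggregation is expected to produce for a decimal column. Two things are assumptions made for illustration: the functions are public here, and a plain map stands in for an `Explorer.Series`.

```elixir
defmodule DtypeSketch do
  # Hypothetical module: the clauses from lazy_series.ex quoted above,
  # made public so they can be called directly.
  def dtype_for_agg_operation(op, _)
      when op in [:count, :nil_count, :size, :n_distinct, :argmin, :argmax],
      do: {:u, 32}

  def dtype_for_agg_operation(op, series)
      when op in [:first, :last, :sum, :min, :max, :product],
      do: series.dtype

  def dtype_for_agg_operation(op, _) when op in [:all?, :any?], do: :boolean
  def dtype_for_agg_operation(:mode, series), do: {:list, series.dtype}

  # Every other aggregation, including :mean, is expected to be a float.
  def dtype_for_agg_operation(_, _), do: {:f, 64}
end

# A plain map stands in for a series; only the :dtype key is read.
series = %{dtype: {:decimal, 38, 1}}

IO.inspect(DtypeSketch.dtype_for_agg_operation(:mean, series))
# => {:f, 64}
IO.inspect(DtypeSketch.dtype_for_agg_operation(:sum, series))
# => {:decimal, 38, 1}
```

Under these clauses `:mean` falls through to the final catch-all, which is where the `{:f, 64}` expectation in the mismatch message comes from.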

josevalim · 2024-11-30

It is fine to default our decimal operations to float and fix them as Polars starts returning decimals for them too. I would prefer that to raising (because returning float still "works").

billylanchantin · 2024-12-16

@josevalim Sorry, I forgot about this. Are you saying we should remove this test and force our decimal operations to return float instead?

josevalim · 2024-12-16

I think we should return whatever Polars returns for now, and we will fix it naturally as Polars fixes it.

billylanchantin · 2024-12-16

Ok, then the test can stay. When Polars fixes their end, we will learn about it because the test will break. Then we can convert it to a test of what the behavior should actually be.
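
The canary idea can be sketched without Explorer at all. In the example below, `CanarySketch.check_dtype!/2` is a hypothetical stand-in for the dtype comparison that produces the mismatch error, not Explorer's actual implementation. The test pins the current mismatch, so it keeps passing only while the backend's dtype differs from the expected one.

```elixir
ExUnit.start()

defmodule CanarySketch do
  # Hypothetical check: return :ok when the dtype the backend produced
  # matches the expected dtype, and raise a RuntimeError otherwise.
  def check_dtype!(expected, got) when expected == got, do: :ok

  def check_dtype!(expected, got) do
    raise "dtype mismatch: expected #{inspect(expected)}, got #{inspect(got)}"
  end
end

defmodule CanarySketchTest do
  use ExUnit.Case

  test "canary: mean of a decimal column currently mismatches" do
    # Passes only while the mismatch exists. Once the two dtypes agree,
    # nothing is raised and assert_raise fails, flagging the change.
    assert_raise RuntimeError, ~r/dtype mismatch/, fn ->
      CanarySketch.check_dtype!({:f, 64}, {:decimal, 38, 1})
    end
  end
end
```

When such a test starts failing, it can be rewritten to assert the dtype that should be returned, as described above.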

billylanchantin added 2 commits November 30, 2024 10:59

Add unstable tests

4224ea0

These act as canaries to help spot changes in the decimal implementation.

Add unstable warning in typedoc

0ee1ba8

Move warning to moduledoc

e81129d
