LazyFrame crashes on with_columns and rolling when DataFrame succeeds #15588
Comments
You're trying to nest the LazyFrame inside of a `with_columns` call. You can redefine your func as
and then it will work for both eager and lazy; just be aware that it's double materializing.
Should something be done to improve the error message here, instead of trying to stringify a LazyFrame? The fact that it works with DataFrames is a side effect of iteration unpacking each Series object. The "proper" way to combine them would be
The error could be better, sure. We'd have to check all the inputs to `with_columns` (and all the other contexts) to see whether any are LazyFrames, and then it could raise a specific error. Should we add checks on the eager side so that both of these scenarios error? I'll reopen just in case others feel differently, but it seems like it's just bikeshedding from here.
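Thanks for the response. I ended up using:

```python
import polars as pl

def func(df):
    # Rolling sum of Value over windows spanning 3 integer units of Rank
    df_agg = df.rolling(index_column='Rank', period='3i').agg(
        pl.sum('Value').alias('RolledValue'))
    # Combine horizontally, aligning rows on the shared key column (Rank)
    return pl.concat([df, df_agg], how='align')
```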
@deanm0000 I'm not sure; it just seemed a little odd to me that this happens. It's as if the eager version works "by accident", but perhaps I'm mistaken. @SMa2021 It seems there is also `pl.sum("Value").rolling(index_column="Rank", period="3i")`.
Hi @cmdlineluser, thank you for the suggestion! I also want to use a `group_by` argument, which `Expr.rolling()` does not seem to support, but I removed it from the example to keep it minimal.
@cmdlineluser It's definitely working by accident, for the reason you said. A DataFrame has
Python does not have strict typing, and we cannot check for every possible wrong input. A DataFrame is an iterable of its Series, so when it is passed to a function that accepts an iterable of expressions, each Series is unpacked as a separate input. A LazyFrame is not an iterable, so it correctly raises a TypeError. The error message can be improved; I will open a PR for this.
Checks
Reproducible example
Log output
invalid literal value: 'naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)\n\nAGGREGATE\n\t[col("Value").sum().alias("RolledValue")] BY [] FROM\n WITH_COLUMNS:\n [col("Rank").set_sorted()]\n DF ["Rank", "Value"]; PROJECT */2 COLUMNS; SELECTION: "None"'
Issue description
The intention is to add a column of the rolled values to the dataframe. Although `.agg` returns a frame rather than an expression, the eager DataFrame version executed successfully, while the LazyFrame version rejected the operation.
Expected behavior
Here is the expected output, which the eager mode achieves.
Installed versions