Improve the univariate statsforecast function in EvaDB #1081

Closed
6 tasks done
xzdandy opened this issue Sep 8, 2023 · 5 comments · Fixed by #1094

Comments

@xzdandy
Collaborator

xzdandy commented Sep 8, 2023

Search before asking

  • I have searched the EvaDB issues and found no similar feature requests.

Description

  • The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.
  • The timeseries column is not properly handled. statsforecast has a required format for the timeseries column. https://nixtla.github.io/statsforecast/docs/getting-started/getting_started_short.html
  • The univariate statsforecast expects a fixed schema for the input dataframe. Renaming the column is not handled properly now.
  • Update documentation with all available parameters.
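
To illustrate the fixed schema mentioned in the third bullet: statsforecast expects its input columns to be named unique_id, ds, and y. The following is a minimal sketch in plain Python of mapping user-chosen column names onto that schema. The source column names (series, saledate, price), the sample values, and the helper `to_fixed_schema` are all hypothetical; this is not EvaDB's actual implementation.

```python
def to_fixed_schema(rows, id_col, time_col, value_col):
    """Return copies of `rows` keyed by the names statsforecast expects.

    `rows` is a list of dicts; the three *_col arguments name the
    user's columns for the series id, the timestamp, and the value.
    """
    mapping = {id_col: "unique_id", time_col: "ds", value_col: "y"}
    for row in rows:
        missing = [col for col in mapping if col not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
    # Keep only the three mapped columns, renamed to the fixed schema.
    return [{mapping[col]: row[col] for col in mapping} for row in rows]


# Made-up rows with customized column names (illustrative only).
rows = [
    {"series": "a", "saledate": "2007-09-30", "price": 10.0},
    {"series": "a", "saledate": "2007-12-31", "price": 12.5},
]
renamed = to_fixed_schema(rows, "series", "saledate", "price")
```

Keeping the mapping in one dictionary also makes it straightforward to invert later, so that output columns can be renamed back to the names the user chose.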

Use case

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@americast
Member

> The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.

I believe we can simply do SELECT Forecast(12);. The FROM part is a little redundant, but I am not sure if that is in line with SQL syntax.

@xzdandy I'll take care of 2, you can assign it to me. Thanks!

@xzdandy
Collaborator Author

xzdandy commented Sep 8, 2023

> > The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.
>
> I believe we can simply do SELECT Forecast(12);. The FROM part is a little redundant, but I am not sure if that is in line with SQL syntax.
>
> @xzdandy I'll take care of 2, you can assign it to me. Thanks!

Thanks @americast! The SELECT Forecast(12) syntax is not supported yet. We can add that. I will handle 3 first, since it currently prevents me from forecasting on tables with customized column names.

The time data type can be tricky. For example, I am using the House Property Sales Time Series data set, where values in the saledate column look like 30/09/2007, which is not the default pandas date format. We need to support some kind of date type and conversion here. Do you have any ideas?

@americast
Member

> I will handle 3 first, since it currently prevents me from forecasting on tables with customized column names.

@xzdandy I added some support for customized column names in #969. It's handled by the id and time variables. Are they not working for you?

@xzdandy
Collaborator Author

xzdandy commented Sep 8, 2023

> > I will handle 3 first, since it currently prevents me from forecasting on tables with customized column names.
>
> @xzdandy I added some support for customized column names in #969. It's handled by the id and time variables. Are they not working for you?

It is not working. 1) The change is applied to aggregated_batch instead of data. This can be easily fixed. 2) The output object of the UDF is not correctly bound, so in projection we are looking for a non-existent column.

@xzdandy
Collaborator Author

xzdandy commented Sep 9, 2023

The warning message is: /home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/statsforecast/core.py:691: UserWarning: Parsing dates in %d/%m/%Y format when dayfirst=False (the default) was specified. Pass dayfirst=True or specify a format to silence this warning. It seems we can specify a time format for parsing. We can explore this option.
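
As a rough sketch of what specifying a format buys us, the snippet below parses day-first strings such as 30/09/2007 with an explicit format using only the Python standard library. The helper name `parse_sale_date` and the sample values are illustrative assumptions; the actual fix in EvaDB or statsforecast may look different.

```python
from datetime import datetime


def parse_sale_date(text, fmt="%d/%m/%Y"):
    """Parse a date string with an explicit format instead of guessing."""
    return datetime.strptime(text, fmt).date()


# Illustrative day-first values in the style of the saledate column.
raw = ["30/09/2007", "01/02/2008"]
iso = [parse_sale_date(s).isoformat() for s in raw]

# The same string means a different day under a month-first format,
# which is why relying on an inferred format is risky.
month_first = parse_sale_date("01/02/2008", fmt="%m/%d/%Y").isoformat()
```

With the explicit day-first format, 01/02/2008 is read as 1 February 2008, while the month-first format reads it as 2 January 2008. A value such as 30/09/2007 cannot be parsed month-first at all and raises a ValueError.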

xzdandy added a commit that referenced this issue Sep 10, 2023
Addressing item3 in #1081

* [x] In `evadb/executor/create_function_executor.py`, we rename the
input relation to a [fixed
schema](https://nixtla.github.io/statsforecast/docs/getting-started/getting_started_short.html)
required by statsforecast
* [x] Rename the output column so it is synced with binder. A temporary
fix. We will reconsider the design in #1017
* [x] Update testcases to test the column rename feature.
jiashenC pushed a commit that referenced this issue Sep 10, 2023
- Addressing ` Update documentation with all available parameters.` in
#1081.
- Adding documentation for 
   * MODEL
   * ID
   * TIME
   * PREDICT
   * FREQUENCY
americast added a commit that referenced this issue Sep 12, 2023
xzdandy added a commit that referenced this issue Sep 12, 2023
Address changing `SELECT Forecast(12) FROM AirData;` to `SELECT
Forecast(12);` in #1081

- [x] update parser, binder, optimizer, and executor to allow project
without children.
- [x] update forecasting test cases and documentation.
- [x] add unit test and short integration test for `SELECT expr;`.
- [x] add documentation that we support `SELECT expr;`.