Improve the 16-homesale-forecasting.ipynb to give details on how to tune every configuration. #1278

Open — wants to merge 4 commits into base: staging
Conversation

xzdandy
Collaborator

@xzdandy xzdandy commented Oct 12, 2023

Update the notebook with neuralforecast and predictions for every postcode once the math domain error is fixed in #1283.

@xzdandy xzdandy linked an issue Oct 12, 2023 that may be closed by this pull request
2 tasks
@xzdandy xzdandy self-assigned this Oct 12, 2023
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


@xzdandy xzdandy added the AI Engines Features, Bugs, related to AI Engines label Oct 12, 2023
@americast americast self-requested a review October 18, 2023 02:58
@xzdandy xzdandy marked this pull request as ready for review October 23, 2023 05:31
@xzdandy xzdandy added this to the v0.3.9 milestone Oct 23, 2023
@xzdandy
Collaborator Author

xzdandy commented Oct 23, 2023

Hi @americast, please review the updated notebook. Below are several issues I am still facing:

  1. It is not clear how to choose the frequency.
  2. It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?
  3. Are the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs. 21 seconds with statsforecast is a huge gap.
  4. Even though we fixed the math domain error for series with only one data point, there are many 0 outputs, which do not make sense. I am using `WHERE price > 0` to filter them out for now.
  5. The dates predicted under different unique_id values differ a lot: some are in 2017, while others are in 2011. I think this is because the next 3 steps are counted from the latest date in each training series, which can differ. In reality, users likely want predictions for the same point in time.

@jarulraj
Member

@americast While you are fixing some of these issues, we could also discuss the plan for fixing here.

@americast
Member

> @americast While you are fixing some of these issues, we could also discuss the plan for fixing here.

Sure @jarulraj. Thanks @xzdandy for the review!

> Hi @americast, please review the updated notebook. Below are several issues I am still facing:
>
> 1. It is not clear how to choose the frequency.

Yes, it can get a little confusing. I will send a separate PR for the frequency-related discussion.
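In the meantime, as a rough sketch of how a user could sanity-check the frequency of their data before training: `pandas` can often infer it directly from the timestamps. `guess_frequency` below is a hypothetical helper, not part of the notebook; the fallback to the modal gap is an assumption for irregular series.

```python
import pandas as pd

# Hypothetical helper: infer the sampling frequency of a time series
# from its timestamps. pd.infer_freq needs at least 3 evenly spaced
# points; otherwise we fall back to the most common gap.
def guess_frequency(dates):
    idx = pd.DatetimeIndex(sorted(pd.to_datetime(dates)))
    freq = pd.infer_freq(idx)
    if freq is None and len(idx) > 1:
        # Fall back to the modal gap between consecutive timestamps.
        freq = idx.to_series().diff().mode().iloc[0]
    return freq

# Monthly home-sale data (month starts) should come back as "MS".
print(guess_frequency(["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"]))
```

A check like this could also power a warning in the notebook when the inferred frequency disagrees with what the user configured.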

> 2. It is not clear how to decide which model / parameters are better. Do we have any measurable / quantitative metrics we can offer after training?

We should add a metric such as normalized RMSE or Interval Score. I will take care of that in #1258.
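For concreteness, here is a minimal sketch of both metrics in plain Python (not the #1258 implementation, just the textbook definitions): RMSE normalized by the range of the actuals so series on different price scales are comparable, and the mean interval score for a (1 − α) prediction interval.

```python
import math

def normalized_rmse(actual, forecast):
    # RMSE divided by the range of the actuals, so different
    # unique_ids (with very different price scales) are comparable.
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
    spread = max(actual) - min(actual)
    return rmse / spread if spread else rmse

def interval_score(actual, lower, upper, alpha=0.05):
    # Mean interval score: interval width plus a 2/alpha penalty
    # for each observation falling outside the interval.
    scores = []
    for a, lo, hi in zip(actual, lower, upper):
        s = hi - lo
        s += (2 / alpha) * (lo - a) if a < lo else 0
        s += (2 / alpha) * (a - hi) if a > hi else 0
        scores.append(s)
    return sum(scores) / len(scores)
```

Lower is better for both, which makes them easy to surface after training as a single "which model won" number.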

> 3. Are the NeuralForecast training time and accuracy tunable? 28 minutes with neuralforecast vs. 21 seconds with statsforecast is a huge gap.

It's not that simple. With larger datasets, statsforecast may well take more time than neuralforecast: neuralforecast's training time grows roughly linearly with the number of unique IDs, while statsforecast's can grow non-linearly with more data.

> 4. Even though we fixed the math domain error for series with only one data point, there are many 0 outputs, which do not make sense. I am using `WHERE price > 0` to filter them out for now.

That's weird; I will check. In any case, forecasting from just one data point doesn't make much sense. Perhaps we should also return a suggestion or warning?
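Until that warning exists, a pre-filter on the training data could both mirror the `WHERE price > 0` workaround and drop series too short to forecast. `filter_trainable` below is a hypothetical pandas sketch, not existing EvaDB behavior; the `min_points=2` threshold is an assumption.

```python
import pandas as pd

# Hypothetical pre-filter: drop zero prices (mirrors WHERE price > 0)
# and then drop any unique_id with too few remaining observations.
def filter_trainable(df, group_col="unique_id", value_col="price", min_points=2):
    df = df[df[value_col] > 0]
    counts = df.groupby(group_col)[value_col].transform("size")
    return df[counts >= min_points]

df = pd.DataFrame({
    "unique_id": ["A", "A", "B", "C", "C"],
    "price": [100, 110, 0, 90, 95],
})
# "B" is dropped (its only row is a zero); "A" and "C" survive.
print(filter_trainable(df))
```

The same predicate could back a warning message listing which IDs were excluded and why.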

> 5. The dates predicted under different unique_id values differ a lot: some are in 2017, while others are in 2011. I think this is because the next 3 steps are counted from the latest date in each training series, which can differ. In reality, users likely want predictions for the same point in time.

This is an interesting problem. Perhaps we can ask the user for the time step (or range) at which they want the forecast and predict at that step.
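One possible shape for that, sketched under assumptions (monthly `"MS"` data whose last observations fall on month starts; `horizons_to` is a hypothetical helper, not an existing API): instead of forecasting a fixed 3 steps past each series' last observation, compute a per-series horizon that lands every series on the same target date.

```python
import pandas as pd

# Hypothetical alignment: give each series its own horizon so that
# all forecasts end at a shared target date, rather than "3 steps
# after whatever this series' last training date happened to be".
def horizons_to(target, last_dates, freq="MS"):
    target = pd.Timestamp(target)
    horizons = {}
    for uid, last in last_dates.items():
        # Number of freq-sized steps from this series' last date to target.
        steps = len(pd.date_range(pd.Timestamp(last), target, freq=freq)) - 1
        horizons[uid] = max(steps, 0)
    return horizons

# A series ending in 2017 needs 3 steps; one ending in 2011 needs 66.
print(horizons_to("2017-06-01", {"A": "2017-03-01", "B": "2011-12-01"}))
```

Stale series like "B" would then need very long horizons, which is itself a useful signal that their forecasts should carry wide intervals or a warning.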

As of now, I am trying to come up with a confidence interval for the forecasts, as well as a metric that would help analyze which method works best. The entire setup could become part of the feedback system. I'll be adding my commits in #1258 and will update this doc with the metrics once that's merged.

@xzdandy xzdandy removed this from the v0.3.9 milestone Nov 19, 2023
Successfully merging this pull request may close these issues.

Explore the Neuralforecast in the 16-homesale-forecasting.ipynb