Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[WIP] Prototype for timeseries plots #1747

Merged
merged 9 commits into from
Aug 27, 2021

Conversation

utkarsh-maheshwari
Copy link
Contributor

@utkarsh-maheshwari utkarsh-maheshwari commented Jul 26, 2021

Description

This is another plot that I worked on under Google Summer of Code 2021. It includes overall time series and its components plots.
High level design:

  • y : from observed_data,
  • x : from constant_data or observed_data dims.
  • x_holdout : from constant_data or posterior_predictive dims
  • y_holdout : from observed_data (or constant data)
  • y_ppc : from posterior_predictive, default name same as y
  • forecast : from posterior_predictive, default name same as y_holdout
  • components : from posterior, default None

Please see the docstring in files changed for a more detailed explanation of the user API.

This plot successfully plots the overall time series for the matplotlib backend only. The added functionality is well-tested and nicely explained using examples in docstring.

Work yet to be done: Bokeh plot function is yet to be added may be in a different PR. Also, functionality to plot components is aimed to be covered in a future PR.

Plots:
image

Checklist

  • Follows official PR format
  • Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

@utkarsh-maheshwari
Copy link
Contributor Author

Also, I guess I am not authorized to add labels. Hence, this and other PR is label-less.

@OriolAbril
Copy link
Member

holdout : int, Optional. No of samples from end. (Could be modified to take particular coordinate val, then find the value in x )

I would have the holdout be separate arrays, both x and y (and optionally observations which I hardly ever expect to be known in this setting). The reason for that is that while it is possible that both posterior predictive samples for observed data and out of sample posterior predictive samples (often called predicions or forecasts) are stored in the same array, I expect them being in different arrays to be more common. Moreover, going from single to multiple array is a "simple" slice operation whereas from multiple to single requires concatenation/stacking.

@utkarsh-maheshwari
Copy link
Contributor Author

utkarsh-maheshwari commented Jul 29, 2021

@OriolAbril
You are right. I gave it a thought but implement this one first as it was easier 😛.
If we use holdout like this then it should be like :

If the user split data into train and test, generating x_train, y_train, x_test, y_test.

  • y: y_train, from observed_data,
  • x: x_train, from constant_data or observed_data dims.
  • x_holdout: x_test, from constant_data or posterior_predictive dims
  • y_holdout: y_test, from observed_data (or constant data)
  • y_ppc: y_train_pred (bayesian models exclusive), from posterior_predictive, default name same as y
  • forecast: y_test_pred, from posterior_predictive, default name same as y_holdout

We'll plot whatever information the user provides.

This looks better to me. Need confirmation from @canyon289 @ahartikainen @OriolAbril on the design?

@utkarsh-maheshwari
Copy link
Contributor Author

Some points to be noted:

  • y: y_train, from observed_data,
  • x: x_train, from constant_data or observed_data dims.
  • x_holdout: x_test, from constant_data or posterior_predictive dims
  • y_holdout: y_test, from observed_data (or constant data)
  • y_ppc: y_train_pred (bayesian models exclusive), from posterior_predictive, default name same as y
  • forecast: y_test_pred, from posterior_predictive, default name same as y_holdout

Started implementing it. One possible drawback of using this design is we would be generating so many of plotters arrays.
Estimated number is
1+1+1+1+2+2 (+4 plotters for components) = 12.

Is this a concern with respect to complexity and maintainability ?

@utkarsh-maheshwari
Copy link
Contributor Author

It is getting a bit complex. So, I am assuming data is 1D for now. Will extend it for multi dim data once it is approved.

@utkarsh-maheshwari
Copy link
Contributor Author

Also not focusing on components plots for now. It is getting complex because there are a lot of cases to consider in this design. For example, when y_hat and y_forecast both are provided, we'll be plotting only y_forecast uncertainty (that is uncertainty in the forecasted data) and not y_hat (Uncertainty in the observed data).
However, if y_forecast is not provided, we'll be plotting uncertainty in observed data.

So, making the plotters to pass them to the backend function is getting complex.

@codecov
Copy link

codecov bot commented Aug 7, 2021

Codecov Report

Merging #1747 (2ceab7a) into main (928f8f4) will increase coverage by 0.05%.
The diff coverage is 94.59%.

❗ Current head 2ceab7a differs from pull request most recent head c52825b. Consider uploading reports for the commit c52825b to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1747      +/-   ##
==========================================
+ Coverage   91.18%   91.24%   +0.05%     
==========================================
  Files         115      117       +2     
  Lines       12248    12433     +185     
==========================================
+ Hits        11168    11344     +176     
- Misses       1080     1089       +9     
Impacted Files Coverage Δ
arviz/plots/tsplot.py 91.66% <91.66%> (ø)
arviz/plots/__init__.py 100.00% <100.00%> (ø)
arviz/plots/backends/matplotlib/tsplot.py 100.00% <100.00%> (ø)
arviz/data/io_dict.py 93.38% <0.00%> (+0.82%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 928f8f4...c52825b. Read the comment docs.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some nitpicky comments, mostly about numpydoc standard: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard. They should not be too lengthy to take care of

@canyon289 canyon289 added the GSOC label Aug 11, 2021
Copy link
Contributor

@ahartikainen ahartikainen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Bokeh can be implemented later.

@ahartikainen ahartikainen merged commit bb2af3b into arviz-devs:main Aug 27, 2021
@OriolAbril
Copy link
Member

Also needs to be added to docs, now it has a docstring but won't appear on the website

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants