[WIP] Prototype for timeseries plots #1747

utkarsh-maheshwari · 2021-07-26T19:07:32Z

Description

This is another plot I worked on during Google Summer of Code 2021. It covers a plot of the overall time series and plots of its components.
High-level design:

y: from observed_data
x: from constant_data or observed_data dims
x_holdout: from constant_data or posterior_predictive dims
y_holdout: from observed_data (or constant data)
y_ppc: from posterior_predictive, default name same as y
forecast: from posterior_predictive, default name same as y_holdout
components: from posterior, default None

Please see the docstrings in the changed files for a more detailed explanation of the user API.

So far, the function plots the overall time series with the matplotlib backend only. The added functionality is covered by tests and illustrated with examples in the docstring.

Work yet to be done: the Bokeh plot function still needs to be added, maybe in a different PR. Plotting of components is also planned for a future PR.

Plots:

Checklist

Follows official PR format
Includes a sample plot to visually illustrate the changes (only for plot-related functions)
New features are properly documented (with an example if appropriate)?
Includes new or updated tests to cover the new feature
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

utkarsh-maheshwari · 2021-07-26T19:10:07Z

Also, I guess I am not authorized to add labels, so this PR and my other PR have no labels.

arviz/plots/tsplot.py

OriolAbril · 2021-07-28T20:58:55Z

holdout : int, Optional. No of samples from end. (Could be modified to take particular coordinate val, then find the value in x )

I would have the holdout be separate arrays, both x and y (and optionally observations, which I hardly ever expect to be known in this setting). The reason is that, while posterior predictive samples for observed data and out-of-sample posterior predictive samples (often called predictions or forecasts) can be stored in the same array, I expect storing them in different arrays to be more common. Moreover, going from a single array to multiple arrays is a "simple" slice operation, whereas going from multiple arrays to a single one requires concatenation/stacking.
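
The asymmetry between the two directions can be illustrated with a minimal NumPy sketch. The array names, shapes and the holdout length below are made up for illustration and are not taken from the PR:

```python
import numpy as np

# Hypothetical posterior predictive draws: 100 draws over 12 time points,
# where the last 3 time points are the held-out (forecast) period.
rng = np.random.default_rng(0)
combined = rng.normal(size=(100, 12))
n_holdout = 3

# Single array -> two arrays: a plain slice along the time axis (views, no copy).
in_sample = combined[:, :-n_holdout]
forecast = combined[:, -n_holdout:]

# Two arrays -> single array: needs concatenation, which allocates a new array.
rejoined = np.concatenate([in_sample, forecast], axis=1)

print(in_sample.shape, forecast.shape, rejoined.shape)
# (100, 9) (100, 3) (100, 12)
```

Accepting separate arrays therefore leaves the cheap operation (slicing) to users who happen to store everything in one array.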

utkarsh-maheshwari · 2021-07-29T19:30:00Z

@OriolAbril
You are right. I thought about it, but implemented this version first because it was easier 😛.
If we use holdout like this, the design would be as follows.

Suppose the user splits the data into train and test sets, generating x_train, y_train, x_test and y_test:

y: y_train, from observed_data,
x: x_train, from constant_data or observed_data dims.
x_holdout: x_test, from constant_data or posterior_predictive dims
y_holdout: y_test, from observed_data (or constant data)
y_ppc: y_train_pred (Bayesian models only), from posterior_predictive, default name same as y
forecast: y_test_pred, from posterior_predictive, default name same as y_holdout

We'll plot whatever information the user provides.
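
A rough sketch of the "plot whatever is provided" rule, using plain Python sequences. The helper `collect_series`, its return format, the integer-position fallback for `x` and the error raised when `x_holdout` is missing are all assumptions made for illustration; none of them is part of this PR:

```python
from typing import Dict, Optional, Sequence, Tuple

Series = Tuple[Sequence[float], Sequence[float]]


def collect_series(
    y: Sequence[float],
    x: Optional[Sequence[float]] = None,
    x_holdout: Optional[Sequence[float]] = None,
    y_holdout: Optional[Sequence[float]] = None,
    y_ppc: Optional[Sequence[float]] = None,
    forecast: Optional[Sequence[float]] = None,
) -> Dict[str, Series]:
    """Return {label: (x values, y values)} for each series that was provided."""
    if x is None:
        # Assumption: fall back to integer positions when no x is given.
        x = list(range(len(y)))
    series: Dict[str, Series] = {"y": (x, y)}
    if y_ppc is not None:
        series["y_ppc"] = (x, y_ppc)

    # Held-out series are drawn against x_holdout instead of x.
    holdout = {"y_holdout": y_holdout, "forecast": forecast}
    provided = {name: values for name, values in holdout.items() if values is not None}
    if provided and x_holdout is None:
        # Assumption: this sketch requires explicit x values for held-out data.
        raise ValueError("x_holdout is required when y_holdout or forecast is given")
    for name, values in provided.items():
        series[name] = (x_holdout, values)
    return series


series = collect_series(y=[1.0, 2.0, 3.0], x_holdout=[3, 4], forecast=[3.9, 5.1])
print(sorted(series))  # ['forecast', 'y']
```

Only the series whose inputs were passed end up in the result, so the plotting code can loop over the dictionary without special-casing missing arguments.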

This looks better to me. @canyon289 @ahartikainen @OriolAbril, could you confirm the design?

arviz/plots/tsplot.py

utkarsh-maheshwari · 2021-08-02T19:49:04Z

Some points to be noted:

y: y_train, from observed_data,

x: x_train, from constant_data or observed_data dims.

x_holdout: x_test, from constant_data or posterior_predictive dims

y_holdout: y_test, from observed_data (or constant data)

y_ppc: y_train_pred (Bayesian models only), from posterior_predictive, default name same as y

forecast: y_test_pred, from posterior_predictive, default name same as y_holdout

I have started implementing it. One possible drawback of this design is that we would generate many plotter arrays.
The estimated number is
1+1+1+1+2+2 (+4 plotters for components) = 12.

Is this a concern with respect to complexity and maintainability?
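
The estimate can be tallied as below. Which argument contributes which term is an assumed reading of "1+1+1+1+2+2" that follows the order of the list above; the comment does not state the mapping explicitly:

```python
# Assumed reading: one plotter array each for y, x, x_holdout and y_holdout,
# and two each for y_ppc and forecast.
plotters_per_argument = {
    "y": 1,
    "x": 1,
    "x_holdout": 1,
    "y_holdout": 1,
    "y_ppc": 2,
    "forecast": 2,
}
component_plotters = 4  # the "+4 plotters for components" term

base_total = sum(plotters_per_argument.values())
total = base_total + component_plotters
print(base_total, total)  # 8 12
```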

utkarsh-maheshwari · 2021-08-05T20:16:33Z

It is getting a bit complex, so I am assuming the data is 1D for now. I will extend it to multidimensional data once the design is approved.

utkarsh-maheshwari · 2021-08-05T20:55:20Z

I am also not focusing on component plots for now. It is getting complex because this design has many cases to consider. For example, when both y_hat and y_forecast are provided, we'll plot only the y_forecast uncertainty (that is, the uncertainty in the forecasted data) and not the y_hat uncertainty (the uncertainty in the observed data).
However, if y_forecast is not provided, we'll plot the uncertainty in the observed data.

As a result, building the plotters to pass to the backend function is getting complex.
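
The selection rule described above can be written as a small function. This is only a sketch: the helper name `select_uncertainty` and its return format are hypothetical and do not come from the PR:

```python
from typing import Optional, Sequence, Tuple


def select_uncertainty(
    y_hat: Optional[Sequence[float]] = None,
    y_forecast: Optional[Sequence[float]] = None,
) -> Optional[Tuple[str, Sequence[float]]]:
    """Pick which predictive samples to draw as the uncertainty."""
    if y_forecast is not None:
        # Forecast uncertainty takes precedence, even if y_hat is also given.
        return "y_forecast", y_forecast
    if y_hat is not None:
        # Without a forecast, fall back to uncertainty in the observed data.
        return "y_hat", y_hat
    return None  # nothing to draw


print(select_uncertainty(y_hat=[1.0, 2.0], y_forecast=[3.0])[0])  # y_forecast
print(select_uncertainty(y_hat=[1.0, 2.0])[0])  # y_hat
```

Keeping the precedence in one place like this is one way to stop the case analysis from spreading through the plotter-building code.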

codecov · 2021-08-07T20:53:44Z

Codecov Report

Merging #1747 (2ceab7a) into main (928f8f4) will increase coverage by 0.05%.
The diff coverage is 94.59%.

❗ Current head 2ceab7a differs from pull request most recent head c52825b. Consider uploading reports for the commit c52825b to get more accurate results

@@            Coverage Diff             @@
##             main    #1747      +/-   ##
==========================================
+ Coverage   91.18%   91.24%   +0.05%     
==========================================
  Files         115      117       +2     
  Lines       12248    12433     +185     
==========================================
+ Hits        11168    11344     +176     
- Misses       1080     1089       +9

Impacted Files	Coverage Δ
arviz/plots/tsplot.py	`91.66% <91.66%> (ø)`
arviz/plots/__init__.py	`100.00% <100.00%> (ø)`
arviz/plots/backends/matplotlib/tsplot.py	`100.00% <100.00%> (ø)`
arviz/data/io_dict.py	`93.38% <0.00%> (+0.82%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 928f8f4...c52825b.

arviz/plots/tsplot.py

OriolAbril

I added some nitpicky comments, mostly about the numpydoc standard: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard. They should not take long to address.

arviz/plots/tsplot.py

ahartikainen

Looks good. Bokeh can be implemented later.

OriolAbril · 2021-08-27T17:34:02Z

This also needs to be added to the docs; it now has a docstring but won't appear on the website.

OriolAbril reviewed Jul 28, 2021

canyon289 reviewed Jul 30, 2021

utkarsh-maheshwari added 2 commits August 8, 2021 02:00

Prototype for timeseries plots

dd656b6

Major commit

11f442e

utkarsh-maheshwari force-pushed the tsplot branch from 7fb4057 to 11f442e Compare August 7, 2021 20:38

Added more functionality and tests

76f6d23

canyon289 reviewed Aug 10, 2021

utkarsh-maheshwari added 2 commits August 10, 2021 23:17

More tests and examples

c1a69ff

More tests and examples

5dc3bdf

OriolAbril reviewed Aug 11, 2021

canyon289 added the GSOC label Aug 11, 2021