add extract_dataset function #1725

OriolAbril · 2021-06-12T19:38:53Z (edited by aloctavodia)

Description

Fixes #1469. I think this will cover most cases in the issue, plus several others common enough
to be relevant, such as random subsetting for plotting.

I probably won't be able to work on this and finish it in the immediate future. If we want this to
be included in the next release, someone else should take over, add tests and add some examples to the
docstring. If you are interested, comment and go for it: feel free to open a PR to this branch, create a
new PR that builds on top of this work, or work directly on this branch (if you have permissions to do so).

Checklist

Does the PR follow official PR format?
Is the new feature properly documented with an example?
Does the PR include new or updated tests to cover the new feature (using pytest fixture pattern)?
Is the code style correct (follows pylint and black guidelines)?
Is the new feature listed in the New features section of the changelog?

ahartikainen · 2021-06-14T09:09:00Z

Should this function be able to combine data arrays from different groups if needed?

OriolAbril · 2021-06-14T12:02:38Z

That could be cool too. How would you indicate that? Instead of the group and var_names arguments, have a group_vars one that is a dict with group names as keys and lists of variable names as values?

ahartikainen · 2021-06-14T12:30:56Z

That sounds like a reasonable option to go with.
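A minimal sketch of the proposed group_vars argument, assuming a plain dict of group name to {variable name: array} as a stand-in for InferenceData. The function name combine_group_vars is hypothetical and not part of ArviZ. It also surfaces one design question the proposal raises: what to do when two groups contain a variable with the same name (this sketch simply raises).

```python
import numpy as np


def combine_group_vars(idata, group_vars):
    """Hypothetical sketch of the proposed ``group_vars`` argument.

    idata is assumed to map group name -> {variable name -> array};
    group_vars maps group names to the list of variables to take from each.
    """
    combined = {}
    for group, var_names in group_vars.items():
        group_data = idata[group]  # KeyError if the group does not exist
        for name in var_names:
            if name in combined:
                # The same name in two groups would otherwise silently overwrite
                raise ValueError(f"Variable {name!r} requested from more than one group")
            combined[name] = group_data[name]
    return combined


# Example: take "mu" from the posterior and "y" from the log likelihood
idata = {
    "posterior": {"mu": np.zeros((2, 5)), "tau": np.ones((2, 5))},
    "log_likelihood": {"y": np.full((2, 5), -1.0)},
}
combined = combine_group_vars(idata, {"posterior": ["mu"], "log_likelihood": ["y"]})
print(sorted(combined))  # ['mu', 'y']
```

In ArviZ the groups are xarray Datasets, so a real implementation would presumably merge Datasets rather than dicts; that part is not shown here.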

ahartikainen · 2022-01-16T07:14:48Z

arviz/utils.py

+    if rng is not False:
+        if rng is True:
+            rng = np.random.default_rng()
+        elif isinstance(rng, int):


This should use isinstance(rng, numbers.Integral).

Modified this to a try/except: it turns out default_rng also takes a sequence of integers, and someone may be using a subclassed generator. I catch two exception types to hopefully provide more informative errors.
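A minimal sketch of the try/except approach described here, using a hypothetical helper name (normalize_rng); it is not the code that was merged. It relies on np.random.default_rng accepting ints (including numpy integer types, so no numbers.Integral check is needed), sequences of ints, SeedSequence, BitGenerator instances, and existing Generator objects, which are returned unchanged (this also covers subclasses). TypeError and ValueError from invalid seeds are turned into ValueError with a clearer message.

```python
import numpy as np


def normalize_rng(rng=True):
    """Hypothetical helper: map the ``rng`` argument to a Generator, or False.

    False -> no shuffling; True -> fresh unseeded Generator; anything else is
    handed to np.random.default_rng (ints, sequences of ints, SeedSequence,
    BitGenerator, or an existing Generator, which is returned unchanged).
    """
    if rng is False:
        return False
    if rng is True:
        return np.random.default_rng()
    try:
        rng = np.random.default_rng(rng)
    except TypeError as err:  # e.g. a float or a string passed as seed
        raise ValueError(f"Could not interpret rng={rng!r} as a seed") from err
    except ValueError as err:  # e.g. a negative integer seed
        raise ValueError(f"Invalid seed for rng: {rng!r}") from err
    return rng


# The same integer seed gives the same permutation as default_rng directly
print((normalize_rng(42).permutation(5) == np.random.default_rng(42).permutation(5)).all())  # True
```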

ahartikainen · 2022-01-16T07:16:30Z

arviz/utils.py

+            rng = np.random.default_rng(rng)
+        if not isinstance(rng, np.random.Generator):
+            raise ValueError("Could not interpret rng as np.random.Generator")
+        random_subset = rng.permutation(np.arange(len(data["__sample__"])))


Should this still be sorted? (Imagine each chain is stuck: this problem can still be seen even if rng is used.)

(See my comment about the stack routine: if that is true, then this comment is not relevant.)

The results are only sorted if:

- all samples are returned, stacked or not. I assumed that in this case users are keeping all the samples in order to use all of them, not to cut them right afterwards. We can change the default for the stacked case, though.
- rng=False is passed explicitly (not the default).

In all other cases, the result is in this random_subset order. For example:

In [2]: az.extract_dataset(idata, combined=True, var_names="home", num_samples=4)
Out[2]:
<xarray.Dataset>
Dimensions:  (sample: 4)
Coordinates:
  * sample   (sample) MultiIndex
  - chain    (sample) int64 3 2 1 0
  - draw     (sample) int64 36 390 375 496
Data variables:
    home     (sample) float64 0.1911 0.1604 0.1791 0.1109
Attributes:
    created_at:                 2019-07-12T20:31:53.545143
    inference_library:          pymc3
    inference_library_version:  3.7
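The shuffle-then-slice logic in the diff hunks (rng.permutation over the stacked samples, then isel with slice(None, num_samples)) can be mimicked with plain numpy to see why the order is random unless rng=False. In this sketch, subset_samples is a hypothetical name, an array of shape (chain, draw) stands in for the xarray object, and a chain-major flattening is assumed to match stacking (chain, draw) into a single sample dimension.

```python
import numpy as np


def subset_samples(draws, num_samples=None, rng=True):
    """Hypothetical numpy version of the shuffle-then-slice logic in the diff.

    draws has shape (chain, draw). Returns the selected values together with
    the chain and draw index of each one.
    """
    n_chain, n_draw = draws.shape
    # Flatten (chain, draw) into one sample axis, chain-major like a stacked index
    chain_idx, draw_idx = np.meshgrid(np.arange(n_chain), np.arange(n_draw), indexing="ij")
    chain_idx, draw_idx = chain_idx.ravel(), draw_idx.ravel()
    values = draws.ravel()
    order = np.arange(values.size)  # sorted order, kept when rng is False
    if rng is not False:
        rng = np.random.default_rng(None if rng is True else rng)
        order = rng.permutation(values.size)  # shuffle all stacked samples
    if num_samples is not None:
        order = order[:num_samples]  # keep the first num_samples positions
    return values[order], chain_idx[order], draw_idx[order]


draws = np.arange(12.0).reshape(3, 4)  # 3 chains x 4 draws, value = 4 * chain + draw
print(subset_samples(draws, num_samples=4, rng=False)[1])  # [0 0 0 0]: all from chain 0
vals, chains, draw_ids = subset_samples(draws, num_samples=4, rng=1)
print(chains, draw_ids)  # random chain/draw pairs, reproducible for a fixed seed
```

With rng=False the first num_samples samples all come from the first chain; a random permutation spreads them over chains, but only on average, which is what the question about taking n/nchain samples from each chain is getting at.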

ahartikainen · 2022-01-16T07:18:05Z

arviz/utils.py

+        random_subset = rng.permutation(np.arange(len(data["__sample__"])))
+        data = data.isel(__sample__=random_subset)
+    if num_samples is not None:
+        data = data.isel(__sample__=slice(None, num_samples))


Should we do something a bit more complex here?

Instead of returning the first n samples, we could return n/nchain samples from each chain?

Oh wait, due to our stack routine, is it already doing that?
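The alternative suggested here, taking n/nchain samples from each chain instead of the first n, could look like the following minimal numpy sketch. per_chain_subset is hypothetical and is not what was merged; it picks draws without replacement within each chain and, as a simplifying assumption, drops any remainder of num_samples / n_chain. Unlike a single global permutation, it guarantees equal counts per chain.

```python
import numpy as np


def per_chain_subset(draws, num_samples, rng=None):
    """Hypothetical alternative: take num_samples // n_chain draws from each chain."""
    rng = np.random.default_rng(rng)
    n_chain, n_draw = draws.shape
    per_chain = num_samples // n_chain  # any remainder is dropped in this sketch
    if per_chain > n_draw:
        raise ValueError("num_samples is larger than the total number of draws")
    chains, draw_ids = [], []
    for chain in range(n_chain):
        # Pick draw indices without replacement within this chain
        picked = rng.choice(n_draw, size=per_chain, replace=False)
        chains.append(np.full(per_chain, chain))
        draw_ids.append(picked)
    chain_idx = np.concatenate(chains)
    draw_idx = np.concatenate(draw_ids)
    return draws[chain_idx, draw_idx], chain_idx, draw_idx


draws = np.arange(12.0).reshape(3, 4)  # 3 chains x 4 draws
vals, chains, draw_ids = per_chain_subset(draws, num_samples=6, rng=0)
print(np.bincount(chains))  # [2 2 2]
```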

OriolAbril · 2022-01-17T16:34:28Z

Will review and update this week.

OriolAbril · 2022-01-17T21:13:55Z

Should be ready to merge now, but still happy to have reviews!

aloctavodia

LGTM! Remember to update the changelog


OriolAbril mentioned this pull request Oct 22, 2021

Better API for obtaining posterior point estimates & more #1899 (open)

ahartikainen reviewed Jan 16, 2022


OriolAbril added 2 commits January 17, 2022 20:06

add extract_dataset function

327e69d

add tests and examples in docs

a65b8df

OriolAbril force-pushed the stack_util branch from a2c885d to a65b8df January 17, 2022 19:18

OriolAbril added 3 commits January 17, 2022 21:22

lint

588d452

fix examples

096211f

add references to extract_dataset in docs

9332a04

aloctavodia approved these changes Jan 18, 2022


update changelog

e80bed2

OriolAbril merged commit a67d81c into main Jan 19, 2022

OriolAbril deleted the stack_util branch January 19, 2022 03:25
