-
-
Notifications
You must be signed in to change notification settings - Fork 426
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
add extract_dataset function #1725
Conversation
This function should be able to combine data-arrays from different groups if needed? |
that could be cool too, how would you indicate that? instead of group and var_names arguments have a group_vars one that is a dict of group names as keys lists of variables names as values? |
That sounds reasonable option to go with. |
arviz/utils.py
Outdated
if rng is not False: | ||
if rng is True: | ||
rng = np.random.default_rng() | ||
elif isinstance(rng, int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use isinstance(rng, numbers.Integral)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified this to a try except, turns out default_rng also takes sequence of integers plus maybe someone uses a subclassed generator. Catched two errors to hopefully provide more informative errors.
arviz/utils.py
Outdated
rng = np.random.default_rng(rng) | ||
if not isinstance(rng, np.random.Generator): | ||
raise ValueError("Could not interpret rng as np.random.Generator") | ||
random_subset = rng.permutation(np.arange(len(data["__sample__"]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still be sorted? (Imagine each chain is stuck --> this problem can still be seen even if rng is used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see comment about stack routine --> if true, then this comment is not relevant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The results are only sorted if
- all samples are returned, stacked or not. I assumed in this case, users are keeping all the samples to use all of them, not to cut them right after that. But we can change the default for the stacked case though.
rng=False
explicitly (not the default)
In all other cases, the result is in this random_subset
order. For example:
In [2]: az.extract_dataset(idata, combined=True, var_names="home", num_samples=4)
Out[2]:
<xarray.Dataset>
Dimensions: (sample: 4)
Coordinates:
* sample (sample) MultiIndex
- chain (sample) int64 3 2 1 0
- draw (sample) int64 36 390 375 496
Data variables:
home (sample) float64 0.1911 0.1604 0.1791 0.1109
Attributes:
created_at: 2019-07-12T20:31:53.545143
inference_library: pymc3
inference_library_version: 3.7
arviz/utils.py
Outdated
random_subset = rng.permutation(np.arange(len(data["__sample__"]))) | ||
data = data.isel(__sample__=random_subset) | ||
if num_samples is not None: | ||
data = data.isel(__sample__=slice(None, num_samples)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a bit more complex thing here?
Instead of returning n first samples, we could return n/nchain samples from each chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, due to our stack routine it is already doing it?
will review and update this week |
a2c885d
to
a65b8df
Compare
Should be ready to merge now, but still happy to have reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Remember to update the changelog
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Description
Fixes #1469. I think this will cover most casesin the issue and several
more common enough to be relevant like random subsetting for plotting for example.
I probably won't be able to work on this and finish it in the immediate future, if we want this to
be included in the next release someone else should take over, add tests and some examples in the
docstring. If interested comment and go for it, feel free to open a pr to this branch, create a
new PR that builds on top of this work, or work directly on this (if you have permissions to do so).
Checklist
section of the changelog?