Improve documentation for sampling without replacement #1210

OliverEvans96 · 2022-01-07T20:47:46Z

Hello,

I see that sampling without replacement was requested in #596 and implemented in #1013, however the documentation is not very helpful to someone trying to figure out how to use this feature.

Searching for "replacement" in the docs yields no results
The word "replacement" does not appear in any of the code, test cases, or comments modified in Re: weighted sampling without replacement #1013
The Weighted Sequences section in the guide chapter of the rand book is out of date:

Currently the Rand lib only implements sampling with replacement, i.e. repeated sampling assumes the same distribution (that any sampled marble has been replaced). An alternative distribution implementing sampling without replacement has been requested.

The Sequences section in the Updating to 0.8 chapter of the rand book only says:

Weighted sampling without replacement is now supported, see rand::seq::index::sample_weighted and SliceRandom::choose_multiple_weighted.

SliceRandom::choose_multiple_weighted has no doc comments or examples
rand::seq::index::sample_weighted does contain a helpful explanation, although it does not contain the word "replacement":

Randomly sample exactly amount distinct indices from 0..length, and return them in an arbitrary order (there is no guarantee of shuffling or ordering). The weights are to be provided by the input function weights, which will be called once for each index.

So it took me 6 attempts to find an explanation, which is still only sampling indices, not sampling from a distribution.

It would be really awesome if a search for "replacement" in both the docs and the book returned a clear explanation with a usage example.

Thanks!
Oliver

dhardy · 2022-01-10T08:57:58Z

Hi there, and thanks for pointing out the poor state of the docs.

~~First, the docs at rust-random.github.io are either outdated or broken. I guess we should fix them. On docs.rs: rand::seq::SliceRandom::choose_multiple_weighted has documentation.~~ (See below.)

So if I understand correctly this issue is about:

~~fix documentation deployment to rust-random.github.io~~
ensure "replacement" is mentioned in the docs (though should you not search for "weighted" instead)?
expand the book to cover weighted sampling without replacement (we already have Weighted sequences)
fix SliceRandom::choose_multiple_weighted link in updating guide

dhardy · 2022-01-10T10:01:22Z

How did you even find SliceRandom::choose_multiple_weighted? Because navigating the docs normally or even using the search I get to SliceRandom::choose_multiple_weighted (only difference: #method vs #tymethod) which includes the documentation.

OliverEvans96 · 2022-01-10T16:53:58Z

The #method variant is linked from the Sequences section in Updating to 0.8.

OliverEvans96 · 2022-01-10T17:06:47Z

I hope you'll excuse my ignorance, but, whether a set of samples are drawn with constant weights and whether they are drawn with replacement seem orthogonal.

I'm very curious to hear why I'm wrong. But I certainly would not have searched for "weighted" when trying to perform a uniform sample without replacement.

Or is the idea that I should basically implement choose_multiple_without_replacement myself by calling choose_weighted in a loop, setting the weight of any element that has previously been chosen to 0?

dhardy · 2022-01-12T10:54:29Z

whether a set of samples are drawn with constant weights and whether they are drawn with replacement seem orthogonal.

If drawing without replacement, then the probability of each value being sampled is affected by each value drawn. There are two ways of simulating this: (1) maintaining a list of "taken" values and using rejection sampling (repeating the process if the sample is already "taken"), or (2) maintaining a representation of the probability of each remaining value being sampled. Method (2) is essentially weighted (or biased) sampling.

Thus, in abstract, the concepts might be considered orthogonal, but when simulating such sampling, the concepts are related.

Or is the idea that I should basically implement choose_multiple_without_replacement myself by calling choose_weighted in a loop, setting the weight of any element that has previously been chosen to 0?

This is method (2), and works fine where the pool of available values is small, but not so well if e.g. there are a million possible values and you only wish to sample 1000 of them.

But let me ask: is there some particular thing you're still struggling to understand how to implement, or are you merely suggesting parts of the documentation that might be improved? Because I have a suspicion that the documentation on choose_multiple and choose_multiple_weighted does not meet your requirements?

If you wish to modify the distribution you are sampling from with each sample taken, that is not supported by the Distribution trait (sample takes &self), so, yes, you'd have to implement this yourself using one of the above methods (you might wish to refer to seq::index code).

vks · 2022-01-13T13:02:31Z

@OliverEvans96 rand::seq::index::sample probably does what you want, unless you want to get the samples one-by-one. At the moment, we don't offer such an interface, because it is incompatible with our Distribution trait, which only supports independent samples.

dhardy mentioned this issue Jan 27, 2022

Add src/guide-process rust-random/book#48

Merged

vks closed this as completed in rust-random/book#48 Feb 14, 2022

fraterenz mentioned this issue Aug 10, 2022

Sampling without replacement? fraterenz/ecdna-evo#69

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Improve documentation for sampling without replacement #1210

Improve documentation for sampling without replacement #1210

OliverEvans96 commented Jan 7, 2022 •

edited

Loading

dhardy commented Jan 10, 2022 •

edited by vks

Loading

Uh oh!

dhardy commented Jan 10, 2022

Uh oh!

OliverEvans96 commented Jan 10, 2022

Uh oh!

OliverEvans96 commented Jan 10, 2022

Uh oh!

dhardy commented Jan 12, 2022 •

edited

Loading

Uh oh!

vks commented Jan 13, 2022

Uh oh!

Uh oh!

Improve documentation for sampling without replacement #1210

Improve documentation for sampling without replacement #1210

Comments

OliverEvans96 commented Jan 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

dhardy commented Jan 10, 2022 • edited by vks Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dhardy commented Jan 10, 2022

Uh oh!

OliverEvans96 commented Jan 10, 2022

Uh oh!

OliverEvans96 commented Jan 10, 2022

Uh oh!

dhardy commented Jan 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vks commented Jan 13, 2022

Uh oh!

OliverEvans96 commented Jan 7, 2022 •

edited

Loading

dhardy commented Jan 10, 2022 •

edited by vks

Loading

dhardy commented Jan 12, 2022 •

edited

Loading