Python tools for random sampling under a "don't pick the closest n elements" constraint.
Also contains a batch generator for the same, which can sample with replacement and with repeats if necessary.
Install using pip:
pip install sampling_utils
from sampling_utils import sample_from_list
sample_from_list([1,2,3,4,5,6,7,8], dont_pick_closest=2)
You are guaranteed to get samples that are more than dont_pick_closest apart (in value, not in indices).
Here, the absolute difference |sample - any_other_sample| is always greater than 2.
For example, if 2 is picked, no other item in the range [2 - dont_pick_closest, 2 + dont_pick_closest] will be picked.
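The rule can be checked independently of the library. The following is a minimal sketch, assuming the rule means that every pair of samples differs by more than dont_pick_closest in value. The helper name `obeys_rule` is hypothetical and not part of sampling_utils.

```python
from itertools import combinations


def obeys_rule(samples, dont_pick_closest):
    """Return True if every pair of samples differs by more than dont_pick_closest."""
    # Hypothetical helper for illustration only; not part of sampling_utils.
    return all(abs(a - b) > dont_pick_closest for a, b in combinations(samples, 2))


print(obeys_rule([5, 10, 2, 14], dont_pick_closest=2))  # True: the smallest gap is 3
print(obeys_rule([12, 10, 3], dont_pick_closest=2))     # False: 12 and 10 are only 2 apart
```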
Another example looped 5 times:
for _ in range(5):
    print(sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))
# Output
# [5, 10, 2, 14]
# [9, 6, 14, 1]
# [3, 8, 12]
# [10, 3, 6, 14]
# [2, 5, 8, 12]
If 12 is sampled, sampling 10 or 14 is not allowed since dont_pick_closest is 2.
In other words, if n is sampled, then sampling anything from [n - dont_pick_closest, ..., n - 1, n, n + 1, ..., n + dont_pick_closest] is not allowed (if present in the list).
This is referred to as the dont_pick_closest rule hereafter.
You can also specify how many samples you want from the list using the num_samples parameter.
By default, you get as many samples as possible (without replacement).
for _ in range(5):
    print(sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2, num_samples=2))
# Output
# [8, 2]
# [6, 3]
# [12, 1]
# [4, 10]
# [9, 1]
If you try to sample more than what's possible, you will get an error saying that it's not possible.
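To make the behaviour concrete, here is a rough sketch of one way such a sampler could work. It is not the sampling_utils implementation: the function name `sample_with_rule`, the `max_tries` parameter, and the choice of `ValueError` are all assumptions made for illustration. It shuffles the list, keeps each element that is far enough from everything already picked, and retries a bounded number of times before giving up.

```python
import random


def sample_with_rule(items, dont_pick_closest, num_samples=None, max_tries=1000):
    """Hypothetical sketch of rule-obeying sampling; not the sampling_utils code."""
    for _ in range(max_tries):
        pool = list(items)
        random.shuffle(pool)
        picked = []
        for candidate in pool:
            if num_samples is not None and len(picked) == num_samples:
                break
            # Keep the candidate only if it is far enough from everything picked so far.
            if all(abs(candidate - p) > dont_pick_closest for p in picked):
                picked.append(candidate)
        # Without num_samples, return whatever this greedy pass collected.
        if num_samples is None or len(picked) == num_samples:
            return picked
    # Assumption: the real library may raise a different error type or message.
    raise ValueError(f"could not draw {num_samples} samples obeying the rule")


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14]
print(sample_with_rule(data, dont_pick_closest=2, num_samples=2))
```

The retry limit is a heuristic: a request that is impossible (for example, 5 samples from this list) always ends in the error, but a request that is merely unlikely could in principle fail too.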
You may just want to know how many samples you can draw from a given list while obeying the dont_pick_closest rule:
from sampling_utils import get_min_samples, get_max_samples
print(get_min_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))
print(get_max_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))
# Output
# Min 3
# Max 4
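These two numbers can be reproduced by brute force on a list this small. The sketch below assumes that the maximum is the size of the largest set obeying the rule, and that the minimum is the size of the smallest set to which no further element can be added. That reading matches the output above but is not confirmed by the library, and `count_bounds` is a hypothetical name.

```python
from itertools import combinations


def count_bounds(items, dont_pick_closest):
    """Return (min, max) sizes of rule-obeying sets that cannot be extended further."""
    # Hypothetical brute-force helper; exponential in len(items), so small lists only.
    items = sorted(set(items))
    if not items:
        return 0, 0

    def valid(subset):
        return all(abs(a - b) > dont_pick_closest for a, b in combinations(subset, 2))

    def maximal(subset):
        # Every unpicked element must be blocked by at least one picked element.
        return all(
            any(abs(x - p) <= dont_pick_closest for p in subset)
            for x in items
            if x not in subset
        )

    sizes = [
        size
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
        if valid(subset) and maximal(subset)
    ]
    return min(sizes), max(sizes)


print(count_bounds([1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14], dont_pick_closest=2))  # (3, 4)
```

For example, {3, 8, 12} blocks every other element of the list, so a draw can stop at 3 samples, while {1, 4, 8, 12} shows that 4 are achievable.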
If you want to sample successively without replacement, i.e. draw as many samples from the list as possible without repeating,
you can use batch_rand_generator as shown below.
This is particularly useful for generating batches of data
until no more batches can be generated (equivalent to one epoch).
from sampling_utils import batch_rand_generator
from sampling_utils import get_batch_generator_elements
batch_size = 2
brg = batch_rand_generator([1,2,3,4,5,6,8,9,10,12,14], batch_size=batch_size, dont_pick_closest=2)
print(get_batch_generator_elements(brg, batch_size=batch_size))
# Output
# [[1, 4], [8, 5], [14, 3], [2, 6]]
Notice that the elements
- within each batch obey the dont_pick_closest rule (e.g. 1 and 4 from batch 1)
- from different batches need not obey the rule (e.g. 4 and 5 from batch 1 and 2 respectively).
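The two properties above can be illustrated with a small generator. This is only a sketch under assumptions, not how batch_rand_generator is actually implemented: the name `batches_with_rule` is hypothetical, and the stopping condition (end the epoch as soon as one shuffled greedy pass cannot fill a batch) is a simplification that may leave usable elements behind.

```python
import random


def batches_with_rule(items, batch_size, dont_pick_closest):
    """Hypothetical epoch-style batch generator; not the sampling_utils code."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(items)
    while True:
        random.shuffle(remaining)
        batch = []
        for candidate in remaining:
            # The rule is enforced only against elements of the current batch.
            if all(abs(candidate - p) > dont_pick_closest for p in batch):
                batch.append(candidate)
            if len(batch) == batch_size:
                break
        if len(batch) < batch_size:
            return  # this pass could not fill a batch: treat it as the end of the epoch
        for picked in batch:
            remaining.remove(picked)  # no element is reused within the epoch
        yield batch


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14]
for batch in batches_with_rule(data, batch_size=2, dont_pick_closest=2):
    print(batch)
```

Because each batch is checked only against itself, values such as 4 and 5 can appear in neighbouring batches, while removing picked elements from `remaining` keeps the epoch free of repeats.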
Pull requests are very welcome.
- Fork the repo
- Create a new branch with the feature name as the branch name
- Check that things work in a Jupyter notebook
- Raise a pull request
Please see the attached licence.