adding time slice to ts.samples() #1700

mufernando · 2021-09-14T17:15:13Z

Description

WIP

Fixes #1692

PR Checklist:

Tests that fully cover new/changed functionality.
Documentation including tutorial content if appropriate.
Changelogs, if there are API changes.

mufernando · 2021-09-14T18:04:22Z

A few questions:

Should both min and max be inclusive?
Does this approach look reasonable, or do we expect too much of a performance hit?
Ideas for more tests?

jeromekelleher · 2021-09-14T18:12:22Z

LGTM, thanks @mufernando!

Approach is great, I don't foresee any perf hit.
I'd vote for making max exclusive following our usual interval semantics so that for any sample with time t, min_time <= t < max_time.
In terms of testing, I think checking explicitly with a few hand coded examples on a small tree sequence would be good. (as in, you explicitly assert for the known set of samples)
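To make the half-open semantics concrete, here is a minimal sketch using plain numpy arrays. The helper name `samples_in_interval` and the sample times are made up for illustration and are not the code in this PR.

```python
import numpy as np

# Hypothetical sample node ids and their times (not taken from the PR).
sample_ids = np.array([0, 1, 2, 3, 4])
sample_times = np.array([0.0, 0.0, 1.0, 2.0, 3.0])


def samples_in_interval(ids, times, min_time, max_time):
    """Return the ids whose time t satisfies min_time <= t < max_time."""
    keep = (times >= min_time) & (times < max_time)
    return ids[keep]


# Adjacent intervals share the boundary value 2.0, but only the
# second one contains it, so no sample is counted twice.
first = samples_in_interval(sample_ids, sample_times, 0.0, 2.0)
second = samples_in_interval(sample_ids, sample_times, 2.0, 4.0)
print(first, second)
```

With an exclusive upper bound, consecutive intervals partition the samples without overlap, which is the usual reason for choosing this convention.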

petrelharp · 2021-09-14T19:02:28Z

> I'd vote for making max exclusive following our usual interval semantics so that for any sample with time t, min_time <= t < max_time.

I agree... except, how do we deal with what is probably the most common use for this, which is to get everyone alive at a given time (e.g., 0.0)? For that, most people would probably try ts.samples(max_time=0.0), which would not work.

One option would be to make the argument time_interval instead of two separate min and max times; if it's just a single number instead of a tuple, we return all the samples whose times are equal to that number. Or add two arguments, ts.samples(time_interval=None, time=None), where time_interval is a left-closed, right-open interval and time is a single time, and you can't specify both.

I agree about the testing, and be sure to include some times like 1/3 where we might run into floating point error.
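Both concerns can be seen in a small numpy sketch. The times below are invented for illustration; one of them is computed as `1 - 2/3` so that it differs from `1/3` by rounding error.

```python
import numpy as np

# Hypothetical sample times; the middle value is computed, not typed in,
# so it is not bit-for-bit equal to 1/3.
times = np.array([0.0, 1 - 2 / 3, 1.0])

# A half-open interval that ends at 0.0 is empty, so an exclusive
# max_time of 0.0 cannot select the samples at time 0.0.
at_zero_by_interval = times[(times >= 0.0) & (times < 0.0)]

# Exact equality misses the computed value.
exact = times[times == 1 / 3]

# A tolerance-based comparison finds it.
close = times[np.isclose(times, 1 / 3)]

print(at_zero_by_interval.size, exact.size, close.size)
```

This is why a single-time query probably needs its own code path rather than being expressed as a degenerate interval, and why its tests should include values that are not exactly representable.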

mufernando · 2021-09-15T19:47:53Z

I agree with @petrelharp that one of the most common use cases would be to look at a single time point (t=0), so I went ahead and added two new parameters, time and time_interval.

I never seem to know exactly how to test on a single known answer, so I did the next best thing (that I could think of) which was to re-implement the behavior independently using a simple for loop instead of numpy.
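A rough sketch of that testing strategy, with made-up helper names and random data standing in for a real tree sequence: a vectorised numpy filter is checked against an independent plain-loop version.

```python
import numpy as np


def samples_numpy(ids, times, min_time, max_time):
    """Vectorised filter: keep ids with min_time <= t < max_time."""
    keep = (times >= min_time) & (times < max_time)
    return ids[keep]


def samples_loop(ids, times, min_time, max_time):
    """Independent reference implementation using a plain for loop."""
    out = []
    for node_id, t in zip(ids, times):
        if min_time <= t < max_time:
            out.append(node_id)
    return np.array(out, dtype=ids.dtype)


# Hypothetical data: 50 samples with integer-valued times in [0, 10).
rng = np.random.default_rng(42)
ids = np.arange(50)
times = rng.integers(0, 10, size=50).astype(float)

for lo, hi in [(0, 1), (2, 5), (9, 10), (3, 3)]:
    assert np.array_equal(
        samples_numpy(ids, times, lo, hi), samples_loop(ids, times, lo, hi)
    )
```

A comparison like this catches vectorisation mistakes, but it does not replace a hand-coded example with a known answer, since both versions could share the same misunderstanding of the interval bounds.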

mufernando · 2021-09-15T19:49:59Z

Ah, I'm also unsure whether these tests should be where they are, within test_highlevel.py:TestNumpySamples.

codecov · 2021-09-15T20:15:40Z

Codecov Report

Merging #1700 (04d2458) into main (fe6121a) will increase coverage by 0.42%.
The diff coverage is 100.00%.

❗ Current head 04d2458 differs from pull request most recent head 1efee21. Consider uploading reports for the commit 1efee21 to get more accurate results

@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   93.36%   93.79%   +0.42%     
==========================================
  Files          27       27              
  Lines       24235    23590     -645     
  Branches     1089     1089              
==========================================
- Hits        22627    22126     -501     
+ Misses       1573     1429     -144     
  Partials       35       35

| Flag | Coverage Δ | |
|---|---|---|
| c-tests | `92.04% <ø> (-0.06%)` | ⬇️ |
| lwt-tests | `93.49% <ø> (+4.23%)` | ⬆️ |
| python-c-tests | `95.47% <100.00%> (+0.96%)` | ⬆️ |
| python-tests | `98.77% <100.00%> (+0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/tskit/trees.py | `97.84% <100.00%> (+0.01%)` | ⬆️ |
| c/tskit/tables.c | `90.09% <0.00%> (-0.10%)` | ⬇️ |
| python/tskit/tables.py | `98.91% <0.00%> (+0.09%)` | ⬆️ |
| python/_tskitmodule.c | `92.36% <0.00%> (+0.88%)` | ⬆️ |
| python/lwt_interface/example_c_module.c | `75.47% <0.00%> (+2.74%)` | ⬆️ |
| python/lwt_interface/tskit_lwt_interface.h | `95.41% <0.00%> (+4.14%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fe6121a...1efee21.

jeromekelleher · 2021-09-16T08:46:21Z

Looks good @mufernando. I'm not sure there's much point in having time and time_interval though - how about we have a single parameter time, and

If time is a numeric value, return all samples whose node time is approximately equal (as determined by numpy.isclose) to the specified value.
Otherwise, if time is a pair of values (min_time, max_time), return all samples whose node time t is in this interval, such that min_time <= t < max_time.

That seems straightforward enough?
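A minimal sketch of that proposal, assuming plain numpy arrays in place of a tree sequence. The function name `select_by_time` and the data are hypothetical, and this is not the implementation merged in this PR.

```python
import numbers

import numpy as np


def select_by_time(ids, times, time):
    """Filter ids by a single time or a (min_time, max_time) pair."""
    if isinstance(time, numbers.Number):
        # Single value: approximate equality, as suggested above.
        keep = np.isclose(times, time)
    else:
        # Pair of values: half-open interval min_time <= t < max_time.
        min_time, max_time = time
        keep = (times >= min_time) & (times < max_time)
    return ids[keep]


# Hypothetical sample ids and node times.
ids = np.array([0, 1, 2, 3])
times = np.array([0.0, 0.0, 1.0, 2.5])

print(select_by_time(ids, times, 0.0))         # samples at time 0
print(select_by_time(ids, times, (1.0, 3.0)))  # samples with 1 <= t < 3
```

One parameter keeps the signature small, at the cost of a type check inside the function.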

mufernando · 2021-09-16T15:00:26Z

> Looks good @mufernando. I'm not sure there's much point in having time and time_interval though - how about we have a single parameter time, and
>
> If time is a numeric value, return all samples whose node time is approximately equal (as determined by numpy.isclose) to the specified value.
>
> Otherwise, if time is a pair of values (min_time, max_time), return all samples whose node time t is in this interval, such that min_time <= t < max_time.
>
> That seems straightforward enough?

Done!

Waiting for a review, and then I can squash and rebase.

petrelharp · 2021-09-17T01:50:14Z

LGTM! I added a few more tests, including one where we build a simple explicit tree sequence.

jeromekelleher

LGTM, with some minor changes suggested to how we handle the type polymorphism.

We should add some simple tests to make sure that numpy arrays are supported as input, i.e., something like:

import numpy as np

x = np.array([1, 2])
ts.samples(time=x[0])  # single value: samples with time == 1
ts.samples(time=x)  # pair: samples with 1 <= time < 2

python/tskit/trees.py

mufernando · 2021-09-17T15:47:40Z

Updated to work with numpy arrays (including 0d and (1,) arrays). Also added some tests with these array types.
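One way such inputs could be normalised is sketched below. The helper `normalise_time` is hypothetical and only illustrates how 0d arrays, (1,) arrays, numpy scalars and pairs might be told apart; the actual handling in trees.py may differ.

```python
import numpy as np


def normalise_time(time):
    """Classify the input as a single time point or a (min, max) pair."""
    arr = np.asarray(time, dtype=float)
    if arr.ndim == 0 or arr.shape == (1,):
        # Python scalars, numpy scalars, 0d arrays and (1,) arrays.
        return "point", float(arr.reshape(-1)[0])
    if arr.shape == (2,):
        # Tuples, lists and arrays of length two.
        return "interval", (float(arr[0]), float(arr[1]))
    raise ValueError("time must be a single value or a (min_time, max_time) pair")


x = np.array([1, 2])
print(normalise_time(x[0]))           # numpy scalar
print(normalise_time(np.array(1.0)))  # 0d array
print(normalise_time(np.array([1])))  # (1,) array
print(normalise_time(x))              # pair
```

Converting with `np.asarray` first means lists, tuples and arrays all go through the same shape checks.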

jeromekelleher

LGTM, thanks @mufernando. One final comment about using parametrize in tests rather than loops to get better visibility on test examples.
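As an illustration of that suggestion, the sketch below uses `pytest.mark.parametrize` so each interval is reported as its own test case. The helper, the data and the test name are invented for this example and do not come from test_highlevel.py.

```python
import numpy as np
import pytest

# Hypothetical sample ids and node times with hand-checked answers.
IDS = np.array([0, 1, 2, 3])
TIMES = np.array([0.0, 0.0, 1.0, 2.5])


def samples_in_interval(min_time, max_time):
    """Return the ids whose time t satisfies min_time <= t < max_time."""
    keep = (TIMES >= min_time) & (TIMES < max_time)
    return IDS[keep]


@pytest.mark.parametrize(
    "interval, expected",
    [
        ((0.0, 1.0), [0, 1]),
        ((1.0, 3.0), [2, 3]),
        ((0.0, 0.0), []),
        ((3.0, 4.0), []),
    ],
)
def test_interval(interval, expected):
    assert samples_in_interval(*interval).tolist() == expected
```

With a loop, the first failing interval stops the test and hides the rest; with parametrize, each case passes or fails independently and appears by its parameters in the pytest output.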

python/tests/test_highlevel.py

mufernando · 2021-09-17T16:44:27Z

I think we are ready to merge!

Thank you all!

python/tests/test_highlevel.py

petrelharp · 2021-09-17T16:50:45Z

Sorry, one more thing! (see comments)

mufernando · 2021-09-17T16:56:36Z

Fixed it!

mufernando requested a review from petrelharp September 14, 2021 18:04

mufernando force-pushed the time-slice-samples branch from 59d4682 to f36fc0b September 14, 2021 18:08

mufernando force-pushed the time-slice-samples branch from f36fc0b to 7e84ef0 September 14, 2021 20:35

mufernando marked this pull request as ready for review September 15, 2021 19:50

petrelharp approved these changes Sep 17, 2021

jeromekelleher reviewed Sep 17, 2021

python/tskit/trees.py (outdated, resolved)

python/tskit/trees.py (outdated, resolved)

python/tskit/trees.py (outdated, resolved)

mufernando force-pushed the time-slice-samples branch 2 times, most recently from ce9fc27 to cf8829e September 17, 2021 15:53

jeromekelleher approved these changes Sep 17, 2021

python/tests/test_highlevel.py (resolved)

mufernando force-pushed the time-slice-samples branch from cf8829e to aece147 September 17, 2021 16:42

petrelharp reviewed Sep 17, 2021

python/tests/test_highlevel.py (outdated, resolved)

petrelharp reviewed Sep 17, 2021

python/tests/test_highlevel.py (outdated, resolved)

mufernando force-pushed the time-slice-samples branch from aece147 to 9677a97 September 17, 2021 16:55

mufernando force-pushed the time-slice-samples branch from 9677a97 to 04d2458 September 17, 2021 17:46

jeromekelleher added the AUTOMERGE-REQUESTED label (Ask Mergify to merge this PR) Sep 17, 2021

adding time slicing to ts.samples()

1efee21

AdminBot-tskit force-pushed the time-slice-samples branch from 04d2458 to 1efee21 September 17, 2021 18:19

mergify bot merged commit 098867a into tskit-dev:main Sep 17, 2021

mergify bot removed the AUTOMERGE-REQUESTED label (Ask Mergify to merge this PR) Sep 17, 2021
