Skip to content

adding time slice to ts.samples() #1700

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Sep 17, 2021

Conversation

mufernando
Copy link
Member

@mufernando mufernando commented Sep 14, 2021

Description

WIP

Fixes #1692

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@mufernando
Copy link
Member Author

A few questions:

  • Should both min and max be inclusive?
  • Does this approach look reasonable (or do we expect too much of a performance hit?)?
  • Ideas for more tests?

@jeromekelleher
Copy link
Member

jeromekelleher commented Sep 14, 2021

LGTM, thanks @mufernando!

  • Approach is great, I don't foresee any perf hit.
  • I'd vote for making max exclusive following our usual interval semantics so that for any sample with time t, min_time <= t < max_time.
  • In terms of testing, I think checking explicitly with a few hand coded examples on a small tree sequence would be good. (as in, you explicitly assert for the known set of samples)

@petrelharp
Copy link
Contributor

I'd vote for making max exclusive following our usual interval semantics so that for any sample with time t, min_time <= t < max_time.

I agree... except, how do we deal with the probably most common use for this, which is to get everyone alive at a given time (eg 0.0)? For that most people would probably try ts.samples(max_time = 0.0), which would not work.

One option would be to make the argument time_interval instead of two separate min and max times, and if it's just a single number instead of a tuple, we return all the samples whose times are equal to that number? Or, add two arguments: ts.samples(time_interval=None, time=None), where time_interval is a closed-on-the-left-open-on-the-right interval, and time is just a single time, and you can't do both of them?

I agree about the testing, and be sure to include some times like 1/3 where we might run into floating point error.

@mufernando
Copy link
Member Author

I agree with @petrelharp that one of the most used cases would be to look at a single time point (t=0), so I went ahead and added two new parameters time and time_interval.

I never seem to know exactly how to test on a single known answer, so I did the next best thing (that I could think of) which was to re-implement the behavior independently using a simple for loop instead of numpy.

@mufernando
Copy link
Member Author

ah, also unsure whether theses tests should be where they are within test_highlevel.py:TestNumpySamples.

@mufernando mufernando marked this pull request as ready for review September 15, 2021 19:50
@codecov
Copy link

codecov bot commented Sep 15, 2021

Codecov Report

Merging #1700 (04d2458) into main (fe6121a) will increase coverage by 0.42%.
The diff coverage is 100.00%.

❗ Current head 04d2458 differs from pull request most recent head 1efee21. Consider uploading reports for the commit 1efee21 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   93.36%   93.79%   +0.42%     
==========================================
  Files          27       27              
  Lines       24235    23590     -645     
  Branches     1089     1089              
==========================================
- Hits        22627    22126     -501     
+ Misses       1573     1429     -144     
  Partials       35       35              
Flag Coverage Δ
c-tests 92.04% <ø> (-0.06%) ⬇️
lwt-tests 93.49% <ø> (+4.23%) ⬆️
python-c-tests 95.47% <100.00%> (+0.96%) ⬆️
python-tests 98.77% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/trees.py 97.84% <100.00%> (+0.01%) ⬆️
c/tskit/tables.c 90.09% <0.00%> (-0.10%) ⬇️
python/tskit/tables.py 98.91% <0.00%> (+0.09%) ⬆️
python/_tskitmodule.c 92.36% <0.00%> (+0.88%) ⬆️
python/lwt_interface/example_c_module.c 75.47% <0.00%> (+2.74%) ⬆️
python/lwt_interface/tskit_lwt_interface.h 95.41% <0.00%> (+4.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe6121a...1efee21. Read the comment docs.

@jeromekelleher
Copy link
Member

Looks good @mufernando. I'm not sure there's much point in having time and time_interval though - how about we have a single parameter time, and

  • If time is a numeric value, return all samples whose node time is approximately equal (as determined by numpy.isclose) to the specified value
  • Otherwise, if time is a pair of values (min_time, max_time) and we return all samples whose node time t is in this interval such that min_time <= t < max_time.

That seems straightforward enough?

@mufernando
Copy link
Member Author

Looks good @mufernando. I'm not sure there's much point in having time and time_interval though - how about we have a single parameter time, and

  • If time is a numeric value, return all samples whose node time is approximately equal (as determined by numpy.isclose) to the specified value
  • Otherwise, if time is a pair of values (min_time, max_time) and we return all samples whose node time t is in this interval such that min_time <= t < max_time.

That seems straightforward enough?

Done!

waiting for a review and then I can squash and rebase.

@petrelharp
Copy link
Contributor

LGTM! I added a few more tests, including one where we build a simple explicit tree sequence.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor changes in how we handle the type polymorphism suggested.

We should add some simple tests to make sure that numpy arrays are supported as input. I.e, something like

x = np.array([1, 2])
ts.samples(time=x[0]) # time == 1
ts.samples(time=x) # time between 1 and 2

@mufernando
Copy link
Member Author

Updated to work with numpy arrays (including 0d and (1,) arrays). Added some tests with these types of array also.

@mufernando mufernando force-pushed the time-slice-samples branch 2 times, most recently from ce9fc27 to cf8829e Compare September 17, 2021 15:53
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @mufernando. One final comment about using parametrize in tests rather than loops to get better visibility on test examples.

@mufernando
Copy link
Member Author

I think we are ready to merge!

Thanks you all!

@petrelharp
Copy link
Contributor

Sorry, one more thing! (see comments)

@mufernando
Copy link
Member Author

Fixed it!

@jeromekelleher jeromekelleher added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Sep 17, 2021
@mergify mergify bot merged commit 098867a into tskit-dev:main Sep 17, 2021
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Sep 17, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

time slice in samples
3 participants