Skip to content

Implement total_branch_length using numpy ops? #1794

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
jeromekelleher opened this issue Oct 13, 2021 · 1 comment
Closed

Implement total_branch_length using numpy ops? #1794

jeromekelleher opened this issue Oct 13, 2021 · 1 comment
Labels
Performance This issue addresses performance, either runtime or memory Python API Issue is about the Python API

Comments

@jeromekelleher
Copy link
Member

After #1704 lands we can implement total_branch_length like this:

@property
 def total_branch_length(self):
        nodes = self.preorder()
        parent = self.parent_array[nodes]
        time = self.tree_sequence.tables.nodes.time
        branch_length = time[parent] - time[nodes]
        return float(np.sum(branch_length[parent != NULL]))

I'm sure this'll be faster most of the time, but I guess there could be some cases where overhead of creating the time array leads to a regression. There'll be a bit of memory overhead too, but I doubt this is significant.

Any thoughts on whether we should/shouldn't do things like this?

(In retrospect this should have been a function rather than a property, as it would have been more useful (you could specify the root node you wanted to sum from), and it also violates the "properties are inexpensive" rule. I think my reasoning at the time was that this was something we should be able to keep track of efficiently incrementally, but this isn't actually true under the current definition, which only sums branch length reachable from roots. That ship has sailed anyway, but I guess we should make a note of this in the comments if/when we're updating the method)

@jeromekelleher jeromekelleher added Python API Issue is about the Python API Performance This issue addresses performance, either runtime or memory labels Oct 13, 2021
@jeromekelleher
Copy link
Member Author

Although, on second thoughts, we can trivially implement this now in C using the preorder function and there's no downside there, so it would be quicker to just do this than to try to benchmark the pros and cons of the numpy version.

This could be a good example to use for illustrating tree operations using numpy, though (#1788)

jeromekelleher added a commit to jeromekelleher/tskit that referenced this issue Oct 16, 2021
Closes tskit-dev#1794

Benchmarks:

Before:
big_tree         0.26075993799986463
many_trees       0.002319710470019345

After:
big_tree         0.004731306999929075
many_trees       2.1276080024108523e-05

So 2-3 orders of magnitude (probably more compared to previous release,
since this using faster node iteration)
jeromekelleher added a commit to jeromekelleher/tskit that referenced this issue Oct 16, 2021
Closes tskit-dev#1794

Benchmarks:

Before:
big_tree         0.26075993799986463
many_trees       0.002319710470019345

After:
big_tree         0.004731306999929075
many_trees       2.1276080024108523e-05

So 2-3 orders of magnitude (probably more compared to previous release,
since this using faster node iteration)
jeromekelleher added a commit to jeromekelleher/tskit that referenced this issue Oct 18, 2021
Closes tskit-dev#1794

Benchmarks:

Before:
big_tree         0.26075993799986463
many_trees       0.002319710470019345

After:
big_tree         0.004731306999929075
many_trees       2.1276080024108523e-05

So 2-3 orders of magnitude (probably more compared to previous release,
since this using faster node iteration)
@mergify mergify bot closed this as completed in 02d4771 Oct 20, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Performance This issue addresses performance, either runtime or memory Python API Issue is about the Python API
Projects
None yet
Development

No branches or pull requests

1 participant