Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

New format for persistence #925

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

mglisse
Copy link
Member

@mglisse mglisse commented Jul 28, 2023

(not ready: missing tests at least, and doing the same for cubical)
As a step towards using more numpy arrays, this provides an option to SimplexTree.persistence() to output a list of arrays (1 numpy array per dimension), instead of our current list of tuple (all together, with the dimension being part of the tuple).
I open the PR to start the discussion.

return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
v = self.pcohptr.intervals_by_dimension()
return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do the usual workaround for empty dgm to get a shape (0,2).

@VincentRouvreau
Copy link
Contributor

(this PR requires to merge master for compilation to pass)

import gudhi
rips_complex = gudhi.RipsComplex(points=[[1, 1], [7, 0], [4, 6], [9, 6], [0, 14], [2, 19], [9, 17]],
                                 max_edge_length=12.0)

simplex_tree = rips_complex.create_simplex_tree(max_dimension=1)
simplex_tree.persistence(output_type='old')
# [(0, (0.0, inf)), (0, (0.0, 8.94427190999916)), (0, (0.0, 7.280109889280518)), (0, (0.0, 6.082762530298219)),
#  (0, (0.0, 5.830951894845301)), (0, (0.0, 5.385164807134504)), (0, (0.0, 5.0))]
simplex_tree.persistence(output_type='array by dimension')
# [array([[0.        , 5.        ],
#        [0.        , 5.38516481],
#        [0.        , 5.83095189],
#        [0.        , 6.08276253],
#        [0.        , 7.28010989],
#        [0.        , 8.94427191],
#        [0.        ,        inf]])]

This format is quite interesting and was initiated from discussion on #395

@martinroyer
Copy link
Collaborator

I don't have much to add except that I am greatly in favor of this and already using the new format using the following simple piece of translator code, if it is any help to anyone / anywhere:

def _diag_to_list_by_dim_format(pdiagram, tda_max_dim=None):
    """ transforms list of Tuple(dimension, Tuple(x,y)) into list of ndarray[[x_0, y_0], [x_1, y_1], ...]
     for each dimension, so if Tuple(x,y) has dimension i it will be found in the ith ndarray of
     the resulting list, e.g.:
         [(2, (1.0577792405537423, 1.1003878733068035)),
          (0, (0.0, np.inf)),
          (0, (0.0, 1.0556057201636535)),
          (0, (0.0, 0.8756047102452433))]
     transforms into
         [array([[0.        ,        inf],
                 [0.        , 1.05560572],
                 [0.        , 0.87560471]]),
         array([], shape=(0, 2), dtype=float64),
         array([[1.05777924, 1.10038787]])].
    """
    max_dimension = max(dim for (dim, _) in pdiagram) if tda_max_dim is None else tda_max_dim
    by_dim_pdiagram = [np.array([_ for (dim, _) in pdiagram if dim == i]) for i in range(0, max_dimension + 1)]
    return [diag if len(diag) else np.empty(shape=[0, 2]) for diag in by_dim_pdiagram]

I also suggested future tests for diagram vectorizers in the new format there: https://github.com/GUDHI/gudhi-devel/pull/1017/files#diff-509c3dbe85dd6b515d6d3e33d0f6c074686dcb27367baa653eeedeeec534f1c8. As Marc hinted it allows for nice interactions with sklearn.compose.ColumnTransformer for instance.

:returns: The persistence of the simplicial complex.
:rtype: list of pairs(dimension, pair(birth, death))
"""
self.compute_persistence(homology_coeff_field, min_persistence, persistence_dim_max)
return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"array by dimension" seems a bit long to me, is "arrays" ok ? and 'old' isn't very telling : "tuples" ?
Also, should we raise a DeprecationWarning to change the default to the array one afterward ?

return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
v = self.pcohptr.intervals_by_dimension()
return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples are slightly faster than lists in python, but that's negligible here.

@@ -612,7 +612,7 @@ cdef class SimplexTree:
"""
self.get_ptr().expansion_with_blockers_callback(max_dim, callback, <void*>blocker_func)

def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False):
def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False, output_type = 'old'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Not completely related to this PR, but I think we should do type annotations here, i.e.,

from typing import Optional,Literal, Union
def persistence(self, homology_coeff_field:int=11, min_persistence:float=0., persistence_dim_max:Optional[int] = None, output_type:Literal['old','array by dimension'] = 'old') -> list:
    pass

especially to have auto-completion for long arguments such as "array by dimension".
I also changed the False to a None.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants