This repository is forked from interval-matching, the code for the "Persistent Cohomological Cycle Matching" approach developed in the paper "Fast Topological Signal Identification and Persistent Cohomological Cycle Matching" (García-Redondo, Monod, and Song 2022). The original codes were co-written by Inés García-Redondo and Anna Song, performing cycle matching [1] while incorporating the computational advantages given by Ripser [2] and Ripser-image [3].
This fork is adapted to perform the analysis in [5] and is also included as a submodule of the corresponding brain_representations repository.
This repository is organised as follows. Each module listed below also has its own README(s) with more detailed information in corresponding directory.
Prepare Ripser input, perform post-Ripser interval matching, and compute immediate derivative values from match data.
Visualize persistence and cycle-match output data, define and calculate quantities of post-hoc analysis.
SLURM scripting at problem scale, including calls to Ripser.
Ripser, but augmented with a line of code extracting its inbuilt lexicographical filtration refinement.
This fork of interval-matching contributes new functionality and makes significant changes to the code structure.
- generalized to accept arbitrary (pre-computed) distance metrics in the bootstrapping case (see
create_ldm_images.py
) - additional utilities for cycle registration over data bootstraps (see
create_ldm_images.py
andgenerate_tag_subindex.py
inutils_match
) - integration of cycle matching with statistical permutation testing (see
comp_permtest_dists.py
,bootstrap_dists.py
,submit_(bsdist|permdist)_sbatch.sh
, and(permtest|bootstrap)_distances.sh
invisualization
) - new modules for computation of Wasserstein variants (see
diagram_distances.py
invisualization
) - new visualizations of aggregated statistcs and distributions of bootstrapped cycles (see, e.g.,
compare_topostats.py
anddistributional_summaries.py
invisualization
) - new toy examples (and corresponding manipulations) for validation and exploration (see
toy_models.py
invisualization
)
- split cycle-matching implementation into modular structure (see
compute.py
,extract.py
, andmatch.py
inutils_match
) - re-factored script-style code into Pythonic functional programming style
- optimized for integration with high-performance computing (HPC) environment (including bash scripts with example calls: see
submit_*.sh
scripts inslurm_bash
) - re-organized directory structure
C++ is a general-purpose programming language which has object-oriented, generic, and functional features in addition to facilities for low-level memory manipulation. It is the language chosen for the codes Ripser [2] and Ripser-image [3]. The C++ files for those, with a slight modification needed to implement cycle matching, can be found in the folder modified ripser
.
Before running the code to perform cycle matching in this repository, one needs to compile the C++ files in the modified ripser
folder. For that
- Install a C++ compiler in your computer. We recommend getting the compiler GCC.
- For Linux: the default Ubuntu repositories contain a meta-package named build-essential that contains the GCC compiler and a lot of libraries and other utilities required for compiling software. You only need to run
sudo apt install build-essential
in a terminal to install it. - For Windows: you can install Mingw-w64 which supports the GCC compiler on Windows systems. You can get this through the installation packages for MSYS2.
- For MacOS: see this link to install GCC.
- For Linux: the default Ubuntu repositories contain a meta-package named build-essential that contains the GCC compiler and a lot of libraries and other utilities required for compiling software. You only need to run
- The relevant Makefiles are included in the corresponding folders, so the compilation can be done by running the command line
make
in a terminal opened in the folder. - The compiled files should be in the same directory than the python scripts/notebooks in which the cycle matching code is invoked.
This repository uses a few packages beyond python built-ins, and most of these are only necessary for post-hoc analysis code in visualization
. We recommend managing this package's environment and dependencies through Anaconda/miniconda and conda-forge. Dependencies are listed below and grouped by module.
utils_match
visualization
- POT (python optimal transport library)
- OpenCV {optional alternative to POT as Wasserstein backend}
- numpy
- scipy
- pandas
- seaborn
- matplotlib
Note that the OpenCV package is an optional alternative to POT as the backend for Wasserstein computations; the code importing this package can be disabled without affecting overall functionality. In our problem regime, we found POT to have performance advantages over OpenCV, hence its status as the default option.
This code is available and is fully adaptable for individual user customization. If you use these methods, please cite the following works:
@misc{garcia-redondo_fast_2022,
title = {Fast {Topological} {Signal} {Identification} and {Persistent} {Cohomological} {Cycle} {Matching}},
url = {http://arxiv.org/abs/2209.15446},
urldate = {2022-10-03},
publisher = {arXiv},
author = {García-Redondo, Inés and Monod, Anthea and Song, Anna},
month = sep,
year = {2022},
note = {arXiv:2209.15446 [math, stat]},
keywords = {Mathematics - Algebraic Topology, Statistics - Machine Learning},
}
@misc{easley2023comparingrepresentationshighdimensionaldata,
title={Comparing representations of high-dimensional data with persistent homology: a case study in neuroimaging},
author={Ty Easley and Kevin Freese and Elizabeth Munch and Janine Bijsterbosch},
year={2023},
eprint={2306.13802},
archivePrefix={arXiv},
primaryClass={cs.CG},
url={https://arxiv.org/abs/2306.13802},
}
[1] Reani, Yohai, and Omer Bobrowski. 2021. ‘Cycle Registration in Persistent Homology with Applications in Topological Bootstrap’, January. https://arxiv.org/abs/2101.00698v1.
[2] Bauer, Ulrich. 2021. ‘Ripser: Efficient Computation of Vietoris-Rips Persistence Barcodes’. Journal of Applied and Computational Topology 5 (3): 391–423. https://doi.org/10.1007/s41468-021-00071-5.
[3] Bauer, Ulrich, and Maximilian Schmahl. 2022. ‘Efficient Computation of Image Persistence’. ArXiv:2201.04170 [Cs, Math], January. http://arxiv.org/abs/2201.04170.
[4] Čufar, Matija, and Žiga Virk. 2021. ‘Fast Computation of Persistent Homology Representatives with Involuted Persistent Homology’. ArXiv:2105.03629 [Math], May. http://arxiv.org/abs/2105.03629.
[5] T. Easley, K. Freese, E. Munch, and J. Bijsterbosch, “Comparing representations of high-dimensional data with persistent homology: a case study in neuroimaging,” Nov. 23, 2023, arXiv: arXiv:2306.13802. doi: 10.48550/arXiv.2306.13802.