Re-use part cartographer improvements/design in KeplerMapper #47

Open
pablodecm opened this issue Dec 21, 2017 · 3 comments

Comments


pablodecm commented Dec 21, 2017

I have seen that this project has gone through considerable development and improvement in recent months. About one year ago, I spent about 2 months studying TDA and its applications to (mostly scientific) data analysis. When it was time to review MAPPER, I started playing with KeplerMapper and found it extremely convenient for MAPPER-based data exploration (it was basically the only good open implementation out there).

However, I was missing some interactivity (changing the variable used for node colouring, recomputing the simplicial complex with other clustering/coverer parameters, etc.) for the exploration of the output simplicial complex. With the aim of better understanding MAPPER, I started writing my own implementation (which I called cartographer), re-using scikit-learn as much as possible and inspired by what was done in KeplerMapper.

Given that this project is now undergoing active development, and it is definitely more mature and has more user adoption than mine, I think it would be interesting to see whether some of the simplified API design choices and implementation changes could be ported to KeplerMapper to improve usability and performance, in case you are interested.

So that we can discuss what could be re-used within KeplerMapper, I list below the main design and implementation changes I made when rewriting the implementation (it is pretty simple and can be seen here):

  1. I opted to separate the visualisation of the simplicial complex from its actual computation (the scikit-learn cluster-like model). My aim was to be able to adapt the visualisation details at a later stage, and to be able to either serve a standalone HTML file or view the visualisation within a Jupyter Notebook/Lab.

  2. The Mapper class inherits from ClusterMixin, and the three Mapper components can be configured in the constructor call: filterer (the transforming function that maps the data to the lower-dimensional space used to compute the nerve), coverer (the transformer function that defines the overlapping regions from which the nerve is computed) and clusterer (the clustering algorithm that is actually used).

  3. The coverer (see the HyperRectangleCoverer) divides the input space into overlapping regions. One trick that I discovered to considerably speed up execution was to restrict set intersection checks to overlapping regions: the coverer also returns an overlap matrix (which could be sparse), and intersections are checked only on those subsets.

  4. Standalone documentation with Jupyter notebooks (the output D3.js graph can be explored within a Jupyter output cell), executed with nbsphinx when the docs are built by the CI.
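
To make points 2 and 3 concrete, here is a minimal sketch of the idea. It is not the actual cartographer or KeplerMapper code: `IntervalCoverer`, the `cover` method, the attribute names `nodes_`/`edges_` and the PCA/DBSCAN defaults are all hypothetical choices made for illustration, and the coverer is reduced to a 1-D lens to keep it short.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin, clone
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


class IntervalCoverer:
    """Hypothetical 1-D stand-in for a hyper-rectangle coverer."""

    def __init__(self, n_intervals=4, overlap=0.25):
        self.n_intervals = n_intervals
        self.overlap = overlap

    def cover(self, lens):
        lens = np.asarray(lens, dtype=float).ravel()
        lo, hi = lens.min(), lens.max()
        width = (hi - lo) / self.n_intervals
        pad = width * self.overlap
        starts = lo + width * np.arange(self.n_intervals) - pad
        ends = starts + width + 2 * pad
        # membership[i, j] is True when sample j lies in region i
        membership = (lens >= starts[:, None]) & (lens <= ends[:, None])
        # overlap[i, j] is True when regions i and j intersect geometrically
        overlap = (starts[:, None] <= ends[None, :]) & (starts[None, :] <= ends[:, None])
        np.fill_diagonal(overlap, False)
        return membership, overlap


class Mapper(ClusterMixin, BaseEstimator):
    """Sketch of a scikit-learn style Mapper with three pluggable components."""

    def __init__(self, filterer=None, coverer=None, clusterer=None):
        self.filterer = filterer
        self.coverer = coverer
        self.clusterer = clusterer

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Assumed defaults: PCA lens, interval cover, DBSCAN clustering
        filterer = clone(self.filterer) if self.filterer is not None else PCA(n_components=1)
        coverer = self.coverer if self.coverer is not None else IntervalCoverer()
        clusterer = self.clusterer if self.clusterer is not None else DBSCAN()

        lens = filterer.fit_transform(X)[:, 0]
        membership, overlap = coverer.cover(lens)

        # One node per cluster found inside each region of the cover
        self.nodes_ = []  # list of (region index, frozenset of sample indices)
        for region, mask in enumerate(membership):
            idx = np.flatnonzero(mask)
            if idx.size == 0:
                continue
            labels = clone(clusterer).fit_predict(X[idx])
            for lab in np.unique(labels[labels >= 0]):  # skip DBSCAN noise (-1)
                self.nodes_.append((region, frozenset(idx[labels == lab].tolist())))

        # Nerve edges: intersect node sets only when their regions overlap
        self.edges_ = []
        for a, (ra, sa) in enumerate(self.nodes_):
            for b in range(a + 1, len(self.nodes_)):
                rb, sb = self.nodes_[b]
                if overlap[ra, rb] and not sa.isdisjoint(sb):
                    self.edges_.append((a, b))

        # labels_ keeps ClusterMixin.fit_predict usable: lowest node index
        # containing each sample, or -1 if the sample is in no node
        self.labels_ = np.full(X.shape[0], -1)
        for n in range(len(self.nodes_) - 1, -1, -1):
            self.labels_[list(self.nodes_[n][1])] = n
        return self
```

Calling `Mapper(clusterer=DBSCAN(eps=0.3)).fit(X)` would then expose the graph through `nodes_` and `edges_`. The `overlap[ra, rb]` check is the cheap guard that skips the set intersection for pairs of nodes whose regions cannot share samples. For a large cover, the same matrix could be stored in sparse form.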

In addition to the features I implemented in cartographer, I also spent a few weeks thinking about how to implement other improvements (e.g. bidirectional Jupyter widget visualization, how to deal with hyper-parameters, multi-scale MAPPER approaches). I will be glad to discuss those as well and contribute to their integration.

Member

sauln commented Dec 22, 2017

This is great! I think cartographer has a lot of great stuff going on, and merging the two libraries would benefit everyone!

I think 3 and 4 are the easiest and most straightforward to incorporate. The cover API is still new, so now is the time to make large changes. Also, we’ve only just put together a readthedocs page, so rolling in a bunch of the docs you’ve put together would be a boon.

I can put work into this next week to help integrate those changes. I’m not sure of the best way to go about doing this while making sure you get credit, if you care about that. Once they are in, I propose we increment the version to 1.1.

I’m really interested in how you were able to convert notebooks to Sphinx docs. Right now the km docs pages are just the readme cut into smaller pieces, and they haven’t been officially released. Having a stack of notebooks would be great as an introduction to both km and mapper in general.

I am interested in your thoughts on other improvements. Some of them will be much more of an undertaking to incorporate. Let’s discuss the rest of the features and ideas in more depth.

Author

pablodecm commented Dec 26, 2017

Sorry for the late reply, was a bit preoccupied with family-related celebrations. Indeed, I agree that 3 and 4 are the simplest improvements.

I can try to open a PR myself with some proposed coverer changes taken from cartographer, or alternatively you can integrate the changes if you prefer. I am guessing that KeplerMapper development occurs in the dev branch, so it should be the PR target, am I right?

Regarding the Jupyter Notebook based examples in the docs, I used the nbsphinx module and the notebooks are executed by the CI system. I was using RTD but it was not compatible with nbsphinx at the time so I moved to Travis and GitHub pages. You can see my Sphinx conf here. You can find more info about nbsphinx in their GitHub.
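
For reference, a Sphinx `conf.py` that enables nbsphinx can be as small as the sketch below. This is not the actual cartographer configuration linked above. The project name is a placeholder and the option values are assumptions, although `nbsphinx_execute`, `nbsphinx_allow_errors` and `nbsphinx_timeout` are options documented by nbsphinx.

```python
# docs/conf.py (hypothetical location and project name)
project = "example-mapper-docs"

extensions = [
    "sphinx.ext.autodoc",  # API reference generated from docstrings
    "nbsphinx",            # render .ipynb files as documentation pages
]

# Re-run every notebook on each docs build, so the CI job catches
# notebooks that no longer execute against the current code.
nbsphinx_execute = "always"

# Fail the build if a notebook cell raises an exception.
nbsphinx_allow_errors = False

# Per-cell execution time limit in seconds (assumed value).
nbsphinx_timeout = 120

# Skip build output and Jupyter checkpoint copies of the notebooks.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With a configuration along these lines, the notebooks only need to be listed in a `toctree`, and the CI job runs the usual `sphinx-build` command.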

About credit for my prospective contributions or the reuse of design/code/docs from cartographer, it depends on how much I get involved in the project. For the time being, let's try to see how well we can integrate stuff from cartographer into KeplerMapper. Later on, if you agree and my contributions are significant, we can opt for some form of "official" recognition for the developments.

Member

sauln commented Dec 28, 2017

No worries. I think everyone is busy this time of year.

I would prefer you initiate the first PR so at the very least you show up in the contributors tab. I'll have lots of time next week to work on this integration and some other features I wanted to build out.

We started trying to keep recent developments in the dev branch so that docs & master match what is in pypi. This is a pretty new setup though, so we would listen to any suggestions for other ways to improve this.
