Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add ClusterArray class to import any kind of microstates maps #118

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

RunKZhang
Copy link

@RunKZhang RunKZhang commented Aug 2, 2023

Hello, this is the new feature I described in Issues, and I open a pull requests here. This is a draft of the thoughts, and not heavily tested.

Closes #117

@mscheltienne
Copy link
Collaborator

mscheltienne commented Aug 2, 2023

I think that addition to the API would be very welcome. My initial impression is that it should mimic the ClassArray API from MNE-Python. Examples:

In all 5 cases, the class signature is ClassArray(data, info, optional_arguments). Thus, I would define a ClusterArray class which would ingest fitted data.

class ClusterArray(_BaseCluster):

    def __init__(
        data: NDArray[float], 
        info: Union[Info, ChInfo], 
        cluster_names: Optional[List[str]] = None, 
        fitted_data: Optional[NDArray[float]] = None,
        labels: Optional[NDArray[int]] = None,
    ):

Where data would be the cluster_centers_ (named data to match MNE-Python's API). The number of clusters n_clusters can be inferred from data. Note that I've set cluster_names, fitted_data, and labels as Optional. I don't think they are key elements that we require to run other methods/functions from the pycrostates API. They are "nice-to-have" variables, but not mandatory. I could be wrong however, I would have to check (especially for labels).

WDYT @RunKZhang ?

@mscheltienne
Copy link
Collaborator

@RunKZhang I fixed my proposed class signature in the previous post, I defined it as a function for some reason 🙈

@RunKZhang
Copy link
Author

I agree with you for the ClassArray, and data could be cluster_centers_. But I think labels should be mandatory, since it is essential to calculate gev_.

@RunKZhang
Copy link
Author

Sorry I don't get why define it as a function. Since the predict method helps make segmentation of the raw object, and it is inherited from _BaseCluster, I think define it as a class is preferable.

@mscheltienne
Copy link
Collaborator

Sorry I don't get why define it as a function. Since the predict method helps make segmentation of the raw object, and it is inherited from _BaseCluster, I think define it as a class is preferable.

Definitely! I just made a mistake in my original comment. prior to edit, where I wrote:

class ClusterArray(
    data: NDArray[float], 
    info: Union[Info, ChInfo], 
    cluster_names: Optional[List[str]] = None, 
    fitted_data: Optional[NDArray[float]] = None,
    labels: Optional[NDArray[int]] = None,
):

which was obviously wrong!

@mscheltienne
Copy link
Collaborator

But I think labels should be mandatory, since it is essential to calculate gev_

OK, then let's put it as third argument, after info.
@vferat Sounds good to you?

@vferat
Copy link
Owner

vferat commented Aug 6, 2023

Hey @RunKZhang, thanks for your contribution !

In my opinion, fitted_data and labels should be optional, as it can happen that only cluster centers (i.e. microstates maps) are shared from one study to another.

  • fitted_data can be defined if labels is not defined (for example, if the maps come from a non-clustering algorithm such as ICA, then there are no labels).
  • labels must not be defined if fitted_data is not defined.
  • If both fitted_data and labels are defined, then gev_ should be calculated automatically.
  • If neither fitted_data nor labels are defined, an specific error should be raise when using clustering metric.
  • We should add another argument ignore_polarity to know if the imported maps were computed/are intended to be used with or without polarity (i.e. Add support for ERP analysis: ignore_polarity = False #93)
  • We need to make sure that saving in fif format takes these different possibilities into account.

What do you think about it ?

Otherwise, the rest of your discussion seems good.

@RunKZhang
Copy link
Author

Hello @vferat , I agree with your idea!

As you have mentioned, there are no labels if the cluster_centers are from ICA or other algorithms, and I have to admit that your design of this feature is thoughtful. For the remaining design, I have no objection to them.

Finally, I am grateful my suggestions is taken into consideration. Thank you guys so much!

@mscheltienne
Copy link
Collaborator

mscheltienne commented Aug 7, 2023

OK, then let's go with this class signature:

class ClusterArray(_BaseCluster):

    def __init__(
        data: NDArray[float], 
        info: Union[Info, ChInfo], 
        cluster_names: Optional[List[str]] = None, 
        fitted_data: Optional[NDArray[float]] = None,
        labels: Optional[NDArray[int]] = None,
        ignore_polarity: bool = False,
    ):

I think ignore_polarity=False is the sensible default, @vferat?

@vferat
Copy link
Owner

vferat commented Aug 8, 2023

We touch a sensitive point here:

  • We do not support .predict() for ignore_polarity=False (yet?). So ClusterArray(ignore_polarity=False) would be of no use.
  • ignore_polarity is a critical parameter that may completely change the analysis. I would rather ensure the user has set it on purpose, than relying on an optional parameter.

The solution I see is to set ignore_polarity as a mandatory argument, and throw a NotImplementedError error if someone tries to set ClusterArray(ignore_polarity=False) for now.

@mscheltienne
Copy link
Collaborator

OK, that works for me.

@mscheltienne mscheltienne changed the title Add some codes for Custom class Add ClusterArray class to import any kind of microstates maps Mar 21, 2024
@mscheltienne
Copy link
Collaborator

mscheltienne commented Mar 21, 2024

I picked up this PR, and started with (1) clean-up, (2) input validation on the class, (3) I/O roundtrip to FIFF.
No test for now, but that should be a good basis. @vferat Can you have a first look to the main class and to what could be missing. Mostly, could we compute fitting variables from the information provided with this signature, e.g. GEV?

Repository owner deleted a comment from codecov bot Mar 21, 2024
Copy link
Owner

@vferat vferat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fitted_data and labels are not mandatory.

  • If both fitted_data and labels are provided, then it is possible to compute GEV_.
  • If neither fitted_data nor labels are provided, then it is not possible to use pycrostates.metrics.

I don't see any other parameter that might need specific handling

@mscheltienne
Copy link
Collaborator

But if we have fitted_data, we should be able to compute the labels? So in the end, we compute the GEV in all cases except if fitted_data is not provided?

@vferat
Copy link
Owner

vferat commented Mar 27, 2024

But if we have fitted_data, we should be able to compute the labels? So in the end, we compute the GEV in all cases except if fitted_data is not provided?

It could be, but there are some cases where the process is not straight forward:

  • some clustering algorithms assign sample to a cluster even if the distance between the sample and the choosen cluster center is the smallest ( for example a clustering algorithm using another distance definition, or a clsuter algorithm taking into account other parameters such as sample order..)
  • some algorithms can extract maps without labels (i.e. ICA).

These special cases make it difficult to decide whether labels should be automatically calculated. I'm open to discussion, as I have no arguments either for or against.

@mscheltienne
Copy link
Collaborator

Sounds like labels should not be auto-calculated.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a Custom class that is same as ModKMeans
3 participants