Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Mixed effects model to convert irregular data to basis expansion #618

Open
wants to merge 55 commits into
base: develop
Choose a base branch
from

Conversation

pcuestas
Copy link
Contributor

@pcuestas pcuestas commented Jun 12, 2024

References to issues or other PRs

Describe the proposed changes

1. Mixed-effects model

Implementation of the mixed effects method for converting irregularly sampled functional data to basis expansion.

This implementation includes the addition of a new module: skfda.representation.conversion, which is meant for including classes (transformers) that convert between different FData sublasses.

For this type of conversion (FDataIrregular to FDataBasis) two classes have been added: EMMixedEffectsConverter and MinimizeMixedEffectsConverter, each of which implements the conversion with a different method to fit the MLE of the mixed-effects model.

The EMMixedEffectsConverter uses the EM algorithm and the MinimizeMixedEffectsConverter uses generic optimizers from scipy.optimize.minimize to find maxima of the loglikelihood function.

2. Create an FDataIrregular object from another FData

A function: irregular_sample has been added to skfda.datasets, which creates an irregular sample by randomly selecting points from another FData object.

Additional information

All implementation added has been adapted (and tested) for multidimensional domains and codomains.

However, some problems have been found regarding this topic and are described in #616 and #617 .

Checklist before requesting a review

  • I have performed a self-review of my code
  • The code conforms to the style used in this package
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

pcuestas added 30 commits March 4, 2024 19:46
@pcuestas pcuestas marked this pull request as ready for review June 14, 2024 14:29
@pcuestas pcuestas requested a review from vnmabus June 14, 2024 14:29
Copy link
Member

@vnmabus vnmabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pcuestas, I hope everything is ok. First, I have to apologize for taking so much time in reviewing your work in depth. In general it is really good, but I have found some minor things that could be improved.

Of course, I would totally understand if you do not have the time or will to review this after so much time. In that case, please just tell me and I will try to do the changes myself and merge it.

)
def test_irregular_sample(
fdata_fixture: str, request: Any
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring explaining the test is missing here.

n_points_range: Tuple[int],
random_state: np.random.RandomState,
) -> FDataIrregular:
"""Generate samples of functions at random points with gaussian noise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Generate samples of functions at random points with gaussian noise.
"""Generate samples of functions at random points with Gaussian noise.



def _linalg_solve(
a: NDArrayFloat, b: NDArrayFloat, *, assume_a: str = 'gen'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One parameter per line:

Suggested change
a: NDArrayFloat, b: NDArrayFloat, *, assume_a: str = 'gen'
a: NDArrayFloat,
b: NDArrayFloat,
*,
assume_a: str = "gen",

try:
return scipy.linalg.solve(a=a, b=b, assume_a=assume_a) # type: ignore
except scipy.linalg.LinAlgError:
# TODO: is the best way to handle this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have something in https://github.com/GAA-UAM/scikit-fda/blob/develop/skfda/misc/lstsq.py that could probably be reused.

) # type: ignore


class _MixedEffectsParams(Protocol):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a protocol with just one implementer?

def objective_function(params_vec: NDArrayFloat) -> float:
return - model.profile_loglikelihood(
params=MinimizeMixedEffectsConverter.Params.from_vec(
params_vec, dim_effects, model=self, has_mean=has_mean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One param per line:

Suggested change
params_vec, dim_effects, model=self, has_mean=has_mean,
params_vec,
dim_effects,
model=self,
has_mean=has_mean,

X: FDataIrregular,
) -> FDataBasis:
"""Transform X to FDataBasis using the fitted converter."""
if self.result is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a scikit-learn method for that: sklearn.utils.validation.check_is_fitted

covariance=params.covariance,
sigmasq=params.sigmasq,
)
self.result = Bunch(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameters set on fit must end in underscore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not convinced that we want to expose this as a Bunch object.

)
)
for basis_eval, Sigma, random_effect in zip(
model.basis_evaluations, values_cov, random_effects,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One parameter per line:

Suggested change
model.basis_evaluations, values_cov, random_effects,
model.basis_evaluations,
values_cov,
random_effects,

basis = FourierBasis(n_basis=5, domain_range=(0, 10))
model = _MixedEffectsModel(fdatairregular, basis)

# These values have been obtained with Statsmodels' MixedLM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been good to have in the docstring the lines of code that generate these data, if possible.

@pcuestas
Copy link
Contributor Author

pcuestas commented Mar 9, 2025

Hello @vnmabus, I also hope you are doing great. Thanks for the review. I am very occupied with the master's until March 29th. I can do the changes then if waiting for a couple of weeks is OK.

@vnmabus
Copy link
Member

vnmabus commented Mar 9, 2025

I am glad that you are doing well! Of course it is ok, take the time you need. In case you change your mind, again just tell me.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants