What are identifiability constraints and why apply them
Identifiability constraints are techniques to ensure that a design matrix has maximal rank (i.e., the rank is the minimum between the number of rows and columns) by removing redundant dimensions.
In statistical models like GLMs, identifiability constraints ensure unique parameter solutions by removing redundant features. Without these constraints, multiple parameter vectors could produce identical outcomes, complicating model interpretation and optimization.
Mathematically, if $X\in \mathbb{R}^{n \times m}$ is a design matrix and $\text{rank}(X) < m \leq n$, there exists at least one non-zero $v\in\mathbb{R}^m$ such that $X \cdot v = 0$. This implies that if a weight vector $w \in \mathbb{R}^m$ maximizes the model likelihood, then $w + \alpha v$ also maximizes it for every $\alpha \in \mathbb{R}$.
In other words, any weight vector of the form $w + \alpha v$ results in the same likelihood, leading to non-unique solutions.
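To make the non-uniqueness concrete, here is a minimal numpy sketch (illustration only, not NeMoS code): a rank-deficient design matrix, a null-space vector $v$, and the identical linear predictors obtained from $w$ and $w + \alpha v$.

```python
import numpy as np

# Rank-deficient design: the third column is the sum of the first two,
# so rank(X) = 2 < m = 3.
X = np.array([
    [1., 0., 1.],
    [1., 0., 1.],
    [0., 1., 1.],
    [0., 1., 1.],
])
print(np.linalg.matrix_rank(X))  # 2

# A null-space vector v, i.e. X @ v = 0.
v = np.array([1., 1., -1.])
print(np.allclose(X @ v, 0.0))  # True

# w and w + alpha * v produce identical linear predictors,
# hence identical likelihoods for any GLM built on X.
w = np.array([0.5, -0.2, 0.3])
alpha = 7.0
print(np.allclose(X @ w, X @ (w + alpha * v)))  # True
```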
How are the constraints applied in NeMoS
The way we currently apply these constraints is by throwing away columns of the design matrix until full rank is obtained. This happens within basis when a flag is set at initialization.
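For intuition, here is a rough numpy sketch of that column-dropping idea (a simplified illustration, not the actual NeMoS implementation): greedily keep a column only if it increases the rank of the columns kept so far.

```python
import numpy as np

def drop_redundant_columns(X: np.ndarray) -> np.ndarray:
    """Return a full-column-rank submatrix of X by discarding redundant columns."""
    kept: list[int] = []
    current_rank = 0
    for j in range(X.shape[1]):
        if np.linalg.matrix_rank(X[:, kept + [j]]) > current_rank:
            kept.append(j)
            current_rank += 1
    return X[:, kept]

# Intercept + one-hot trial-type columns: the intercept equals the sum of the
# other two columns, so one column is redundant.
X = np.array([
    [1., 1., 0.],
    [1., 1., 0.],
    [1., 0., 1.],
    [1., 0., 1.],
])
print(drop_redundant_columns(X).shape)  # (4, 2): the redundant last column is dropped
```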
What's the problem with this approach
The resulting behavior of the basis transform may be confusing and cause errors if used in pipelines:
Calling compute_features on different inputs may result in a different number of output features, since the rank of the output design matrix depends on the input (see the sketch after this list).
Basis methods for splitting the feature axis of a composite basis into individual components (currently work in progress) rely on the number of output features being consistent, and may fail.
Regularized models do not require a full-rank design, but basis is unaware of which model will be applied after the features are constructed, and would still apply the constraint to the design matrix.
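A toy numpy illustration of the first point (not NeMoS code): the same feature construction can yield a different number of independent, and hence retained, columns depending on the input values.

```python
import numpy as np

def toy_features(x: np.ndarray) -> np.ndarray:
    """Stack three hand-made features; |x| duplicates x whenever all x are non-negative."""
    return np.column_stack([x, x**2, np.abs(x)])

x_nonneg = np.array([0.5, 1.0, 2.0, 3.0])   # |x| == x  -> one redundant column
x_mixed = np.array([-1.0, 0.5, 2.0, 3.0])   # |x| != x  -> all columns independent

# After a rank-based constraint, the first design keeps 2 columns, the second keeps 3.
print(np.linalg.matrix_rank(toy_features(x_nonneg)))  # 2
print(np.linalg.matrix_rank(toy_features(x_mixed)))   # 3
```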
Solution
It is still valuable to provide a way to enforce identifiability constraints, but it should probably be implemented as a dedicated utility function, to be called after basis.compute_features and before glm.fit.
This will keep the behavior of basis consistent, and will make it more explicit when the user is applying the constraint.
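A hypothetical sketch of what such a utility and workflow could look like (the function name, the QR-based implementation, and the workflow names below are illustrative assumptions, not the actual NeMoS API):

```python
import numpy as np
from scipy.linalg import qr

def apply_identifiability_constraints(X: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return X restricted to a maximal set of linearly independent columns."""
    # Column-pivoted QR: the first `rank` pivots index an independent column subset.
    _, R, pivots = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0])) if diag.size and diag[0] > 0 else 0
    return X[:, np.sort(pivots[:rank])]

# Intended workflow (object and method names assumed for illustration):
# X = basis.compute_features(inputs)          # build the design matrix
# X = apply_identifiability_constraints(X)    # explicit, user-controlled step
# glm.fit(X, spike_counts)                    # unregularized GLM now has a unique optimum
```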
Additional Note
The identifiability constraint changes how the parameters should be interpreted; we need to document that.
Imagine a 1-hot encoded trial structure, with 4 trials, the first two of type A and the last two of type B. The encoding matrix would be
[[1., 0.],
[1., 0.],
[0., 1.],
[0., 1.]]
Fitting a GLM with this design would return two weights: the first can be interpreted as the effect of trial type A on the rate, and the second as that of trial type B.
Adding an intercept would make the design low rank, since it becomes
[[1., 1., 0.],
[1., 1., 0.],
[1., 0., 1.],
[1., 0., 1.]]
and the columns are linearly dependent (the intercept column is the sum of the two trial-type columns).
Applying an identifiability constraint by throwing away the last column results in the following
[[1., 1.],
[1., 1.],
[1., 0.],
[1., 0.]]
And now the interpretation of the weights changes: the first weight is the baseline (the trial type B effect), and the second weight is the delta between the baseline and the trial type A effect.
The trial rates are still fully captured by the model: if one wants to predict the rate for trial type A, one must pass the array [[1, 1]] as input to predict, and for trial type B, [[1, 0]]. What changes is how one interprets the weights.
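A quick numpy check of this re-parameterization (illustrative weight values, not fitted parameters; the comparison is on the linear predictor, which the GLM passes through its inverse link to obtain the rate):

```python
import numpy as np

# One-hot design (no intercept) and the constrained design with intercept,
# obtained by dropping the redundant trial-type-B column.
X_onehot = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
X_constrained = np.array([[1., 1.], [1., 1.], [1., 0.], [1., 0.]])

# Per-trial-type effects in the one-hot parameterization (illustrative values).
w_onehot = np.array([0.8, 0.2])        # effect of type A, effect of type B

# Same model in the constrained parameterization:
# first weight = baseline (type B effect), second = delta (type A minus baseline).
w_constrained = np.array([0.2, 0.6])

# Both parameterizations give identical linear predictors for every trial.
print(np.allclose(X_onehot @ w_onehot, X_constrained @ w_constrained))  # True

# Predicting single trial types in the constrained parameterization:
print(np.array([[1., 1.]]) @ w_constrained)  # trial type A -> 0.8
print(np.array([[1., 0.]]) @ w_constrained)  # trial type B -> 0.2
```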
This discussion, which may be bread and butter to statisticians, may not be that obvious to a neuroscientist, and should be highlighted somewhere in the docs.