
Start facilitating coefficient splitting #247

Merged
merged 113 commits into from
Nov 7, 2024

Conversation


@BalzaniEdoardo BalzaniEdoardo commented Oct 9, 2024

Summary

This PR starts the process of facilitating slicing of the feature axis, either of the model coefficients or of the design matrix.

In particular,

Added To Basis

  • label parameter: names the variable (write-only). The label of an AdditiveBasis or MultiplicativeBasis combines the labels of its 1D bases.
  • n_basis_input parameter: the size of axis 1 of a 2D input to the basis (write-only). This applies in conv mode, for example when we convolve the counts of a neural population.
  • _get_feature_slicing: a recursive method that returns a dictionary of slices that can be applied to the feature axis. Labels (or combinations of labels) are used as keys.
  • split_by_feature: the user-facing method that splits an array into its feature components.
  • n_output_features: a read-only property that returns the number of output features.
  • _input_shape: stores the shape of the first input passed to compute_features. Any subsequent input is checked against this shape. A consistent input shape guarantees that we can split the feature axis correctly.

The _get_feature_slicing method is for internal use, but it makes our life very easy: in split_by_feature we can use jax.tree_util.tree_map to apply the slicing to any array and automatically get a dictionary of coefficients, labeled in a meaningful way.

If the arrays are numpy, using slice is very efficient, since it creates a dict of views of the array (no data is copied).
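A minimal numpy illustration of this point, using a made-up label-to-slice dict standing in for the output of _get_feature_slicing: applying the slices yields views that share memory with the original array.

```python
import numpy as np

# Toy design matrix and a dict of slices; the labels here are hypothetical,
# standing in for what _get_feature_slicing would return.
X = np.arange(20).reshape(4, 5)
slices = {"position": slice(0, 2), "velocity": slice(2, 5)}

# Applying the slices produces views: no data is copied.
parts = {label: X[:, sl] for label, sl in slices.items()}

assert np.shares_memory(parts["position"], X)  # a view, not a copy
assert parts["position"].shape == (4, 2)
assert parts["velocity"].shape == (4, 3)
```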

It will also facilitate creating a "FeaturePytree" from an input in array or TsdFrame form.

Example of _get_feature_slicing:

>>> import numpy as np
>>> import nemos as nmo
>>> import jax

>>> bas1 = nmo.basis.RaisedCosineBasisLinear(3, mode="conv", n_basis_input=2, window_size=5, label="position")

>>> bas2 = nmo.basis.MSplineBasis(4, mode="conv", n_basis_input=3, window_size=5,  label="velocity")

>>> bas3 = bas1 + bas2 + bas1 * bas2

>>> # slice each individual input, default behavior.
>>> slice_dict = bas3._get_feature_slicing()[0]  
>>> slice_dict
{'position': {'0': slice(0, 3, None), '1': slice(3, 6, None)},
 'velocity': {'0': slice(6, 10, None),
  '1': slice(10, 14, None),
  '2': slice(14, 18, None)},
 '(position * velocity)': slice(18, 90, None)}

>>> # slice each additive component instead.
>>> bas3._get_feature_slicing(split_by_input=False)[0] 
{'position': slice(0, 6, None),
 'velocity': slice(6, 18, None),
 '(position * velocity)': slice(18, 90, None)}

>>> # splitting a design matrix becomes trivial
>>> x1 = np.random.normal(size=(10, 2))
>>> x2 = np.random.normal(size=(10, 3))
>>> X = bas3.compute_features(x1, x2, x1, x2)
>>> splits = jax.tree_util.tree_map(lambda sl: X[:, sl], slice_dict)

Example of split_by_feature

>>> import numpy as np
>>> from nemos.basis import BSplineBasis
>>> from nemos.glm import GLM

>>> # Define an additive basis
>>> basis = (
...     BSplineBasis(n_basis_funcs=5, mode="conv", window_size=10, label="feature_1") +
...     BSplineBasis(n_basis_funcs=6, mode="conv", window_size=10, label="feature_2")
... )

>>> # split an arbitrarily shaped array
>>> _ = basis.compute_features(np.random.randn(10, 2), np.random.randn(10, 3))
>>> array = np.ones((1, 1, basis.n_output_features, 1))
>>> basis.split_by_feature(array, axis=2)
{'feature_1': array([[[[[1.],
           [1.],
           [1.],
           [1.],
           [1.]],
 
          [[1.],
           [1.],
           [1.],
           [1.],
           [1.]]]]]),
 'feature_2': array([[[[[1.],
           [1.],
           [1.],
           [1.],
           [1.],
           [1.]],
 
          [[1.],
           [1.],
           [1.],
           [1.],
           [1.],
           [1.]],
 
          [[1.],
           [1.],
           [1.],
           [1.],
           [1.],
           [1.]]]]])}


>>> # example of usage in combination with GLM
>>> X = np.random.normal(size=(20, basis.n_output_features))
>>> y = np.random.poisson(size=(20, ))
>>> basis.split_by_feature(GLM().fit(X, y).coef_, axis=0)
{'feature_1': Array([-0.02247754,  0.49239248, -0.09706223, -0.30416837,  0.04843776],      dtype=float32),
 'feature_2': Array([-0.29889402, -0.0040512 , -0.28740323,  0.5222396 ,  0.55201346,
        -0.13157026], dtype=float32)}

Moved To nemos.identifiability_constraint

  • apply_identifiability_constraints: a function that receives a matrix as input and drops columns until a minimal set of linearly independent columns is left.
  • apply_identifiability_constraints_by_basis_component: assumes that each feature component is independent and applies apply_identifiability_constraints to each of them. This doesn't guarantee full rank, but it can be much faster when the number of features is large, and it can be applied in sequence: first a pass by component, then a full pass over the design matrix. It is also more stable numerically, since each component's dimensionality can be much smaller than that of the full design matrix, making the singular-value computation and thresholding easier.
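The core idea can be sketched in a few lines; this is not the nemos implementation, just a greedy rank-based version of "drop columns until a minimal linearly independent set remains" (function name and toy data are made up):

```python
import numpy as np

def drop_dependent_columns(X, tol=None):
    """Greedily keep columns that increase the matrix rank.

    A sketch of the idea behind apply_identifiability_constraints,
    not the actual nemos implementation.
    """
    kept = []
    rank = 0
    for j in range(X.shape[1]):
        new_rank = np.linalg.matrix_rank(X[:, kept + [j]], tol=tol)
        if new_rank > rank:  # column j is independent of the kept ones
            kept.append(j)
            rank = new_rank
    return X[:, kept], kept

# A rank-deficient toy design: column 2 = column 0 + column 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])

X_full, kept = drop_dependent_columns(X)
assert kept == [0, 1]
assert np.linalg.matrix_rank(X_full) == X_full.shape[1]
```

Repeated SVD-based rank checks like this get expensive and numerically delicate as the matrix grows, which is exactly why the per-component variant is attractive.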

[NOTE]
The use of parentheses guarantees that the label fully specifies the order of operation in a composite basis.

@BalzaniEdoardo BalzaniEdoardo marked this pull request as ready for review October 10, 2024 16:47
@@ -659,7 +692,9 @@ def _compute_features(self, *xi: ArrayLike) -> FeatureMatrix:
Raises
------
ValueError:
If an invalid mode is specified or necessary parameters for the chosen mode are missing.
- If an invalid mode is specified or necessary parameters for the chosen mode are missing.
- In mode "conv", if the number of inputs to be convolved, doesn't match the number of inputs
Collaborator Author

The __init__ of mode="conv" and mode="eval" are starting to diverge even more, which is another argument in favor of having distinct classes for bases that perform convolution and bases that perform evaluation.

I would say a separate PR, since there is a lot to change if we include the tests. Do you have suggestions for an API that may work, and for naming conventions?

Most of the machinery is shared, so probably a base class from which they both inherit, with an abstract method compute_features that the two variants each re-implement.
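The suggested split could look roughly like this. All class names and the placeholder feature computations are hypothetical, chosen only to illustrate the shape of the API, not nemos code:

```python
from abc import ABC, abstractmethod
import numpy as np

class Basis(ABC):
    """Hypothetical base class holding the shared machinery (labels, validation, slicing)."""

    def __init__(self, n_basis_funcs, label=None):
        self.n_basis_funcs = n_basis_funcs
        self.label = label or self.__class__.__name__

    @abstractmethod
    def compute_features(self, x):
        """Map an input array to a (n_samples, n_features) design matrix."""

class EvalBasis(Basis):
    def compute_features(self, x):
        # placeholder: polynomial features standing in for a real basis evaluation
        return np.power.outer(np.asarray(x, dtype=float), np.arange(self.n_basis_funcs))

class ConvBasis(Basis):
    def __init__(self, n_basis_funcs, window_size, label=None):
        super().__init__(n_basis_funcs, label=label)
        self.window_size = window_size

    def compute_features(self, x):
        # placeholder: causal moving averages standing in for basis convolution
        x = np.asarray(x, dtype=float)
        widths = np.linspace(1, self.window_size, self.n_basis_funcs, dtype=int)
        kernels = [np.ones(w) / w for w in widths]
        return np.column_stack([np.convolve(x, k)[: len(x)] for k in kernels])

t = np.linspace(0, 1, 100)
assert EvalBasis(5).compute_features(t).shape == (100, 5)
assert ConvBasis(5, window_size=20).compute_features(t).shape == (100, 5)
```

Only compute_features differs between the two subclasses; everything about labeling and feature splitting stays in the base class.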

Collaborator Author

Refer to #202 for a discussion on the basis API

@codecov-commenter

codecov-commenter commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.27%. Comparing base (81b9dee) to head (bf87623).
Report is 46 commits behind head on development.

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #247      +/-   ##
===============================================
+ Coverage        96.73%   97.27%   +0.53%     
===============================================
  Files               25       25              
  Lines             2206     2199       -7     
===============================================
+ Hits              2134     2139       +5     
+ Misses              72       60      -12     


BalzaniEdoardo and others added 10 commits November 5, 2024 09:14
Co-authored-by: Sarah Jo Venditto <sjvenditto@gmail.com>
Co-authored-by: William F. Broderick <billbrod@gmail.com>
@BalzaniEdoardo
Collaborator Author

BalzaniEdoardo commented Nov 6, 2024

I'm confused by the behavior of split_by_feature:

* Why does it return 3d arrays? If I look at the example in the docstring, it returns arrays of shape (20,1,5) and (20,1,6). What's the middle dimension? This is also the behavior for a single basis or multiplicative basis.

Bases in conv mode allow for multidimensional inputs (intended for counts, which are a TsdFrame in pynapple); the extra dimension indexes these inputs. Example below:

>>> import nemos as nmo
>>> import pynapple as nap
>>> import numpy as np
>>> counts = nap.TsdFrame(t=np.arange(100), d=np.random.poisson(size=(100, 2)))
>>> basis = nmo.basis.BSplineBasis(5, mode="conv", window_size=50)
>>> X = basis.compute_features(counts)
>>> X.shape
(100, 10)
>>> basis.split_by_feature(X, axis=1)["BSplineBasis"].shape
(100, 2, 5)
* This method only really makes sense for the additive basis, right? Otherwise it doesn't do anything but add the middle dimension? In that case, should it exist for the other basis objects? Or am I missing something?

I prefer that every basis have the method, for consistency. I can see a case in which one creates an additive basis in a script but the number of components is variable, from 1 to N; if every basis has the method, you can use the exact same code to process a regular basis (1 component) and the rest.

Secondly, reshaping correctly by input is handy. Without the method, how do you split the feature axis correctly in my Python example above: X.reshape(100, 2, 5) or X.reshape(100, 5, 2)? That depends on the internals of the convolution, but with the method one doesn't need to memorize them.
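The reshape ambiguity can be illustrated with a toy layout. The input-major column ordering below (each input's basis columns contiguous) is an assumption made for the example, not a statement about nemos internals:

```python
import numpy as np

n_samples, n_inputs, n_basis = 100, 2, 5

# Assume an input-major layout: for each input, its n_basis columns
# are contiguous along the feature axis.
blocks = [np.full((n_samples, n_basis), fill_value=i) for i in range(n_inputs)]
X = np.concatenate(blocks, axis=1)  # shape (100, 10)

good = X.reshape(n_samples, n_inputs, n_basis)
bad = X.reshape(n_samples, n_basis, n_inputs)

# The correct reshape cleanly separates the two inputs...
assert (good[:, 0] == 0).all() and (good[:, 1] == 1).all()
# ...while the wrong axis order mixes values from different inputs.
assert not (bad[:, :, 0] == 0).all()
```

With the opposite (basis-major) layout the correct call would be the other reshape, which is exactly why a method that knows the internals is safer than memorizing them.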

I also don't think the tutorial we have on identifiability constraints right now is sufficient. It might not need to be addressed here, but if not, it should be added to the docs project. It should explain why it's bad to be rank-deficient, show what you gain from being full rank, etc.

Yes, totally

@billbrod billbrod left a comment


Just the small changes to the docstring we discussed, then this looks good to me!

$$
(..., n_i, b_i, ...)
$$

Here:
- $n_i$ is the number of inputs processed by the **i-th basis component**.

Suggested change
- $n_i$ is the number of inputs processed by the **i-th basis component**.
- $n_i$ is the number of inputs processed by the **i-th basis component**.

Need an empty line here to render the docs correctly

@BalzaniEdoardo BalzaniEdoardo merged commit 25fa840 into development Nov 7, 2024
13 checks passed
@BalzaniEdoardo BalzaniEdoardo deleted the coeff_parsing_support branch November 7, 2024 19:50