Handle multivariate responses with HSGP #856

Merged
tomicapretto merged 6 commits into bambinos:main from hsgp-multivariate-responses on Dec 16, 2024

tomicapretto
Collaborator

This PR does two things.

  • Main: Make it possible to use HSGP with multivariate responses, such as a multinomial likelihood.
  • Adjacent: Make multivariate families use two dims. Until now, they created an extra dimension that was not equal to the response dimension.

There's something I would like to clarify about HSGP with multivariate responses. With this PR, it's possible to do something like:

import bambi as bmb

# df is assumed to hold the predictor x and the response columns y1, y2, y3
formula = "c(y1, y2, y3) ~ 0 + hsgp(x, m=30, c=2)"
hsgp_model = bmb.Model(formula, df, family="multinomial")

# Setting aliases for a nicer graph
hsgp_model.set_alias({"c(y1, y2, y3)": "result"})
hsgp_model.set_alias({"hsgp(x, m=30, c=2)": "hsgp"})
hsgp_model.build()
hsgp_model.graph()

and the graph will look like:

[image: graph of the built model]

Note that all the dimensions of the response share the same priors for hsgp_sigma and hsgp_ell. In theory, it's possible to use a different coefficient for each response dimension. However, Bambi does not support that, and after some thought I decided it is fine that way.
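To make the sharing concrete, here is a minimal sketch of the HSGP approximation in plain Python. It is not Bambi's implementation: it assumes a one-dimensional predictor, an exponentiated quadratic kernel, and standard normal basis weights, and the function names (`hsgp_basis`, `expquad_sqrt_psd`, `hsgp_multivariate`) are made up for illustration. The point is only that `sigma` and `ell` enter once, while each response dimension gets its own set of basis weights.

```python
import math
import random


def hsgp_basis(x, m, L):
    """Evaluate the m Laplacian eigenfunctions at a scalar x in [-L, L]."""
    return [
        math.sqrt(1.0 / L) * math.sin(j * math.pi * (x + L) / (2.0 * L))
        for j in range(1, m + 1)
    ]


def expquad_sqrt_psd(m, L, sigma, ell):
    """Square root of the 1-D ExpQuad spectral density at each basis frequency."""
    out = []
    for j in range(1, m + 1):
        omega = j * math.pi / (2.0 * L)
        psd = sigma**2 * math.sqrt(2.0 * math.pi) * ell * math.exp(-0.5 * (ell * omega) ** 2)
        out.append(math.sqrt(psd))
    return out


def hsgp_multivariate(xs, m, L, sigma, ell, n_dims, rng):
    """Return one approximate GP draw per response dimension.

    sigma and ell are scalars shared by every dimension; only the basis
    weights differ from one dimension to the next.
    """
    sqrt_psd = expquad_sqrt_psd(m, L, sigma, ell)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n_dims)]
    draws = []
    for x in xs:
        phi = hsgp_basis(x, m, L)
        draws.append(
            [sum(p * s * w for p, s, w in zip(phi, sqrt_psd, weights[k])) for k in range(n_dims)]
        )
    return draws


rng = random.Random(0)
xs = [i / 10 for i in range(-10, 11)]  # predictor values in [-1, 1]
# L = 2.0 mirrors c=2 applied to a half-range of 1 (an assumption for this sketch)
f = hsgp_multivariate(xs, m=30, L=2.0, sigma=1.0, ell=0.5, n_dims=3, rng=rng)
```

Here `f` has one row per value of `x` and one column per response dimension. Giving each dimension its own `sigma` and `ell` would mean computing a separate `sqrt_psd` per column, which is the extra granularity discussed above.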

The implementation is already very complicated, and it would become much more so if we decided to handle this. There's the by (and share_cov) argument in hsgp(), which could make us think we can use it for this purpose. However, that argument expects categories to be values within a given variable. In the multivariate family case, the dimensions are different columns (i.e. y1, y2, and y3 in the example above). So, at a minimum, we would need a special way to tell Bambi to handle things differently in this special case.
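The layout difference can be sketched with a toy example. The data values and the helper `wide_to_long` below are invented for illustration and are not part of Bambi. In the wide layout, the response dimensions are columns. In the long layout, they become values of a single variable, which is the shape `by` expects.

```python
# Wide layout: each response dimension is its own column (the multivariate case)
wide = [
    {"x": 0.1, "y1": 3, "y2": 5, "y3": 2},
    {"x": 0.4, "y1": 1, "y2": 6, "y3": 3},
]


def wide_to_long(rows, response_cols):
    """Stack the response columns into (x, dim, count) records."""
    long_rows = []
    for row in rows:
        for col in response_cols:
            long_rows.append({"x": row["x"], "dim": col, "count": row[col]})
    return long_rows


# Long layout: "dim" is one variable whose values are the categories
long_rows = wide_to_long(wide, ["y1", "y2", "y3"])
```

This only illustrates why the two cases differ. It does not imply that reshaping the data makes `by` usable with a multivariate family.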

On top of that, a multivariate model with an HSGP is already a fairly complex model. For more granular control of the HSGP, one should use a PPL like PyMC.

I'm open to changing my mind in the future, but for now, I think this is good enough.

codecov-commenter commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.33%. Comparing base (516d7bd) to head (59f2f7a).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bambi/backend/terms.py | 60.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   89.71%   89.33%   -0.39%     
==========================================
  Files          47       47              
  Lines        3997     4030      +33     
==========================================
+ Hits         3586     3600      +14     
- Misses        411      430      +19     


Contributor

@AlexAndorra AlexAndorra left a comment

LGTM @tomicapretto, thanks a lot for adding this!
I'd vote to allow share_cov=False for these cases too, but I understand it's hard to implement. I'm happy to contribute this, though, if you think it's not too big.

@tomicapretto tomicapretto merged commit 1559a97 into bambinos:main Dec 16, 2024
4 checks passed
@tomicapretto tomicapretto deleted the hsgp-multivariate-responses branch December 16, 2024 11:41
3 participants