Enable fantasy models for multitask GPs Reborn #2317
Conversation
```
@@ -285,7 +302,11 @@ def exact_predictive_mean(self, test_mean: Tensor, test_train_covar: LinearOperator):
# NOTE TO FUTURE SELF:
# You **cannot* use addmv here, because test_train_covar may not actually be a non lazy tensor even for an exact
# GP, and using addmv requires you to to_dense test_train_covar, which is obviously a huge no-no!
res = (test_train_covar @ self.mean_cache.unsqueeze(-1)).squeeze(-1)

if len(self.mean_cache.shape) == 4:
```
The main issue I have with this PR is this line. While working with a simple BO loop using BoTorch to test my code changes and observe the shapes of everything going through the code, I found that sometimes `test_train_covar` would be of size [5, 1, 4, 24] and `self.mean_cache` would be of size [5, 1, 1, 24]. This is why I have an if-statement here; I'm not sure whether there's a better way to check if we're working with a derivative-enabled GP in this function.

These observed shapes are also why my unit test has `new_x` and `new_y` with shapes (1, 1, dim) and (num_fantasies, 1, 1, 1 + dim) respectively, since that's what I observed from my test code.
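To make the shape issue concrete, here is a small repro of what the plain matmul does with the shapes reported above. The tensors are random stand-ins, and the squeeze-based variant at the end is only an illustration of one way the batch dims could be aligned, not necessarily what this PR's branch does:

```python
import torch

# Shapes taken from the comment above; values are random stand-ins.
test_train_covar = torch.randn(5, 1, 4, 24)   # [5, 1, 4, 24]
mean_cache = torch.randn(5, 1, 1, 24)         # [5, 1, 1, 24]

# The existing line: unsqueeze(-1) turns mean_cache into [5, 1, 1, 24, 1], and
# batched matmul then broadcasts batch dims (5, 1) against (5, 1, 1),
# producing a spurious extra batch dimension.
res = (test_train_covar @ mean_cache.unsqueeze(-1)).squeeze(-1)
print(res.shape)  # torch.Size([5, 5, 1, 4])

# Dropping the singleton dim first keeps the batch dims aligned:
# [5, 1, 4, 24] @ [5, 1, 24, 1] -> [5, 1, 4, 1].
res_aligned = (test_train_covar @ mean_cache.squeeze(-2).unsqueeze(-1)).squeeze(-1)
print(res_aligned.shape)  # torch.Size([5, 1, 4])
```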
I'm curious if anyone has any thoughts about this!
> I found that sometimes test_train_covar would be of size [5,1,4,24] and self.mean_cache would be of size [5,1,1,24]

Do you have a sense for why this is? Is this some insufficient invalidation of `self.mean_cache`? Figured I'd ask first before getting into a rabbit hole here...
Honestly I'm not 100% sure, but `self.mean_cache` appears to be set to size [5, 1, 1, 24] from a call to exact_gp.py:239 when the new `prediction_strategy` is created.

This is called from `get_fantasy_model(self, inputs, targets)` where

```
# The below are torch tensors, I just show their dimensions
inputs = [torch.Size([1, 1, 3])]
targets = torch.Size([5, 1, 1, 4])
```

And `old_pred_strat.get_fantasy_strategy(inputs, targets, full_inputs, full_targets, full_output)` is called with:

```
# The below are torch tensors, I just show their dimensions
inputs = [torch.Size([1, 1, 3])]
targets = torch.Size([5, 1, 1, 4])
full_inputs = [torch.Size([1, 6, 3])]
full_targets = torch.Size([5, 1, 6, 4])
full_output = MultitaskMultivariateNormal(loc: torch.Size([1, 24]))
```
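As a side note on where the 24 comes from: with 6 full training inputs and 4 tasks, the interleaved mean of a `MultitaskMultivariateNormal` flattens to 6 × 4 = 24 entries, which matches the `loc: torch.Size([1, 24])` above. A tiny illustration with stand-in values (the identity covariance is just a placeholder):

```python
import torch
from gpytorch.distributions import MultitaskMultivariateNormal

# Stand-in values: 6 points, 4 tasks, identity covariance as a placeholder.
mean = torch.zeros(6, 4)     # (n, num_tasks)
covar = torch.eye(6 * 4)     # interleaved (n * num_tasks) x (n * num_tasks)
full_output = MultitaskMultivariateNormal(mean, covar)

print(full_output.mean.shape)  # torch.Size([6, 4])  -- per-point, per-task mean
print(full_output.loc.shape)   # torch.Size([24])    -- flattened, interleaved mean
```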
The `targets` come from model.py:335 in BoTorch, from a call to `sampler`. I don't specify a sampler in my code; I just use whatever default sampler is in `qKnowledgeGradient`:
```python
scal_transf = ScalarizedPosteriorTransform(weights=torch.tensor([1.0] + [0.0] * dim, dtype=torch.double))

# Define qKG acquisition function
qKG = qKnowledgeGradient(model, posterior_transform=scal_transf, num_fantasies=5)
```
Hopefully this helps! I'm not sure what the expected behavior should be, but please let me know how I can help.
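For extra context, this is roughly the kind of call that ends up exercising `get_fantasy_model` / `get_fantasy_strategy` through qKG's forward pass. It is only a sketch: `model` stands for an already-fitted derivative-enabled multitask GP wrapped as a BoTorch model, and `dim` and `bounds` are placeholders rather than the exact values from the script discussed here.

```python
import torch
from botorch.acquisition import qKnowledgeGradient
from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.optim import optimize_acqf

dim = 3                                                    # placeholder input dimension
bounds = torch.stack([torch.zeros(dim), torch.ones(dim)])  # 2 x dim box bounds
# `model` is assumed: a fitted derivative-enabled multitask GP exposed as a BoTorch model.

# Scalarize the (value, gradient) outputs down to the function value only.
scal_transf = ScalarizedPosteriorTransform(
    weights=torch.tensor([1.0] + [0.0] * dim, dtype=torch.double)
)
qKG = qKnowledgeGradient(model, posterior_transform=scal_transf, num_fantasies=5)

# Optimizing the acquisition evaluates qKG, which fantasizes the model internally;
# that is the path that reaches get_fantasy_model / get_fantasy_strategy.
candidate, acq_value = optimize_acqf(
    acq_function=qKG,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=128,
)
```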
I wonder if the extra 1 appearing is just a soft incompatibility we never noticed, where BoTorch requires an explicit task dim for the labels but we don't in gpytorch. Indeed, my default is usually to have a single-dim label vector, so when writing the code something like this could have slipped by me.
Thanks for taking on this effort!
```
@@ -209,6 +223,9 @@ def get_fantasy_strategy(self, inputs, targets, full_inputs, full_targets, full_output):
new_root = BatchRepeatLinearOperator(DenseLinearOperator(new_root), repeat_shape)
# no need to repeat the covar cache, broadcasting will do the right thing

if isinstance(full_output, MultitaskMultivariateNormal):
    full_mean = full_mean.view(*target_batch_shape, -1, num_tasks).contiguous()
```
Instead of `.view().contiguous()`, you can also just use `reshape()` here.
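A quick check of the suggestion, with stand-in shapes matching the [5, 1, 6, 4] full targets discussed above (the tensors here are dummies, not the library's own variables):

```python
import torch

# Dummy stand-ins for the variables in the diff above.
target_batch_shape, num_tasks = torch.Size([5, 1]), 4
full_mean = torch.randn(5, 1, 24)  # batch x (n * num_tasks), here 24 = 6 * 4

a = full_mean.view(*target_batch_shape, -1, num_tasks).contiguous()
b = full_mean.reshape(*target_batch_shape, -1, num_tasks)

assert a.shape == b.shape == torch.Size([5, 1, 6, 4])
assert torch.equal(a, b)
# reshape() gives the same result and also handles layouts that view() cannot
# express without an explicit copy.
```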
Hey! Wondering if there is anything else I should do to try to get this through? I'm graduating soon and will have limited time to work on this in the fall, with some time over the summer, but I'll be interning full time. Thanks!

We should definitely try to get this in, thanks for putting this up! Things mostly lgtm, other than Max's comment.
Went through more carefully, lgtm!
This pull request (PR) takes this previous pull request and completes it. These code changes allow `get_fantasy_model` to work with multitask Gaussian processes (GPs). This, in turn, allows derivative-enabled GPs to work with the qKG (Knowledge Gradient) acquisition function from the BoTorch package. This PR is part of my effort to get derivative-enabled GPs to work in BoTorch.

I have verified that these changes do not break existing unit tests, and I have added a new unit test which exercises the added code.

Please take a look and let me know what I should change!