Enable fantasy models for multitask GPs Reborn #2317

Merged
merged 6 commits on May 11, 2023

Conversation

@yyexela (Contributor) commented Apr 5, 2023

This pull request (PR) takes this previous pull request and completes it. These code changes allow get_fantasy_model to work with multitask Gaussian processes (GPs). This, in turn, allows derivative-enabled GPs to work with the qKG (Knowledge Gradient) acquisition function from the BoTorch package. This PR is part of my effort to get derivative-enabled GPs working in BoTorch.

I have verified that these changes do not break existing unit tests, and I have added a new unit test which goes through the added code.

Please take a look and let me know what I should change!
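For context, a rough sketch of what fantasizing means mathematically: conditioning the GP posterior on extra (x, y) pairs without refitting hyperparameters. The snippet below is a minimal single-task illustration in plain torch, not the gpytorch implementation; the kernel, noise level, and all shapes are made up for the example.

```python
import torch

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel between row-stacked points a (n, d) and b (m, d)
    d2 = (a.unsqueeze(-2) - b.unsqueeze(-3)).pow(2).sum(-1)
    return torch.exp(-0.5 * d2 / lengthscale**2)

torch.manual_seed(0)
train_x = torch.linspace(0, 1, 4).unsqueeze(-1)   # (4, 1) training inputs
train_y = torch.sin(6 * train_x).squeeze(-1)      # (4,) training targets
fantasy_x = torch.tensor([[0.5]])                 # one fantasy input
fantasy_y = torch.tensor([0.0])                   # a sampled fantasy observation

# Condition on the fantasy point by appending it to the training data
full_x = torch.cat([train_x, fantasy_x])
full_y = torch.cat([train_y, fantasy_y])
noise = 1e-2
K = rbf(full_x, full_x) + noise * torch.eye(full_x.shape[0])

test_x = torch.tensor([[0.25]])
mean = rbf(test_x, full_x) @ torch.linalg.solve(K, full_y)
print(mean.shape)  # torch.Size([1])
```

gpytorch's get_fantasy_model does this with cached solves rather than re-factorizing K, which is why the cache shapes discussed below matter.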

@@ -285,7 +302,11 @@ def exact_predictive_mean(self, test_mean: Tensor, test_train_covar: LinearOpera
# NOTE TO FUTURE SELF:
# You **cannot* use addmv here, because test_train_covar may not actually be a non lazy tensor even for an exact
# GP, and using addmv requires you to to_dense test_train_covar, which is obviously a huge no-no!
res = (test_train_covar @ self.mean_cache.unsqueeze(-1)).squeeze(-1)

if len(self.mean_cache.shape) == 4:
yyexela (Contributor, Author) commented:

The main issue I have with this PR is this line.

While working with a simple BO loop using BoTorch to test my code changes and observe the shapes of everything going through the code, I found that sometimes test_train_covar would be of size [5,1,4,24] and self.mean_cache would be of size [5,1,1,24]. This is why I have an if-statement here. I'm not sure if there's a better way to check if we're working with a derivative enabled GP in this function.

These observed shapes are also why my unit test has new_x and new_y as shapes (1, 1, dim) and (num_fantasies, 1, 1, 1 + dim) respectively, since that's what I observed from my test code.
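To make those shapes concrete, here is a small standalone sketch (shapes taken from this thread; the squeeze-based fix is only an illustration of the idea behind the ndim check, not this PR's exact code):

```python
import torch

# Shapes reported above: num_fantasies=5, 4 test rows, 24 flattened train entries (assumed)
test_train_covar = torch.randn(5, 1, 4, 24)
mean_cache = torch.randn(5, 1, 1, 24)

# Naive path: unsqueeze(-1) gives (5, 1, 1, 24, 1); matmul then broadcasts the
# batch dims (5, 1) against (5, 1, 1) into (5, 5, 1), which is not what we want.
naive = test_train_covar @ mean_cache.unsqueeze(-1)
print(naive.shape)  # torch.Size([5, 5, 1, 4, 1])

# Dropping the stray singleton dim first keeps the batch shapes aligned.
fixed = (test_train_covar @ mean_cache.squeeze(-2).unsqueeze(-1)).squeeze(-1)
print(fixed.shape)  # torch.Size([5, 1, 4])
```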

yyexela (Contributor, Author) commented:

I'm curious if anyone has any thoughts about this!

Collaborator commented:

> I found that sometimes test_train_covar would be of size [5,1,4,24] and self.mean_cache would be of size [5,1,1,24]

Do you have a sense for why this is? Is this some insufficient invalidation of self.mean_cache? Figure I'd ask first before getting into a rabbit hole here...

yyexela (Contributor, Author) commented:

Honestly, I'm not 100% sure, but self.mean_cache appears to be set to size [5,1,1,24] from a call at exact_gp.py:239 when the new prediction_strategy is created.

This is called from get_fantasy_model(self, inputs, targets) where

# The below are torch.Tensors; I just show their dimensions
inputs = [torch.Size([1, 1, 3])]
targets = torch.Size([5, 1, 1, 4])

And old_pred_strat.get_fantasy_strategy(inputs, targets, full_inputs, full_targets, full_output) is called with:

# The below are torch.Tensors; I just show their dimensions
inputs = [torch.Size([1, 1, 3])]
targets = torch.Size([5, 1, 1, 4])
full_inputs = [torch.Size([1, 6, 3])]
full_targets = torch.Size([5, 1, 6, 4])

full_output = MultitaskMultivariateNormal(loc: torch.Size([1, 24]))

The targets come from model.py:335 in BoTorch, from a call to the sampler. I don't specify a sampler in my code; I just use whatever default sampler qKnowledgeGradient provides:

scal_transf = ScalarizedPosteriorTransform(
    weights=torch.tensor([1.0] + [0.0] * dim, dtype=torch.double)
)

# Define qKG acquisition function
qKG = qKnowledgeGradient(
    model,
    posterior_transform=scal_transf,
    num_fantasies=5,
)

Hopefully this helps! I'm not sure what the expected behavior should be, but please let me know how I can help.

Member commented:

I wonder if the extra 1 appearing is just a soft incompatibility we never noticed, where BoTorch requires an explicit task dim for the labels but we don't in gpytorch. Indeed, my default is usually to have a single-dim label vector, so when writing the code something like this could have slipped by me.
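A tiny sketch of that suspected mismatch (shapes assumed from this thread; the defensive squeeze is hypothetical, not code from this PR):

```python
import torch

# gpytorch-style multitask fantasy targets: (num_fantasies, n, num_tasks)
gpytorch_style = torch.randn(5, 1, 4)
# BoTorch-style targets carrying an explicit extra singleton batch dim
botorch_style = torch.randn(5, 1, 1, 4)

# Hypothetical normalization: drop the extra singleton dim when present
normalized = botorch_style.squeeze(1) if botorch_style.dim() == 4 else botorch_style
print(normalized.shape)  # torch.Size([5, 1, 4])
```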

@Balandat (Collaborator) left a comment:

Thanks for taking on this effort, great stuff.

@@ -209,6 +223,9 @@ def get_fantasy_strategy(self, inputs, targets, full_inputs, full_targets, full_
new_root = BatchRepeatLinearOperator(DenseLinearOperator(new_root), repeat_shape)
# no need to repeat the covar cache, broadcasting will do the right thing

if isinstance(full_output, MultitaskMultivariateNormal):
full_mean = full_mean.view(*target_batch_shape, -1, num_tasks).contiguous()
Collaborator commented:

Instead of .view().contiguous(), you can also just use reshape() here.
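For reference, reshape() subsumes view(...).contiguous(): view() requires a compatible memory layout and raises otherwise, while reshape() falls back to a copy when needed. A quick illustration with made-up shapes:

```python
import torch

full_mean = torch.randn(5, 24)          # stand-in for the flattened multitask mean
a = full_mean.view(5, -1, 4).contiguous()
b = full_mean.reshape(5, -1, 4)
print(torch.equal(a, b))  # True

t = torch.randn(4, 6).t()               # transpose -> non-contiguous
flat = t.reshape(-1)                    # fine: reshape copies when it must
try:
    t.view(-1)                          # view cannot handle this layout
except RuntimeError as e:
    print("view() raised:", type(e).__name__)
```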


@yyexela (Contributor, Author) commented May 2, 2023:

Hey! Wondering if there is anything else I should do to try to get this through? I'm graduating soon and will have limited time to work on this in the fall, with some time over the summer, but I'll be interning full time. Thanks!

@jacobrgardner (Member) commented:

We should definitely try to get this in, thanks for putting this up! Things mostly lgtm, other than Max's comment.

@jacobrgardner (Member) left a review:

Went through more carefully, lgtm!
