
[Bug] condition_on_observation with SaasFullyBayesianGP #1680

Closed
hvarfner opened this issue Feb 15, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@hvarfner
Contributor

hvarfner commented Feb 15, 2023

🐛 Bug

At the risk of looking a bit silly, I don't think conditioning on additional observations works out of the box when using fully Bayesian GPs.

To reproduce

The setup from the SAASBO tutorial is enough to reproduce the issue.
Code snippet to reproduce

import os

import matplotlib.pyplot as plt
import numpy as np
import torch
from botorch import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.models.transforms import Standardize
from botorch.optim import optimize_acqf
from botorch.test_functions import Branin
from torch.quasirandom import SobolEngine

SMOKE_TEST = os.environ.get("SMOKE_TEST")
tkwargs = {"device": torch.device("cuda:1" if torch.cuda.is_available() else "cpu"), "dtype": torch.double}

WARMUP_STEPS = 512 if not SMOKE_TEST else 32
NUM_SAMPLES = 256 if not SMOKE_TEST else 16
THINNING = 16

train_X = torch.rand(10, 4, **tkwargs)
test_X = torch.rand(5, 4, **tkwargs)
train_Y = torch.sin(train_X[:, :1])
test_Y = torch.sin(test_X[:, :1])

gp = SaasFullyBayesianSingleTaskGP(train_X=train_X, train_Y=train_Y)
fit_fully_bayesian_model_nuts(
    gp, warmup_steps=WARMUP_STEPS, num_samples=NUM_SAMPLES, thinning=THINNING, disable_progbar=True
)
with torch.no_grad():
    posterior = gp.posterior(test_X)

# Shape check
print(gp.covar_module.base_kernel.lengthscale.shape) # 16 x 1 x 4

cond_X = torch.rand(5, 4, **tkwargs) # 5 x 4
cond_Y = torch.sin(cond_X[:, :1]) # 5 x 1

# RuntimeError
gp.condition_on_observations(cond_X, cond_Y)

# and accounting for the batch shape does not work, either
cond_X = torch.rand(5, 4, **tkwargs).repeat(16, 1, 1) # 16 x 5 x 4
cond_Y = torch.sin(cond_X[..., :1]) # 16 x 5 x 1
gp.condition_on_observations(cond_X, cond_Y)

Stack trace/error message
For both cases, this occurs:

gpytorch/models/exact_prediction_strategies.py:131, in DefaultPredictionStrategy.get_fantasy_strategy(self, inputs, targets, full_inputs, full_targets, full_output, **kwargs)
    127 full_mean, full_covar = full_output.mean, full_output.lazy_covariance_matrix
    129 batch_shape = full_inputs[0].shape[:-2]
--> 131 full_mean = full_mean.view(*batch_shape, -1)
    132 num_train = self.num_train
    134 # Evaluate fant x train and fant x fant covariance matrices, leave train x train unevaluated.

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Expected Behavior

Conditioning on the new observations succeeds for every constituent GP in the FBGP.

My two cents on the issue

Notably, creating a BatchedMultiOutputModel with num_mcmc_samples batches of train inputs and targets handles this without issue. The issue seems to appear here: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/models/exact_gp.py#L220, where the output is of shape num_mcmc_samples x num_mcmc_samples x num_inputs (it should probably be num_mcmc_samples x num_inputs).
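For concreteness, a rough sketch of what I mean by the batched-model comparison (untested; a plain batched SingleTaskGP with the training data duplicated along an MCMC-sized batch dimension, reusing the tensors from the snippet above, with 16 standing in for the number of retained MCMC samples):

from botorch.models import SingleTaskGP

num_mcmc_samples = 16  # stand-in for the number of retained MCMC samples
batched_X = train_X.unsqueeze(0).expand(num_mcmc_samples, -1, -1)  # 16 x 10 x 4
batched_Y = train_Y.unsqueeze(0).expand(num_mcmc_samples, -1, -1)  # 16 x 10 x 1

batched_gp = SingleTaskGP(batched_X, batched_Y)
with torch.no_grad():
    batched_gp.posterior(test_X)  # evaluate once so the prediction caches exist

cond_X = torch.rand(num_mcmc_samples, 5, 4, **tkwargs)  # 16 x 5 x 4
cond_Y = torch.sin(cond_X[..., :1])                     # 16 x 5 x 1
batched_gp.condition_on_observations(cond_X, cond_Y)    # conditions each batch GP without error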

System information


  • Ubuntu 18.04
hvarfner added the bug label on Feb 15, 2023
@Balandat
Contributor

Hmm yeah I'm not surprised, I don't think we've tried / tested this before. We'll take a look

@hvarfner
Contributor Author

Thanks! I'm trying to sort something out on my end too, but it may be a bit patchy if one wants to avoid touching GPyTorch.

@Balandat
Contributor

Yeah we may well have to change something on the gpytorch end, but that's ok.

@hvarfner
Contributor Author

hvarfner commented Mar 9, 2023

So, I have a suggested solution to this problem that seems to be relatively pain-free. However, it doesn't quite fit within the BoTorch ecosystem, as it involves duplicating the training data in the FBGP and moving the unsqueeze(MCMC_DIM) from the forward to the posterior call. All in all, it basically looks like this (inside SaasFullyBayesianSingleTaskGP):
CTRL+F "DIFF" to see the diffs =)


    def __init__(self, ...):
        # Other code...
        # DIFF - NEW ADDITION - duplicating the training data
        train_X = train_X.unsqueeze(0).expand(num_mcmc_samples, train_X.shape[0], -1)
        train_Y = train_Y.unsqueeze(0).expand(num_mcmc_samples, train_Y.shape[0], -1)
        self._num_outputs = train_Y.shape[-1]
        X_tf, Y_tf, _ = self._transform_tensor_args(X=train_X, Y=train_Y)
        super().__init__(
            train_inputs=X_tf, train_targets=Y_tf, likelihood=GaussianLikelihood()
        )

    def forward(self, X: Tensor) -> MultivariateNormal:
        """
        Unlike in other classes' `forward` methods, there is no `if self.training`
        block, because it ought to be unreachable: If `self.train()` has been called,
        then `self.covar_module` will be None, `check_if_fitted()` will fail, and the
        rest of this method will not run.
        """
        #X = X.unsqueeze(MCMC_DIM)      #This is the first DIFF - the unsqueeze is REMOVED here
        self._check_if_fitted()
        mean_x = self.mean_module(X)
        covar_x = self.covar_module(X)
        return MultivariateNormal(mean_x, covar_x)

    # pyre-ignore[14]: Inconsistent override
    def posterior(
        self,
        X: Tensor,
        output_indices: Optional[List[int]] = None,
        observation_noise: bool = False,
        posterior_transform: Optional[PosteriorTransform] = None,
        **kwargs: Any,
    ) -> FullyBayesianPosterior:
        r"""Computes the posterior over model outputs at the provided points.

        Args:
            X: A `(batch_shape) x q x d`-dim Tensor, where `d` is the dimension
                of the feature space and `q` is the number of points considered
                jointly.
            output_indices: A list of indices, corresponding to the outputs over
                which to compute the posterior (if the model is multi-output).
                Can be used to speed up computation if only a subset of the
                model's outputs are required for optimization. If omitted,
                computes the posterior over all model outputs.
            observation_noise: If True, add the observation noise from the
                likelihood to the posterior. If a Tensor, use it directly as the
                observation noise (must be of shape `(batch_shape) x q x m`).
            posterior_transform: An optional PosteriorTransform.

        Returns:
            A `FullyBayesianPosterior` object. Includes observation noise if specified.
        """
        self._check_if_fitted()
        posterior = super().posterior(
            X=X.unsqueeze(MCMC_DIM), # ...and this is the second DIFF
            output_indices=output_indices,
            observation_noise=observation_noise,
            posterior_transform=posterior_transform,
            **kwargs,
        )
        posterior = FullyBayesianPosterior(distribution=posterior.distribution)

        return posterior

I guess this equates to a regular batched model. The only issue is that num_mcmc_samples is not available when the model is initialized (or re-initialized when the MCMC samples are obtained). In my own runs, I have simply hacked the num_mcmc_samples in there.

@saitcakmak
Contributor

Yeah, like you said, this is effectively just using a batched model with a FullyBayesianPosterior, which is similar to what we do in the legacy Ax model as well. You can probably achieve the same effect by just loading the covariance & likelihood modules of a trained SaasFullyBayesianSingleTaskGP onto a subclass of a SingleTaskGP that has the posterior modification from your comment and is initialized with duplicated train_X/train_Y. It is a bit messy in either case since we need to expect the added batch dimension after evaluating the acquisition functions and average over that to make sure it works with the rest of the BoTorch stack.
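To spell that idea out a bit, a rough sketch (untested; BatchedSaasProxy is just a hypothetical name, and gp, train_X, train_Y are the objects from the repro above):

from typing import Any

from botorch.models import SingleTaskGP
from botorch.models.fully_bayesian import MCMC_DIM
from botorch.posteriors.fully_bayesian import FullyBayesianPosterior
from torch import Tensor


class BatchedSaasProxy(SingleTaskGP):
    # the posterior modification from the comment above: add the MCMC batch dim here
    def posterior(self, X: Tensor, **kwargs: Any) -> FullyBayesianPosterior:
        post = super().posterior(X=X.unsqueeze(MCMC_DIM), **kwargs)
        return FullyBayesianPosterior(distribution=post.distribution)


num_mcmc = gp.covar_module.base_kernel.lengthscale.shape[0]  # 16
proxy = BatchedSaasProxy(
    train_X.expand(num_mcmc, *train_X.shape),  # duplicated train_X
    train_Y.expand(num_mcmc, *train_Y.shape),  # duplicated train_Y
)
# carry the fitted (batched) modules over from the trained SAAS model
proxy.mean_module = gp.mean_module
proxy.covar_module = gp.covar_module
proxy.likelihood = gp.likelihood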

I think what you have is a decent solution.

@Balandat
Contributor

Ideally we could do this in a way that doesn't require changing the forward/posterior methods. I haven't dug into this in detail, but is the issue here fundamentally that the underlying gpytorch model currently doesn't actually have training data of the right size so that the conditioning logic there fails? I wonder if it would be possible to expand that functionality on the gpytorch end to support this.

@hvarfner
Contributor Author

hvarfner commented Mar 13, 2023

That is the way that I have understood things. The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.

So batch_shape = torch.Size([]) due to the training data. In get_fantasy_strategy in gpytorch/models/exact_prediction_strategies.py, the shapes of the fantasized outputs are inferred from the batch_shape of the training data. The forward pass gives an output that has the batch_shape of the hyperparameters (e.g. torch.Size([16])), whereas the training data simply doesn't have one.

As such, inside get_fantasy_strategy:
# full_inputs[0].shape: (num_training_data + num_fantasy_data) x dim <-- Note, no batch dimension
# full_mean.shape: num_models x (num_training_data + num_fantasy_data) <-- But there is one here

batch_shape = full_inputs[0].shape[:-2] # Tries to get the batch dim, but there is none.
full_mean = full_mean.view(*batch_shape, -1) # full_mean has a batch dim but batch_shape is empty, so there aren't enough args to view(...)

RuntimeError: view size is not compatible with input tensor's size and stride...
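A minimal standalone illustration of the failing view (made-up sizes; the broadcast full_mean stands in for the batched output of the mean module):

import torch

n_total = 15  # num_training_data + num_fantasy_data
full_inputs = torch.rand(n_total, 4)                    # no batch dimension
full_mean = torch.rand(1, n_total).expand(16, n_total)  # batch dim from the hyperparameters

batch_shape = full_inputs.shape[:-2]  # torch.Size([]) -- empty
full_mean.view(*batch_shape, -1)      # RuntimeError: view size is not compatible ...
# Even .reshape would only hide the problem: the 16 model copies would be
# flattened into one long vector, so the fantasy shapes downstream would
# still come out wrong.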

So, I tried to address it by adding a batch_shape to the training data, but I definitely don't have the full picture...

@Balandat
Contributor

The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.

So one possibility here would be to add a dedicated batch_shape (or similar) property to the GPyTorch model. Then the default implementation would just grab that from the batch shape of the training data, but more exotic models such as the fully Bayesian ones could just override that property, and ideally things would just work.

@dme65, @saitcakmak does this sound reasonable?

@saitcakmak
Contributor

saitcakmak commented Mar 13, 2023

Yeah, that sounds good. We already have a batch_shape property on BoTorch models. It is just a matter of upstreaming this to GP / ExactGP and using it in the code.
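Roughly what that could look like (a simplified sketch, not the actual code in either repo):

import torch


class ExactGPSketch:
    # default: infer the batch shape from the training inputs
    @property
    def batch_shape(self) -> torch.Size:
        return self.train_inputs[0].shape[:-2]


class FullyBayesianGPSketch(ExactGPSketch):
    # fully Bayesian models: the batch dimension comes from the MCMC samples,
    # not from the (unbatched) training data
    @property
    def batch_shape(self) -> torch.Size:
        return torch.Size([self.num_mcmc_samples])


# get_fantasy_strategy would then reshape full_mean using the model's
# batch_shape rather than full_inputs[0].shape[:-2].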

@Balandat
Contributor

Started a PR for this in cornellius-gp/gpytorch#2307

@hvarfner can you check if this works for this use case? I haven't looked super closely through the prediction strategy, but as the full_inputs now use the batch_shape as defined by the model, this could work out of the box.

@hvarfner
Contributor Author

hvarfner commented Mar 20, 2023

Not quite - the same issue still persists!

It seems like the forward pass through the GP
https://github.com/cornellius-gp/gpytorch/blob/batch_shape/gpytorch/models/exact_gp.py#L229
outputs a batch_shape x batch_shape x num_training_data-shaped result, which causes the aforementioned view(...) to still fail.

The batch_shape attribute in get_fantasy_strategy is N (which it should be), but the full_output is N x N (it should be N).
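Roughly how the doubled batch arises, as far as I can tell (made-up sizes; the extra unsqueeze mimics the unsqueeze(MCMC_DIM) inside the model's forward, and the same doubling shows up in the mean):

import torch
from gpytorch.kernels import MaternKernel

k = MaternKernel(batch_shape=torch.Size([16]), ard_num_dims=4)  # batched hyperparameters
X = torch.rand(16, 15, 4)  # full_inputs already carrying the model batch dim
X = X.unsqueeze(-3)        # the unsqueeze(MCMC_DIM) in forward -> 16 x 1 x 15 x 4
print(k(X).shape)          # broadcasts to torch.Size([16, 16, 15, 15])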

Once again, only my understanding of things =)

@esantorella
Member

@hvarfner Is this fixed?

@hvarfner
Contributor Author

hvarfner commented Apr 5, 2024

Yes!

@esantorella
Member

Awesome! Do you know what PR fixed it?

@hvarfner
Contributor Author

hvarfner commented Apr 5, 2024

#2151
