
[Bug] condition_on_observation with SaasFullyBayesianGP #1680

Closed
hvarfner opened this issue Feb 15, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@hvarfner
Contributor

hvarfner commented Feb 15, 2023

🐛 Bug

At the risk of looking a bit silly, I don't think conditioning on additional observations works out of the box when using fully Bayesian GPs.

To reproduce

The setup from the SAASBO tutorial is enough to reproduce the issue.
Code snippet to reproduce

import os

import matplotlib.pyplot as plt
import numpy as np
import torch
from botorch import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.models.transforms import Standardize
from botorch.optim import optimize_acqf
from botorch.test_functions import Branin
from torch.quasirandom import SobolEngine

SMOKE_TEST = os.environ.get("SMOKE_TEST")
tkwargs = {"device": torch.device("cuda:1" if torch.cuda.is_available() else "cpu"), "dtype": torch.double}

WARMUP_STEPS = 512 if not SMOKE_TEST else 32
NUM_SAMPLES = 256 if not SMOKE_TEST else 16
THINNING = 16

train_X = torch.rand(10, 4, **tkwargs)
test_X = torch.rand(5, 4, **tkwargs)
train_Y = torch.sin(train_X[:, :1])
test_Y = torch.sin(test_X[:, :1])

gp = SaasFullyBayesianSingleTaskGP(train_X=train_X, train_Y=train_Y)
fit_fully_bayesian_model_nuts(
    gp, warmup_steps=WARMUP_STEPS, num_samples=NUM_SAMPLES, thinning=THINNING, disable_progbar=True
)
with torch.no_grad():
    posterior = gp.posterior(test_X)

# Shape check
print(gp.covar_module.base_kernel.lengthscale.shape) # 16 x 1 x 4

cond_X = torch.rand(5, 4, **tkwargs) # 5 x 4
cond_Y = torch.sin(cond_X[:, :1]) # 5 x 1

# RuntimeError
gp.condition_on_observations(cond_X, cond_Y)

# and accounting for the batch shape does not work, either
cond_X = torch.rand(5, 4, **tkwargs).repeat(16, 1, 1) # 16 x 5 x 4
cond_Y = torch.sin(cond_X[..., :1]) # 16 x 5 x 1
gp.condition_on_observations(cond_X, cond_Y)

Stack trace/error message
For both cases, this occurs:

gpytorch/models/exact_prediction_strategies.py:131, in DefaultPredictionStrategy.get_fantasy_strategy(self, inputs, targets, full_inputs, full_targets, full_output, **kwargs)
    127 full_mean, full_covar = full_output.mean, full_output.lazy_covariance_matrix
    129 batch_shape = full_inputs[0].shape[:-2]
--> 131 full_mean = full_mean.view(*batch_shape, -1)
    132 num_train = self.num_train
    134 # Evaluate fant x train and fant x fant covariance matrices, leave train x train unevaluated.

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Expected Behavior

Conditioning on the new observations succeeds for every constituent GP in the FBGP.

My two cents on the issue

Notably, creating a BatchedMultiOutputModel with num_mcmc_samples batches of train inputs and targets handles this without issue. The issue seems to appear here: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/models/exact_gp.py#L220, where the output is of shape num_mcmc_samples x num_mcmc_samples x num_inputs (it should probably be num_mcmc_samples x num_inputs).
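For concreteness, a rough sketch of what I mean by the batched-model comparison (untested; a plain batched SingleTaskGP with the training data duplicated along an MCMC-sized batch dimension, reusing the tensors from the snippet above, with 16 standing in for the number of retained MCMC samples):

from botorch.models import SingleTaskGP

num_mcmc_samples = 16  # stand-in for the number of retained MCMC samples
batched_X = train_X.unsqueeze(0).expand(num_mcmc_samples, -1, -1)  # 16 x 10 x 4
batched_Y = train_Y.unsqueeze(0).expand(num_mcmc_samples, -1, -1)  # 16 x 10 x 1

batched_gp = SingleTaskGP(batched_X, batched_Y)
with torch.no_grad():
    batched_gp.posterior(test_X)  # evaluate once so the prediction caches exist

cond_X = torch.rand(num_mcmc_samples, 5, 4, **tkwargs)  # 16 x 5 x 4
cond_Y = torch.sin(cond_X[..., :1])                     # 16 x 5 x 1
batched_gp.condition_on_observations(cond_X, cond_Y)    # conditions each batch GP without error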

System information


  • Ubuntu 18.04
hvarfner added the bug label on Feb 15, 2023
@Balandat
Contributor

Hmm yeah I'm not surprised, I don't think we've tried / tested this before. We'll take a look

@hvarfner
Contributor Author

Thanks! I'm trying to sort something out on my end too, but it may be a bit patchy if one wants to avoid touching GPyTorch.

@Balandat
Contributor

Yeah we may well have to change something on the gpytorch end, but that's ok.

@hvarfner
Contributor Author

hvarfner commented Mar 9, 2023

So, I have a suggested solution to this problem that seems to be relatively pain-free. However, it doesn't quite fit within the BoTorch ecosystem, as it involves duplicating the training data in the FBGP and moving the unsqueeze(MCMC_DIM) from the forward to the posterior call. All in all, it basically looks like this (inside SaasFullyBayesianSingleTaskGP):
CTRL+F "DIFF" to see the diffs =)


    def __init__(self, ...):
        # Other code...
        # DIFF - NEW ADDITION - duplicating the training data
        train_X = train_X.unsqueeze(0).expand(num_mcmc_samples, train_X.shape[0], -1)
        train_Y = train_Y.unsqueeze(0).expand(num_mcmc_samples, train_Y.shape[0], -1)
        self._num_outputs = train_Y.shape[-1]
        X_tf, Y_tf, _ = self._transform_tensor_args(X=train_X, Y=train_Y)
        super().__init__(
            train_inputs=X_tf, train_targets=Y_tf, likelihood=GaussianLikelihood()
        )

    def forward(self, X: Tensor) -> MultivariateNormal:
        """
        Unlike in other classes' `forward` methods, there is no `if self.training`
        block, because it ought to be unreachable: If `self.train()` has been called,
        then `self.covar_module` will be None, `check_if_fitted()` will fail, and the
        rest of this method will not run.
        """
        #X = X.unsqueeze(MCMC_DIM)      #This is the first DIFF - the unsqueeze is REMOVED here
        self._check_if_fitted()
        mean_x = self.mean_module(X)
        covar_x = self.covar_module(X)
        return MultivariateNormal(mean_x, covar_x)

    # pyre-ignore[14]: Inconsistent override
    def posterior(
        self,
        X: Tensor,
        output_indices: Optional[List[int]] = None,
        observation_noise: bool = False,
        posterior_transform: Optional[PosteriorTransform] = None,
        **kwargs: Any,
    ) -> FullyBayesianPosterior:
        r"""Computes the posterior over model outputs at the provided points.

        Args:
            X: A `(batch_shape) x q x d`-dim Tensor, where `d` is the dimension
                of the feature space and `q` is the number of points considered
                jointly.
            output_indices: A list of indices, corresponding to the outputs over
                which to compute the posterior (if the model is multi-output).
                Can be used to speed up computation if only a subset of the
                model's outputs are required for optimization. If omitted,
                computes the posterior over all model outputs.
            observation_noise: If True, add the observation noise from the
                likelihood to the posterior. If a Tensor, use it directly as the
                observation noise (must be of shape `(batch_shape) x q x m`).
            posterior_transform: An optional PosteriorTransform.

        Returns:
            A `FullyBayesianPosterior` object. Includes observation noise if specified.
        """
        self._check_if_fitted()
        posterior = super().posterior(
            X=X.unsqueeze(MCMC_DIM), # ...and this is the second DIFF
            output_indices=output_indices,
            observation_noise=observation_noise,
            posterior_transform=posterior_transform,
            **kwargs,
        )
        posterior = FullyBayesianPosterior(distribution=posterior.distribution)

        return posterior

I guess this equates to a regular batched model. The only issue is that num_mcmc_samples is not available when the model is initialized (or re-initialized when the MCMC samples are obtained). In my own runs, I have simply hacked the num_mcmc_samples in there.

@saitcakmak
Contributor

Yeah, like you said, this is effectively just using a batched model with a FullyBayesianPosterior, which is similar to what we do in the legacy Ax model as well. You can probably achieve the same effect by just loading the covariance & likelihood modules of a trained SaasFullyBayesianSingleTaskGP onto a subclass of a SingleTaskGP that has the posterior modification from your comment and is initialized with duplicated train_X/train_Y. It is a bit messy in either case since we need to expect the added batch dimension after evaluating the acquisition functions and average over that to make sure it works with the rest of the BoTorch stack.
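To spell that idea out a bit, a rough sketch (untested; BatchedSaasProxy is just a hypothetical name, and gp, train_X, train_Y are the objects from the repro above):

from typing import Any

from botorch.models import SingleTaskGP
from botorch.models.fully_bayesian import MCMC_DIM
from botorch.posteriors.fully_bayesian import FullyBayesianPosterior
from torch import Tensor


class BatchedSaasProxy(SingleTaskGP):
    # the posterior modification from the comment above: add the MCMC batch dim here
    def posterior(self, X: Tensor, **kwargs: Any) -> FullyBayesianPosterior:
        post = super().posterior(X=X.unsqueeze(MCMC_DIM), **kwargs)
        return FullyBayesianPosterior(distribution=post.distribution)


num_mcmc = gp.covar_module.base_kernel.lengthscale.shape[0]  # 16
proxy = BatchedSaasProxy(
    train_X.expand(num_mcmc, *train_X.shape),  # duplicated train_X
    train_Y.expand(num_mcmc, *train_Y.shape),  # duplicated train_Y
)
# carry the fitted (batched) modules over from the trained SAAS model
proxy.mean_module = gp.mean_module
proxy.covar_module = gp.covar_module
proxy.likelihood = gp.likelihood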

I think what you have is a decent solution.

@Balandat
Contributor

Ideally we could do this in a way that doesn't require changing the forward/posterior methods. I haven't dug into this in detail, but is the issue here fundamentally that the underlying gpytorch model currently doesn't actually have training data of the right size so that the conditioning logic there fails? I wonder if it would be possible to expand that functionality on the gpytorch end to support this.

@hvarfner
Contributor Author

hvarfner commented Mar 13, 2023

That is the way that I have understood things. The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.

So batch_shape = torch.Size([]) due to the training data. In get_fantasy_strategy in gpytorch/models/exact_prediction_strategies.py, the shapes of the fantasized outputs are inferred from the batch_shape of the training data. The forward pass gives an output that has the batch_shape of the hyperparameters (e.g. torch.Size([16])), whereas the training data simply doesn't have one.

As such, inside get_fantasy_strategy:
# full_inputs[0].shape: (num_training_data + num_fantasy_data) x dim <-- Note, no batch dimension
# full_mean.shape: num_models x (num_training_data + num_fantasy_data) <-- But there is one here

batch_shape = full_inputs[0].shape[:-2] # Tries to get the batch dim, but there is none.
full_mean = full_mean.view(*batch_shape, -1) # full_mean has a batch dim but batch_shape is empty, so there aren't enough args to view(...)

RuntimeError: view size is not compatible with input tensor's size and stride...
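A minimal standalone illustration of the failing view (made-up sizes; the broadcast full_mean stands in for the batched output of the mean module):

import torch

n_total = 15  # num_training_data + num_fantasy_data
full_inputs = torch.rand(n_total, 4)                    # no batch dimension
full_mean = torch.rand(1, n_total).expand(16, n_total)  # batch dim from the hyperparameters

batch_shape = full_inputs.shape[:-2]  # torch.Size([]) -- empty
full_mean.view(*batch_shape, -1)      # RuntimeError: view size is not compatible ...
# Even .reshape would only hide the problem: the 16 model copies would be
# flattened into one long vector, so the fantasy shapes downstream would
# still come out wrong.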

So, I tried to address it by adding a batch_shape to the training data, but I definitely don't have the full picture...

@Balandat
Contributor

The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.

So one possibility here would be to add a dedicated batch_shape (or similar) property to the GPyTorch model. Then the default implementation would just grab that from the batch shape of the training data, but more exotic models such as the fully Bayesian ones could just override that property, and ideally things would just work.

@dme65, @saitcakmak does this sound reasonable?

@saitcakmak
Contributor

saitcakmak commented Mar 13, 2023

Yeah, that sounds good. We already have a batch_shape property on BoTorch models. It is just a matter of upstreaming this to GP / ExactGP and using it in the code.
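Roughly what that could look like (a simplified sketch, not the actual code in either repo):

import torch


class ExactGPSketch:
    # default: infer the batch shape from the training inputs
    @property
    def batch_shape(self) -> torch.Size:
        return self.train_inputs[0].shape[:-2]


class FullyBayesianGPSketch(ExactGPSketch):
    # fully Bayesian models: the batch dimension comes from the MCMC samples,
    # not from the (unbatched) training data
    @property
    def batch_shape(self) -> torch.Size:
        return torch.Size([self.num_mcmc_samples])


# get_fantasy_strategy would then reshape full_mean using the model's
# batch_shape rather than full_inputs[0].shape[:-2].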

@Balandat
Contributor

Started a PR for this in cornellius-gp/gpytorch#2307

@hvarfner can you check if this works for this use case? I haven't looked super closely through the prediction strategy, but as the full_inputs now use the batch_shape as defined by the model, this could work out of the box.

@hvarfner
Contributor Author

hvarfner commented Mar 20, 2023

Not quite - the same issue still persists!

It seems like the forward pass through the GP
https://github.com/cornellius-gp/gpytorch/blob/batch_shape/gpytorch/models/exact_gp.py#L229
outputs a batch_shape x batch_shape x num_training_data-shaped result, which causes the aforementioned view(...) to still fail.

The batch_shape attribute in get_fantasy_strategy is N (which it should be), but the full_output is N x N (it should be N).
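Roughly how the doubled batch arises, as far as I can tell (made-up sizes; the extra unsqueeze mimics the unsqueeze(MCMC_DIM) inside the model's forward, and the same doubling shows up in the mean):

import torch
from gpytorch.kernels import MaternKernel

k = MaternKernel(batch_shape=torch.Size([16]), ard_num_dims=4)  # batched hyperparameters
X = torch.rand(16, 15, 4)  # full_inputs already carrying the model batch dim
X = X.unsqueeze(-3)        # the unsqueeze(MCMC_DIM) in forward -> 16 x 1 x 15 x 4
print(k(X).shape)          # broadcasts to torch.Size([16, 16, 15, 15])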

Once again, only my understanding of things =)

@esantorella
Member

@hvarfner Is this fixed?

@hvarfner
Contributor Author

hvarfner commented Apr 5, 2024

Yes!

@esantorella
Member

Awesome! Do you know what PR fixed it?

@hvarfner
Contributor Author

hvarfner commented Apr 5, 2024

#2151
