
neural nets optimizer shape mismatch during backward pass #78

Description

@srs3

@ddbourgin I'm hitting an issue where gradient updates can't be applied because the shapes conflict during backprop, specifically in the optimizer file.

The error reads:

C[param_name]["mean"] = d1 * mean + (1 - d1) * param_grad
ValueError: operands could not be broadcast together with shapes (100,10) (3072,100) 

The model architecture is as follows:

Input -> n_samples, 3072
FC1 -> 3072, 100
FC2 -> 100, 10
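
To show where the two shapes in the error come from: (3072, 100) is fc1's weight matrix (3072 -> 100) and (100, 10) is fc2's (100 -> 10), so it looks like a value cached for one layer's W is being combined with the other layer's W gradient. A standalone numpy snippet (just the shapes, not the library code) reproduces the same broadcast failure:

import numpy as np

W_fc1 = np.zeros((3072, 100))  # fc1 weights: 3072 -> 100
W_fc2 = np.zeros((100, 10))    # fc2 weights: 100 -> 10

# Combining them elementwise, like the optimizer's cached "mean" and the
# incoming param_grad, fails with the exact error above.
try:
    _ = 0.9 * W_fc2 + 0.1 * W_fc1
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (100,10) (3072,100)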

The model code is as follows:

def _build_model(self):
    self.model = OrderedDict()

    # self.layers holds the layer output sizes, i.e. [100, 10] for the
    # architecture above
    self.model['fc1'] = FullyConnected(n_out=self.layers[0],
                                       act_fn=ReLU(),
                                       init=self.initializer,
                                       optimizer=self.optimizer)

    self.model['fc2'] = FullyConnected(n_out=self.layers[1],
                                       act_fn=Affine(slope=1, intercept=0),
                                       init=self.initializer,
                                       optimizer=self.optimizer)

    self.model['out'] = Softmax(dim=-1,
                                optimizer=self.optimizer)

@property
def parameters(self):
    return {k: v.parameters for k, v in self.model.items()}

@property
def hyperparameters(self):
    return {k: v.hyperparameters for k, v in self.model.items()}

@property
def derived_variables(self):
    return {k: v.derived_variables for k, v in self.model.items()}

@property
def gradients(self):
    return {k: v.gradients for k, v in self.model.items()}

def forward(self, x):
    out = x
    for k, v in self.model.items():
        out = v.forward(out)
    return out

def backward(self, y, y_pred):
    """Compute dLdy and then backprop through the layers in self.model"""
    dY_pred = self.loss.grad(y, y_pred)
    for k, v in reversed(list(self.model.items())):
        dY_pred = v.backward(dY_pred)
        self._dv['d' + k] = dY_pred
    return dY_pred

def update(self, cur_loss):
    """Perform gradient updates"""
    for k, v in reversed(list(self.model.items())):
        v.update(cur_loss)
    self.flush_gradients()
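
My guess (and I could be wrong) is that the optimizer caches its running statistics per param_name ("W", "b"), so passing the same self.optimizer instance to fc1, fc2, and out means the layers overwrite each other's cache entries. A self-contained sketch of that suspicion, using a stand-in dict rather than the real optimizer code:

import numpy as np

cache = {}  # stand-in for C in the optimizer

def adam_like_update(param_grad, param_name, d1=0.9):
    # the first call for a given name sets up the running mean with that grad's shape
    if param_name not in cache:
        cache[param_name] = {"mean": np.zeros_like(param_grad)}
    mean = cache[param_name]["mean"]
    cache[param_name]["mean"] = d1 * mean + (1 - d1) * param_grad  # the failing line

adam_like_update(np.zeros((100, 10)), "W")    # fc2's W grad is seen first (update runs in reverse)
adam_like_update(np.zeros((3072, 100)), "W")  # fc1's W grad hits the same "W" key -> ValueError

If that is what's happening, giving each layer its own optimizer instance (or keying the cache by layer as well as parameter name) should avoid the collision, but I may be misreading how the cache is meant to be shared.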

I'm hoping we can fix this and also put together an example for people to follow. In case it helps, the (simplified) driver loop I'm running is below. Thanks!
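
All of the names below are placeholders for my actual script; net is an instance of the class above with layers = [100, 10] and a cross-entropy loss object (assumed callable to get the scalar loss):

import numpy as np

# Fake CIFAR-10-sized data purely for illustration
X_train = np.random.randn(512, 3072)
y_train = np.eye(10)[np.random.randint(0, 10, 512)]  # one-hot labels

batch_size = 64
for epoch in range(10):
    for i in range(0, X_train.shape[0], batch_size):
        X_batch = X_train[i:i + batch_size]
        y_batch = y_train[i:i + batch_size]

        y_pred = net.forward(X_batch)           # (batch, 3072) -> (batch, 10)
        batch_loss = net.loss(y_batch, y_pred)  # scalar loss for the update
        net.backward(y_batch, y_pred)           # populate each layer's gradients
        net.update(batch_loss)                  # <- this is where the ValueError is raised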
