
Loss calculation should not permanently change shapes of logits and targets #10

Open
Andrei-Aksionov wants to merge 1 commit into master

Conversation

Andrei-Aksionov (Contributor)

I want to talk about the loss calculation in the forward method:

else:
    B, T, C = logits.shape
    # the reshaped tensors overwrite the originals here
    logits = logits.view(B*T, C)   # (B, T, C) -> (B*T, C)
    targets = targets.view(B*T)    # (B, T)    -> (B*T,)
    loss = F.cross_entropy(logits, targets)

return logits, loss                # logits is now (B*T, C)

The logic behind the reshaping was explained in the video, but I don't understand why we change the shapes permanently when they are only needed for the loss calculation.
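
One way to avoid this is to keep the reshaped tensors local to the loss computation, so the method returns logits with their original shape. A minimal sketch of the idea (my illustration, not necessarily the exact diff in this PR):

else:
    B, T, C = logits.shape
    # reshape only inside the call; the returned logits keep shape (B, T, C)
    loss = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))

return logits, loss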

In the training loop:

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)

xb has shape (B, T) and yb has shape (B, T), so one might expect logits to have shape (B, T, ...), when in fact it is (B*T, ...).

The code works flawlessly only because we don't do anything further with logits, but I think this is not the most desirable behavior, especially since this is essentially educational code (or, to be precise, code accompanying an educational video).
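
To make the surprise concrete, here is a self-contained toy check (made-up sizes standing in for the lecture's model):

import torch

B, T, vocab_size = 4, 8, 65
logits = torch.randn(B, T, vocab_size)    # the shape a caller would expect back
flattened = logits.view(B*T, vocab_size)  # the shape the forward method actually returns
print(flattened.shape)                    # torch.Size([32, 65]): the batch dimension is gone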

Additional notes

The same could be done by transposing, but since we have batches we need to use Tensor.mT:

else:
    # mT swaps the last two dims: (B, T, C) -> (B, C, T);
    # F.cross_entropy expects the class dimension second for 3-D input
    loss = F.cross_entropy(logits.mT, targets)

Based on my benchmarks (on an Intel CPU), .mT is faster than .view, but F.cross_entropy runs significantly faster on a 2-D tensor than on a 3-D one, so overall the view + F.cross_entropy combination is preferable.
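
For reference, a micro-benchmark along these lines can be set up like this (my own sketch, not the original script; the sizes are made up):

import torch
import torch.nn.functional as F
from torch.utils import benchmark

B, T, C = 64, 256, 65  # made-up batch/time/vocab sizes
logits = torch.randn(B, T, C)
targets = torch.randint(C, (B, T))

# 2-D variant: flatten first, then compute the loss
t_view = benchmark.Timer(
    stmt="F.cross_entropy(logits.view(B*T, C), targets.view(B*T))",
    globals={"F": F, "logits": logits, "targets": targets, "B": B, "T": T, "C": C},
)
# 3-D variant: transpose so the class dimension comes second
t_mt = benchmark.Timer(
    stmt="F.cross_entropy(logits.mT, targets)",
    globals={"F": F, "logits": logits, "targets": targets},
)
print(t_view.timeit(100))
print(t_mt.timeit(100))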

This PR also includes one small rename:
FeedFoward -> FeedForward

I guess Foward is like Forward, but only if you have a thick British accent 😃
