
Loss calculation should not permanently change shapes of logits and targets #10

Open
Andrei-Aksionov wants to merge 1 commit into master

Conversation

Andrei-Aksionov (Contributor)

I want to talk about the loss calculation in the forward method:

else:
    B, T, C = logits.shape
    # the reshaped tensors overwrite the originals here
    logits = logits.view(B*T, C)   # (B, T, C) -> (B*T, C)
    targets = targets.view(B*T)    # (B, T)    -> (B*T,)
    loss = F.cross_entropy(logits, targets)

return logits, loss                # logits is now (B*T, C)

The logic behind the reshaping was explained in the video, but I don't understand why we change the shapes permanently when they are only needed for the loss calculation.
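
One way to avoid this is to keep the reshaped tensors local to the loss computation, so the method returns logits with their original shape. A minimal sketch of the idea (my illustration, not necessarily the exact diff in this PR):

else:
    B, T, C = logits.shape
    # reshape only inside the call; the returned logits keep shape (B, T, C)
    loss = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))

return logits, loss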

In the training loop:

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)

xb has shape (B, T) and yb has shape (B, T), so one might expect logits to have shape (B, T, ...), when in fact it is (B*T, ...).

The code works flawlessly only because we don't do anything further with logits, but I think this is not the most desirable behavior, especially since this is essentially educational code (or, to be precise, code accompanying an educational video).
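
To make the surprise concrete, here is a self-contained toy check (made-up sizes standing in for the lecture's model):

import torch

B, T, vocab_size = 4, 8, 65
logits = torch.randn(B, T, vocab_size)    # the shape a caller would expect back
flattened = logits.view(B*T, vocab_size)  # the shape the forward method actually returns
print(flattened.shape)                    # torch.Size([32, 65]): the batch dimension is gone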

Additional notes

The same could be done by transposing, but since we have batches we need to use Tensor.mT:

else:
    # mT swaps the last two dims: (B, T, C) -> (B, C, T);
    # F.cross_entropy expects the class dimension second for 3-D input
    loss = F.cross_entropy(logits.mT, targets)

Based on my benchmarks (on an Intel CPU), .mT is faster than .view, but F.cross_entropy runs significantly faster on a 2-D tensor than on a 3-D one, so overall the view + F.cross_entropy combination is preferable.
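
For reference, a micro-benchmark along these lines can be set up like this (my own sketch, not the original script; the sizes are made up):

import torch
import torch.nn.functional as F
from torch.utils import benchmark

B, T, C = 64, 256, 65  # made-up batch/time/vocab sizes
logits = torch.randn(B, T, C)
targets = torch.randint(C, (B, T))

# 2-D variant: flatten first, then compute the loss
t_view = benchmark.Timer(
    stmt="F.cross_entropy(logits.view(B*T, C), targets.view(B*T))",
    globals={"F": F, "logits": logits, "targets": targets, "B": B, "T": T, "C": C},
)
# 3-D variant: transpose so the class dimension comes second
t_mt = benchmark.Timer(
    stmt="F.cross_entropy(logits.mT, targets)",
    globals={"F": F, "logits": logits, "targets": targets},
)
print(t_view.timeit(100))
print(t_mt.timeit(100))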

This PR also includes one small rename:
FeedFoward -> FeedForward

I guess Foward is like Forward, but only if you have a thick British accent 😃
