
Add GradCache loss #70

Open
NohTow opened this issue Nov 6, 2024 · 1 comment

@NohTow
Collaborator

NohTow commented Nov 6, 2024

The GradCache method makes it possible to scale the effective batch size of a contrastive loss while keeping memory usage constant, overcoming the issue that gradient accumulation is not equivalent to a larger batch size for contrastive losses.
I think it would be a nice addition for people training ColBERT models with a contrastive objective rather than distillation.
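For reference, the core trick is a two-pass scheme: embed the whole batch without gradients, compute the full-batch loss on the detached embeddings to cache their gradients, then re-encode each mini-batch with gradients enabled and backpropagate the cached slices. Here is a minimal PyTorch sketch of the idea; `encoder` is a stand-in for the actual model, and the single-vector in-batch-negatives loss is for illustration only (a ColBERT version would score with MaxSim over multi-vector representations instead):

```python
import torch
import torch.nn.functional as F

def grad_cache_step(encoder, queries, documents, mini_batch_size=8, scale=20.0):
    """One GradCache step: full-batch contrastive loss at mini-batch memory cost."""

    def split(batch):
        return [batch[i : i + mini_batch_size] for i in range(0, len(batch), mini_batch_size)]

    # Pass 1: embed everything without building the autograd graph.
    with torch.no_grad():
        q_reps = torch.cat([encoder(chunk) for chunk in split(queries)])
        d_reps = torch.cat([encoder(chunk) for chunk in split(documents)])

    # Compute the full-batch loss on the detached embeddings and cache their gradients.
    q_reps.requires_grad_(True)
    d_reps.requires_grad_(True)
    scores = scale * q_reps @ d_reps.T  # in-batch negatives
    labels = torch.arange(len(q_reps), device=scores.device)
    loss = F.cross_entropy(scores, labels)
    loss.backward()

    # Pass 2: re-encode each mini-batch with gradients enabled and inject
    # the cached gradient slice into the encoder's graph.
    for chunk, grad in zip(split(queries), split(q_reps.grad)):
        encoder(chunk).backward(gradient=grad)
    for chunk, grad in zip(split(documents), split(d_reps.grad)):
        encoder(chunk).backward(gradient=grad)

    return loss.detach()
```

Peak memory is bounded by one mini-batch forward/backward pass plus the comparatively cheap embedding matrices, so the effective batch size can grow independently of GPU memory.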

As it is already implemented in Sentence Transformers, adding it to PyLate should be straightforward (see the usage sketch below).
I think you already experimented with it @raphaelsty, so maybe you can take this one?

cc @tomaarsen
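
For context, the Sentence Transformers version is exposed as `CachedMultipleNegativesRankingLoss`, where the effective batch size is set by the DataLoader while GPU memory scales only with `mini_batch_size`:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# GradCache-backed in-batch-negatives loss: the DataLoader batch size sets the
# number of negatives, while memory is bounded by mini_batch_size.
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)
```

A PyLate counterpart (say, a `CachedContrastive` loss; the name is hypothetical) could presumably mirror this pattern, swapping the dot-product similarity for ColBERT's MaxSim scoring.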

@NohTow NohTow added the enhancement New feature or request label Nov 6, 2024
@raphaelsty
Collaborator

I'd be happy to work on this one 🙌

@raphaelsty raphaelsty self-assigned this Nov 6, 2024