Add GradCache loss #70

NohTow · 2024-11-06T12:51:38Z

The GradCache method makes it possible to scale the effective batch size of a contrastive loss at a constant memory cost, which overcomes the problem that gradient accumulation is not equivalent to a larger batch size for contrastive losses.
I think it would be a nice addition for people training ColBERT models with a contrastive loss rather than distillation.
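
To make the gradient accumulation point concrete, here is a rough sketch in plain Python. The `info_nce` helper and the 2-d embeddings are made up for illustration and are not part of PyLate or Sentence Transformers. It compares an in-batch-negatives loss computed on one batch of 4 against the average of the same loss over two micro-batches of 2, which is what gradient accumulation would optimize.

```python
import math


def info_nce(queries, docs):
    """Mean cross-entropy with in-batch negatives: query i should match doc i."""
    total = 0.0
    for i, q in enumerate(queries):
        # Dot-product score of this query against every document in the batch.
        scores = [sum(a * b for a, b in zip(q, d)) for d in docs]
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)


# Toy embeddings, invented for this example.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
docs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7], [-0.8, 0.3]]

# One large batch: every query is contrasted with 3 in-batch negatives.
full_batch_loss = info_nce(queries, docs)

# Accumulation over two micro-batches of 2: every query sees only 1 negative.
accumulated_loss = 0.5 * (
    info_nce(queries[:2], docs[:2]) + info_nce(queries[2:], docs[2:])
)

print(f"full batch: {full_batch_loss:.4f}, accumulated: {accumulated_loss:.4f}")
```

The two values differ because each micro-batch normalizes over fewer negatives, so accumulating them is a different objective from the large-batch loss. As I understand it, GradCache sidesteps this by first embedding the whole batch without building a graph, computing the loss over all embeddings, caching the gradients with respect to the embeddings, and then re-running the encoder chunk by chunk to backpropagate them.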

Since it is already implemented in Sentence Transformers, adding it to PyLate should be straightforward.
I think you already experimented with it @raphaelsty, so maybe you can take this one?

cc @tomaarsen

raphaelsty · 2024-11-06T15:48:26Z

I'd be happy to work on this one 🙌

NohTow added the enhancement New feature or request label Nov 6, 2024

raphaelsty self-assigned this Nov 6, 2024
