Out of memory on shuffling huge datasets #21
Comments
This might be a bug in Marian. Memory shouldn't grow after
Related Marian issue: marian-nmt/marian-dev#148
I suspect that running out of memory, even when --shuffle-in-ram is not used, comes from here:
Assuming that's actually the cause, we could replace it with a two-pass shuffle:
Edit: or do it like this
Edit: for why
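For readers unfamiliar with the idea, a two-pass shuffle can be sketched roughly as below. This is a generic illustration of the technique in Python, not the snippet originally posted in this comment and not Marian's code. The function name two_pass_shuffle and the parameters num_buckets and seed are hypothetical. It assumes a line-oriented text file; for a parallel corpus, each line would need to hold the full sentence pair (for example tab-separated) so that the pairs stay aligned.

```python
import os
import random
import tempfile


def two_pass_shuffle(src_path, dst_path, num_buckets=16, seed=None):
    """Shuffle the lines of src_path into dst_path with bounded memory.

    Hypothetical helper for illustration only. Peak memory is roughly the
    size of the largest bucket rather than the size of the whole file.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}") for i in range(num_buckets)
        ]
        buckets = [open(p, "w", encoding="utf-8") for p in bucket_paths]
        try:
            # Pass 1: stream the input once and scatter each line into a
            # randomly chosen bucket file on disk.
            with open(src_path, "r", encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # keep the last line from merging
                    buckets[rng.randrange(num_buckets)].write(line)
        finally:
            for bucket in buckets:
                bucket.close()

        # Pass 2: load one bucket at a time, shuffle it in memory, and
        # append it to the output.
        with open(dst_path, "w", encoding="utf-8") as dst:
            for path in bucket_paths:
                with open(path, "r", encoding="utf-8") as bucket:
                    lines = bucket.readlines()
                rng.shuffle(lines)
                dst.writelines(lines)
```

The number of buckets trades disk I/O against memory: with a 300M-line corpus, more buckets means each in-memory shuffle in the second pass handles a smaller slice of the data.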
I haven't seen this for some time, and I assume it's fixed by using OpusTrainer.
300M dataset, 128 GB RAM.
The workaround is to shuffle the dataset after the merge step, disable --shuffle-in-ram, and use --shuffle batches.