Make IterableDataset (optionally) resumable #7385

Open
yzhangcs wants to merge 1 commit into main

yzhangcs (Contributor) commented Feb 4, 2025

What does this PR do?

This PR adds a new stateful option to the dataset.shuffle method; the option defaults to False.
When enabled, it makes shuffling of IterableDataset instances resumable, at the cost of some additional memory overhead.
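
To illustrate the general idea (and why resumability costs memory), here is a minimal, self-contained sketch of a checkpointable shuffle buffer. It is not the code in this PR: the class name `ResumableShuffleBuffer`, its `state_dict` / `load_state_dict` layout, and the use of a random-access sequence as the source are all assumptions made for illustration. The point it shows is that resuming requires saving the buffer contents along with the RNG state and the read position, which is presumably where the extra memory goes.

```python
import random
from typing import Any, Dict, Iterator, List, Sequence


class ResumableShuffleBuffer:
    """Hypothetical illustration of a resumable buffer shuffle (not the PR's code)."""

    def __init__(self, source: Sequence[Any], buffer_size: int, seed: int) -> None:
        self.source = source            # assumed random-access for simplicity
        self.buffer_size = buffer_size
        self.rng = random.Random(seed)
        self.buffer: List[Any] = []
        self.position = 0               # index of the next source item to read

    def __iter__(self) -> Iterator[Any]:
        # Main phase: fill the buffer, then swap each new item for a random one.
        while self.position < len(self.source):
            item = self.source[self.position]
            self.position += 1
            if len(self.buffer) < self.buffer_size:
                self.buffer.append(item)
                continue
            idx = self.rng.randrange(self.buffer_size)
            out, self.buffer[idx] = self.buffer[idx], item
            yield out
        # Drain phase: emit the remaining buffered items in random order.
        while self.buffer:
            idx = self.rng.randrange(len(self.buffer))
            self.buffer[idx], self.buffer[-1] = self.buffer[-1], self.buffer[idx]
            yield self.buffer.pop()

    def state_dict(self) -> Dict[str, Any]:
        # The buffer copy is the memory overhead of being resumable.
        return {
            "position": self.position,
            "buffer": list(self.buffer),
            "rng_state": self.rng.getstate(),
        }

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.position = state["position"]
        self.buffer = list(state["buffer"])
        self.rng.setstate(state["rng_state"])


# Usage: consume a few items, checkpoint, then resume in a fresh object.
data = list(range(20))
first = ResumableShuffleBuffer(data, buffer_size=5, seed=0)
it = iter(first)
head = [next(it) for _ in range(7)]
checkpoint = first.state_dict()

resumed = ResumableShuffleBuffer(data, buffer_size=5, seed=123)  # seed is overwritten on load
resumed.load_state_dict(checkpoint)
tail = list(resumed)
```

In this sketch, `head + tail` reproduces the order an uninterrupted pass with the same seed would give. A real streaming dataset would also need a way to skip the source ahead to the saved position, which is outside the scope of this example.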

Key points:

  • All tests have passed
  • Docstrings have been updated to reflect the new functionality

I'm very much looking forward to your feedback on this implementation! @lhoestq

yzhangcs (Contributor, Author) commented Feb 6, 2025

@lhoestq Hi again~ Just circling back on this.
Is there anything I can do to help move this forward? 🤗
Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


2 participants