[dataset] add shuffle at shards tar/raw file level #2424
The raw and shard source datasets need a shuffle parameter. Previously they did not shuffle, and without it the unit tests will not pass.
OK.
Added two parameters.
The default value could simply be sys.max
         self.dp = TextLineDataPipe(filenames).repeat(cycle).prefetch(
-            prefetch).shard(partition)
+            prefetch)
+        if shuffle:
+            self.dp = self.dp.shuffle(buffer_size=shuffle_size)
+        self.dp = self.dp.shard(partition)
Shouldn't this shuffle come before prefetch? @Mddct
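To make the ordering in the diff concrete, here is a minimal sketch in plain Python generators of a buffered shuffle followed by a shard step. It is not the project's datapipe implementation: `buffered_shuffle`, `shard`, `build_file_list`, and the `seed`, `rank`, and `world_size` parameters are hypothetical stand-ins, and `repeat` and `prefetch` are left out because they do not change the item order. The sketch assumes every rank uses the same seed, so that all ranks see the same shuffled stream before it is split.

```python
import random
from typing import Iterable, Iterator, List


def buffered_shuffle(items: Iterable[str], buffer_size: int,
                     seed: int) -> Iterator[str]:
    """Shuffle a stream in chunks of at most buffer_size items."""
    rng = random.Random(seed)
    buf: List[str] = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            rng.shuffle(buf)
            yield from buf
            buf = []
    # Flush whatever is left in the buffer.
    rng.shuffle(buf)
    yield from buf


def shard(items: Iterable[str], rank: int, world_size: int) -> Iterator[str]:
    """Keep every world_size-th item, starting at position rank."""
    for i, item in enumerate(items):
        if i % world_size == rank:
            yield item


def build_file_list(filenames: List[str], rank: int, world_size: int,
                    shuffle: bool = False, shuffle_size: int = 10000,
                    seed: int = 0) -> List[str]:
    """Hypothetical stand-in for the pipeline: optional shuffle, then shard."""
    stream: Iterable[str] = iter(filenames)
    if shuffle:
        stream = buffered_shuffle(stream, shuffle_size, seed)
    return list(shard(stream, rank, world_size))


if __name__ == "__main__":
    files = [f"shard_{i}.tar" for i in range(8)]
    for rank in range(2):
        print(rank, build_file_list(files, rank, 2, shuffle=True,
                                    shuffle_size=4))
```

Because the shuffle runs before the shard step, the partitions stay disjoint only if each rank produces the same shuffled order, which is why the sketch fixes the seed. With `shuffle=False` the function reduces to the original behavior of sharding the file list in its given order.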
No description provided.