Unable to create a small sample of 1000 train and 100 using MultilabelStratifiedShuffleSplit #15
Comments
meltedhead, Thank you for catching this bug. I do not think I ever tested with train_size set to a value other than None. As a workaround, you could do the following:
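The snippet originally posted here is not preserved. One possible workaround, which may differ from the one suggested above, is a two-stage split that only ever sets `test_size` and leaves `train_size` at None: first carve out a small subset as the "test" side of one split, then split that subset again into train and test. The sketch below assumes `X` and `y` can be converted to NumPy arrays and that `y` is a binary multilabel indicator matrix. The helper names `two_stage_fractions` and `small_stratified_sample` are hypothetical and not part of the library. The resulting sizes should be treated as approximate.

```python
def two_stage_fractions(n_samples, n_train, n_test):
    """Return (subset_fraction, test_fraction) for a two-stage split.

    Hypothetical helper: stage 1 keeps n_train + n_test rows out of
    n_samples, stage 2 splits that subset into train and test.
    """
    if n_train <= 0 or n_test <= 0:
        raise ValueError("n_train and n_test must be positive")
    n_subset = n_train + n_test
    if n_subset >= n_samples:
        raise ValueError("n_train + n_test must be smaller than n_samples")
    return n_subset / n_samples, n_test / n_subset


def small_stratified_sample(X, y, n_train, n_test, random_state=0):
    """Sketch: return (train_idx, test_idx) into the original arrays.

    Only test_size is passed to the splitter; train_size stays None.
    """
    # Imported here so the fraction helper above works without these packages.
    import numpy as np
    from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit

    X = np.asarray(X)
    y = np.asarray(y)
    subset_frac, test_frac = two_stage_fractions(len(y), n_train, n_test)

    # Stage 1: the "test" side of this split is the small subset we keep.
    stage1 = MultilabelStratifiedShuffleSplit(
        n_splits=1, test_size=subset_frac, random_state=random_state
    )
    _, subset_idx = next(stage1.split(X, y))

    # Stage 2: split the subset itself into train and test.
    stage2 = MultilabelStratifiedShuffleSplit(
        n_splits=1, test_size=test_frac, random_state=random_state
    )
    train_pos, test_pos = next(stage2.split(X[subset_idx], y[subset_idx]))

    # Map positions within the subset back to indices of the full dataset.
    return subset_idx[train_pos], subset_idx[test_pos]
```

For example, `small_stratified_sample(X, y, n_train=1000, n_test=100)` would aim for roughly 1000 training rows and 100 test rows. It is worth checking `len(train_idx)` and `len(test_idx)` afterwards, since the stratifier balances label counts rather than guaranteeing exact sizes.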
Hi there, I don't know if it helps, but I can see the same in that case with only test_size:
The above prints:
but I expected:
Ah, just read that in the doc of
Knowing that the above case should be very well distributed, I wonder if an acceptable solution with the given test size is that uncommon.
Hi trent-b:
Thanks for this repository; I hope you can help with my issue. I have a large JSON dataset from which I want to create a smaller sample set using MultilabelStratifiedShuffleSplit.
I then call the function as:
train_idx, test_idx = mlb_train_test_split(labels, test_size=1000, train_size=200, random_state=0)
When I look at the numbers, I'm seeing far more than 200 rows. Is there a limitation? The labels array has approximately 500,000 entries.