[DOC] User warning over sampling methods #1101
Comments
Basically, we are also working on this topic in scikit-learn. As a milestone, we want to have an example that shows the effect of sample-weight and class-weight in scikit-learn, and then I would like to revamp the documentation of
Thanks for the answer. Implementation and documentation within sklearn seem to be the way to go in the long run. Maybe in the short term this ongoing work should be documented a bit more visibly... a lot of newcomers are still pushing SMOTE and the like.
Is there a linked PR or issue in scikit-learn? I am one of the "newcomers" and just found out about this package via Stack Exchange.
Describe the issue linked to the documentation
There is an ongoing discussion about the usefulness of some (if not all) of the over- and under-sampling methods implemented in the imbalanced-learn package.
Typically there is some doubt about the usefulness of SMOTE:
Basically it seems that:
I think it is a problem that these discussions are not more visible to newcomers (and that more experienced people have to deal with this on a weekly basis).
Suggest a potential alternative/fix
It would be nice to have
It shows that it oversampled, but not that it works, either in terms of ranking (AUC) or probability calibration (ECE / calibration curve).
Could the doc be upgraded with a better example?
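To make the two criteria above concrete, here is a minimal pure-Python sketch of a ranking metric (ROC AUC) and a calibration metric (a binary, equal-width-bin variant of ECE). The function names, the bin count, and the toy data are assumptions made for illustration; this is not code from imbalanced-learn or scikit-learn, and the "inflated" probabilities are hand-made, not the output of any sampler.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Ranking quality: chance that a random positive outscores a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Binary ECE: weighted mean of |observed positive rate - mean predicted prob| per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in b) / len(b)
        predicted = sum(p for _, p in b) / len(b)
        ece += len(b) / n * abs(observed - predicted)
    return ece


# Toy data (assumed): 20 samples, 6 positives.
y_true = [1] + [0] * 9 + [1] * 5 + [0] * 5
calibrated = [0.1] * 10 + [0.5] * 10  # predicted prob matches observed rate per group
inflated = [0.4] * 10 + [0.9] * 10    # same ordering, probabilities shifted upward

for name, probs in [("calibrated", calibrated), ("inflated", inflated)]:
    auc = roc_auc(y_true, probs)
    ece = expected_calibration_error(y_true, probs)
    print(f"{name}: AUC={auc:.3f} ECE={ece:.3f}")
```

On this toy data the two sets of scores rank the samples identically, so the AUC is the same for both, while the ECE goes from roughly 0 to roughly 0.35. A documentation example that only shows class counts after resampling would not reveal either number.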
While (one of) the authors has changed their mind about the usefulness of these methods, it seems that a younger crowd is still very eager to jump on these shiny methods. I think it would be helpful for the DS community if the project took a clearer stance.
I would suggest at least a very visible warning in the doc, like a red banner ('there is some discussion about the usefulness of these methods. See: XXX. Use with caution').
This could be expanded with a UserWarning... maybe a bit brutal, but it could prevent a lot of trouble.
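A rough sketch of what such a UserWarning could look like, using only the standard `warnings` module. The class `HypotheticalOverSampler` and the constant `CAVEAT` are made up for illustration and are not part of imbalanced-learn; the "XXX" reference is left open, as in the banner text above.

```python
import warnings

# Hypothetical message; the reference to cite ("XXX") is still to be decided.
CAVEAT = (
    "There is some discussion about the usefulness of resampling methods. "
    "See: XXX. Use with caution."
)


class HypotheticalOverSampler:
    """Stand-in for a sampler class; not part of imbalanced-learn."""

    def fit_resample(self, X, y):
        # stacklevel=2 attributes the warning to the caller's line, not this one.
        warnings.warn(CAVEAT, UserWarning, stacklevel=2)
        return X, y  # no actual resampling in this sketch


# Record the warning to show that it fires when the sampler is used.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    HypotheticalOverSampler().fit_resample([[0.0], [1.0]], [0, 1])

print(caught[0].category.__name__, "-", caught[0].message)
```

Users who have read the caveat could still silence it with `warnings.filterwarnings("ignore", category=UserWarning)`, which keeps the warning visible by default without blocking anyone.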
Edit: not sure why it added the good first issue automatically... but I'll take it.