-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
New feature RobustWeightedEstimator : code + examples #42
New feature RobustWeightedEstimator : code + examples #42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @TimotheeMathieu ! A first partial review is below.
In major points,
- it would necessary to add user manual section with a short description of the algorithm (like this one), and maybe mentioning what are known advantages / limitations of it with respect to other outlier robust learners in scikit-learn/
- another major point is that this need tests. In particular,
- for various helper functions that are added.
- for the known edge/special cases of the estimator itself
- finally, ideally it should pass common tests. I.e. it would be necessary to add it here which will run checks from sklearn.utils.estimator_checks. Related to Add instance-level calls to estimator_checks for meta-estimators scikit-learn/scikit-learn#9443. Because it's a meta-estimator common test would need to determine at runtime whether it's a classifier or a regressor (depending on the base estimator), or maybe we would need to expose two meta-estimator one for regressors and one for classifiers.
- ideally it would be good to have a benchmarks script in
benchmarks/
to see how well this works on larger datasets (and have a way to evaluate the performance impact of future changes) - also illustrating that it works on some real world datasets would be useful (including a comparison with other outlier robust learners) even if it's not part of the PR -- or it could be an additional example. There is some discussion about in in the 2018 paper (and in particular in figure 8) but that doesn't say, unless I missed it, how well this approach performs as compared to other classical approaches one might try. If you need more datasets, have a look at https://www.openml.org/ which could be loaded with
sklearn.datasets.fetch_openml
.
The codecov CI failure can be fixed by adding tests to increase the code coverage of added code.. |
I would use snake_case in filenames. |
The problem comes from EigenPro and as it is not my code I could not find quickly where is the bug. |
I saw this that's why I asked. |
The link you gave is of the test 136.1, the current Travis test is 136.4 from what I gathered (I am new to CI tests). |
Basically is one build with different configurations. The one you are referring to is against the nightly version of the @rth whats your opinion here? In the README we say that we support scikit-learn (>=0.21) |
+1 to support just 1 latest version of scikit-learn. @glemaitre was also for it. |
Ok it is fixed, the problem of compatibility was with sklearn0.21, in my version (sklearn 0.22) there was no problem so I didn't see it. Thanks for explaining how CI work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is taking a long time to merge. I was hoping that scikit-learn-extra would be faster there than scikit-learn, otherwise it defeats the purpose of this repo somewhat.
Generally code and documentation wise LGTM. I have not done a very though review. I think we should merge.
@ngoix What's your opinion about this general approach outlined in the user guide? also cc @agramfort if you have any opinions. |
Sorry this is taking a long time to merge. I was hoping that scikit-learn-extra would be faster there than scikit-learn, otherwise it defeats the purpose of this repo somewhat.
The bottleneck is the same: the reviewer time.
|
@rth No worries, I am already thankful that you all helped me to make it better and that you accepted to teach a rookie how to do a PR and we can't expect you to be a lot faster than scikit-learn when you have in fact less manpower. |
@TimotheeMathieu Could you please merge upstream/master into this branch, that should fix the failing CI. |
I merged but CI still fail, it is only for python3.8 and seems to be because of sklearn-extra/utils/_cyfht.pyx I don't understand the import of pyx files so I can't help. I have python38 and the test also fail for master. |
Fix CI Python 3.8 Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
would this be acceptable ? It is a little hackish. |
Yes, looks like Edit: The problem of a custom regex is that we potentially may have the same issue when comparing againt versions of scipy/numpy and then using standard tools for version comparison is probably safer. |
Merged. Thank you @TimotheeMathieu ! |
Algorithms to do robust classification and regression.
Example of use of this code gives the two attached plots (the code used to generate the plots is in the example folder). The principle is to have algorithms that are robust to outliers both in the feature X and in the labels Y. Most sklearn algorithm allow robust loss functions that exhibit only the robustness in Y (except for Ransac and Theilsen estimators).