Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve performance of RFECV visualizer #1048

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Improve performance of RFECV visualizer #1048

wants to merge 2 commits into from

Conversation

bbengfort
Copy link
Member

This PR fixes #1047 which reported that yellowbrick's internal implementation of RFECV is much slower than scikit-learn's. This PR introduces a new implementation of RFECV that is closer to the latest version of sklearn.RFECV.

I have made the following changes:

  1. Updated RFECV with new parameters to match the sklearn implementation
  2. Implemented an _RFECV that subclasses the sklearn implementation and adds our required functionality

Sample Code and Plot

If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.

TODOs and questions

Still to do:

  • Update the docstrings and parameters of all classes, functions, and quick methods
  • Fix the still failing tests
  • Determine if this is the best path forward and reduces the most technical debt
  • See "TODO" and "HACK" comments in the code

Questions for the @DistrictDataLabs/team-oz-maintainers:

  • Does this require a new minimum version of scikit-learn?
  • Does this require us to add joblib as an optional dependency?
  • Is this maintainable in the long run?

@lwgray @rebeccabilbro and @jc639 -- it would be great if we could work on this together

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Included a sample plot to visually illustrate your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

@rebeccabilbro rebeccabilbro self-requested a review June 11, 2020 16:12
@fab375
Copy link

fab375 commented Dec 23, 2020

Hi, is there are working solution for #1047, that accelerates the RFECV to reach similar performance to the Sklearn implementation?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFECV is much slower than Sklearn's implementation
3 participants