Implement sklearn suggestions for model maintainability #38

Open
kdoroschak opened this issue Apr 27, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@kdoroschak
Member

The random forest classifier is currently loaded following the model persistence instructions here: https://scikit-learn.org/stable/modules/model_persistence.html

Persisting the model this way permanently ties us to a specific version (or range of versions) of sklearn, and/or requires additional checks like the ones they suggest (copied here):

In order to rebuild a similar model with future versions of scikit-learn, additional metadata should be saved along the pickled model:

The training data, e.g. a reference to an immutable snapshot

The python source code used to generate the model

The versions of scikit-learn and its dependencies

The cross validation score obtained on the training data

This should make it possible to check that the cross-validation score is in the same range as before.
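As a rough sketch of how that metadata could travel with the pickle: the snippet below writes a JSON sidecar file next to the model and warns at load time if the major.minor sklearn version differs. The function names, the `.meta.json` naming convention, and the `sklearn_version` key are all assumptions made for illustration, not anything that exists in this repo. The installed version is passed in as a string so the sketch only needs the standard library.

```python
import json
import pickle
import re
import warnings
from pathlib import Path


def _major_minor(version):
    """Extract (major, minor) from a version string such as '0.22.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def _metadata_path(model_path):
    # Hypothetical convention: "<model file name>.meta.json" next to the pickle.
    model_path = Path(model_path)
    return model_path.with_name(model_path.name + ".meta.json")


def save_model_with_metadata(model, model_path, metadata):
    """Pickle the model and write its metadata to a JSON sidecar file."""
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(_metadata_path(model_path), "w") as f:
        json.dump(metadata, f, indent=2)


def load_model_with_metadata(model_path, installed_version):
    """Load the model and its metadata, warning on a major.minor mismatch."""
    with open(_metadata_path(model_path)) as f:
        metadata = json.load(f)
    saved_version = metadata["sklearn_version"]  # assumed metadata key
    if _major_minor(saved_version) != _major_minor(installed_version):
        warnings.warn(
            f"Model was saved with sklearn {saved_version} but "
            f"{installed_version} is installed; re-check the cross-validation score."
        )
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model, metadata
```

In real use, `sklearn.__version__` would be stored under `sklearn_version` at save time and passed as `installed_version` at load time. The same metadata dict could also hold a reference to the training data snapshot, the commit of the training code, and the cross-validation score.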

Code/data location:

The training data is too large to include directly, but maybe we can create a small dataset for a unit test that acts as a sentinel: "hey, something changed, check the sklearn version".

@kdoroschak kdoroschak added the enhancement New feature or request label Apr 27, 2020