
Unable to ensure consistent versions of the SDK in AzureML Pipelines #1115


Closed
BillmanH opened this issue Aug 25, 2020 · 8 comments
Labels: Auto ML, product-feedback, SDK

Comments

@BillmanH

User program failed with ImportError: cannot import name 'RollingOriginValidator'

Using 1.12.0

I learned in this issue that the RollingOriginValidator error is caused by discrepancies between SDK versions in pipeline runs, e.g. when your local version differs from the version the AutoML model was trained with.

Now that I've got my full code working, I'm getting an error that a PythonScriptStep has a different SDK version than the AutoMLStep. AutoMLStep doesn't accept a RunConfiguration, so I have no way of controlling the SDK version that AutoML will use.

My question is: is there a way to enforce an SDK version in the AutoMLStep, so that I can get consistent outputs? Or is there a way to find out which version of the SDK the AutoMLStep is using, so that I can enforce that version in my pipeline runs?

Honestly, I don't see why the user is expected to sort this out. In earlier versions the cloud SDK would just sync with the SDK version used locally. Now this no longer works and users have to add this to their tech debt when using AzureML.

Full code here: https://github.com/BillmanH/learn-azureml

You can see RunConfiguration, AutoMLStep, and PythonScriptStep in main_run.py. All of the steps work fine except for the last one, score_step, which crashes with ImportError: cannot import name 'RollingOriginValidator'.
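
For the PythonScriptSteps I can at least log azureml.core.VERSION from inside the step script to see what each of those steps actually resolved; a minimal sketch (not code from the repo above):

# inside any PythonScriptStep script -- prints the azureml-core version that
# this step's environment resolved, so it can be compared across steps
import azureml.core

print("azureml-core version in this step:", azureml.core.VERSION)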

@BillmanH
Author

FWIW, here is the AutoML step:


# Imports assumed from the linked repo (not shown in the original snippet):
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep

automl_settings = {
    "iteration_timeout_minutes": 5,
    "iterations": 1,
    "n_cross_validations": 2,
    "primary_metric": 'accuracy',
    "featurization": 'auto',
    "max_concurrent_iterations": 5
}

automl_config = AutoMLConfig(task='classification',
                             debug_log='automl_errors.log',
                             path='iris_gold',
                             training_data=output.read_delimited_files(
                                 'iris_gold.csv'),
                             label_column_name="species",
                             compute_target=f.compute_target,
                             model_explainability=True,
                             **automl_settings)

train_step = AutoMLStep('automl', automl_config,
                        outputs=[metrics_data, model_data],
                        enable_default_model_output=False,
                        enable_default_metrics_output=False,
                        allow_reuse=True,
                        passthru_automl_config=False)

It's the same as in the link above.

@v-strudm-msft added the Auto ML and product-feedback labels on Aug 25, 2020
@BillmanH
Author

Just a heads up,

Here are some things I've tried:

  1. specifying the version of all of the packages individually:
cd = CondaDependencies.create(
    pip_packages=[
        "pandas",
        "numpy",
        "azureml-sdk[automl,interpret]==1.12.0",
        "azureml-defaults==1.12.0",
        "azureml-train-automl-runtime==1.12.0",
    ],
    conda_packages=["xlrd", "scikit-learn", "numpy", "pyyaml", "pip"],
)
  2. specifying no version (hoping that AutoML will pick a version that is consistent):
cd = CondaDependencies.create(
    pip_packages=[
        "pandas",
        "numpy",
        "azureml-sdk[automl,interpret]",
        "azureml-defaults",
        "azureml-train-automl-runtime",
    ],
    conda_packages=["xlrd", "scikit-learn", "numpy", "pyyaml", "pip"],
)
amlcompute_run_config = RunConfiguration(conda_dependencies=cd)
amlcompute_run_config.environment.docker.enabled = True
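
In both cases, that RunConfiguration is what gets passed to the PythonScriptSteps through their runconfig parameter, roughly like this (the step and script names here are illustrative; the real ones are in main_run.py):

from azureml.pipeline.steps import PythonScriptStep

# amlcompute_run_config is the RunConfiguration built above
score_step = PythonScriptStep(
    name="score_step",
    script_name="score.py",            # illustrative script name
    compute_target=f.compute_target,   # same compute target as the other steps
    runconfig=amlcompute_run_config,   # shared run configuration
    allow_reuse=True,
)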

@BillmanH
Author

Just to note:
This is all in the same run; I don't understand how the final step would have a package inconsistency when the others don't. They are all using the same config:

[screenshot]

@CESARDELATORRE
Collaborator

Adding @anupsms. Please involve folks from the team to advise here about AutoMLStep.

@pbartos

pbartos commented Aug 27, 2020

The package versions for the AutoML step run can be found here:
[screenshot]

@BillmanH
Author

Appreciate the tip, but I feel like the issue is more about enforcing a set of packages that works before the run starts. My biggest concern with AutoML is that I spend all of my time troubleshooting the SDK and never get to do any data science. I could download the definition and then create an env to match, but that takes a lot of extra time and hassle.
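
For anyone else in the same spot, pulling the definition programmatically would look roughly like this (the experiment name, run ID, and output path are placeholders):

from azureml.core import Experiment, Run, Workspace

ws = Workspace.from_config()
exp = Experiment(ws, "my-pipeline-experiment")      # placeholder experiment name

# "AutoML_xxxxxxxx" stands in for the AutoML step's run ID from the portal
automl_run = Run(exp, run_id="AutoML_xxxxxxxx")
env = automl_run.get_environment()

# Inspect or persist the exact conda/pip spec the AutoML step resolved
print(env.python.conda_dependencies.serialize_to_string())
env.save_to_directory("automl_env_definition", overwrite=True)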

@swatig007
Collaborator

@CESARDELATORRE can you pls review this issue?

@BillmanH
Author

In the end, the workaround that made this work was:

# Imports assumed for this snippet (not shown in the original comment):
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

cd = CondaDependencies.create(
    pip_packages=[
        "pandas",
        "numpy",
        "azureml-sdk[automl,interpret]",
        "azureml-defaults",
        "azureml-train-automl-runtime",
    ],
    conda_packages=["xlrd", "scikit-learn", "numpy", "pyyaml", "pip"],
)
amlcompute_run_config = RunConfiguration(conda_dependencies=cd)
amlcompute_run_config.environment.docker.enabled = True
# Swap in a clone of the curated AzureML-AutoML environment so the other steps
# run against the same package set that the AutoML step uses.
amlcompute_run_config.environment = Environment.get(
    ws, name='AzureML-AutoML').clone("bills-test")

So there is an AzureML-AutoML environment in the system: you get that, clone it, and add your packages. You then use it as the environment for all of the other steps to make sure that the joblib files will load.
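
And if you need extra pip packages on top of the cloned environment, a rough sketch (the package names are just examples):

from azureml.core import Environment

# Clone the curated environment, then append extra pip packages to its
# existing conda dependencies (package names here are just examples).
env = Environment.get(ws, name='AzureML-AutoML').clone("bills-test")
env.python.conda_dependencies.add_pip_package("pandas")
env.python.conda_dependencies.add_pip_package("xlrd")

amlcompute_run_config.environment = env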
