Extend Support for Dependency Management #1512

sundarshankar89 · 2025-04-03T08:57:23Z

Extends dependency management support for pipeline steps that execute a Python file.

github-actions · 2025-04-09T04:00:48Z

❌ 13/15 passed, 2 failed, 1 skipped, 51s total

❌ test_run_python_dep_failure_pipeline: assert 'Script execution failed' in "Failed to install dependencies: ERROR: Invalid requirement: 'databricks_labs_ucx=0.1.0': Expected end or semicolon (after name and no valid version specifier)\n databricks_labs_ucx=0.1.0\n ^\nHint: = is not a valid operator. Did you mean == ?\n" (13.035s)

assert 'Script execution failed' in "Failed to install dependencies: ERROR: Invalid requirement: 'databricks_labs_ucx=0.1.0': Expected end or semicolon (after name and no valid version specifier)\n    databricks_labs_ucx=0.1.0\n                       ^\nHint: = is not a valid operator. Did you mean == ?\n"
 +  where "Failed to install dependencies: ERROR: Invalid requirement: 'databricks_labs_ucx=0.1.0': Expected end or semicolon (after name and no valid version specifier)\n    databricks_labs_ucx=0.1.0\n                       ^\nHint: = is not a valid operator. Did you mean == ?\n" = StepExecutionResult(step_name='package_status', status=<StepExecutionStatus.ERROR: 'ERROR'>, error_message="Failed to install dependencies: ERROR: Invalid requirement: 'databricks_labs_ucx=0.1.0': Expected end or semicolon (after name and no valid version specifier)\n    databricks_labs_ucx=0.1.0\n                       ^\nHint: = is not a valid operator. Did you mean == ?\n").error_message
[gw3] linux -- Python 3.10.17 /home/runner/work/remorph/remorph/.venv/bin/python
07:38 INFO [databricks.labs.remorph.assessments.pipeline] Creating a virtual environment for Python script execution: $/tmp/tmphdx_6mer/venv
07:38 ERROR [root] Failed to install dependencies: ERROR: Invalid requirement: 'databricks_labs_ucx=0.1.0': Expected end or semicolon (after name and no valid version specifier)
    databricks_labs_ucx=0.1.0
                       ^
Hint: = is not a valid operator. Did you mean == ?
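The requirement string in the log above uses a single `=`, which pip rejects. A minimal sketch of a pre-flight check that flags this particular mistake before a virtual environment is created. The helper name `find_single_equals` is hypothetical and not part of remorph, and the regex is a rough heuristic rather than a full requirement parser (it would, for example, misfire on a URL requirement with a query string).

```python
import re

# Matches a lone "=" that is not part of "==", "===", ">=", "<=", "!=" or "~=".
_SINGLE_EQUALS = re.compile(r"(?<![=<>!~])=(?!=)")


def find_single_equals(dependencies: list[str]) -> list[str]:
    """Return the requirement strings that contain a lone '=' operator.

    Hypothetical helper: a heuristic, not a full requirement parser.
    """
    return [dep for dep in dependencies if _SINGLE_EQUALS.search(dep)]


# The last entry is the string from the failing test's log.
suspect = find_single_equals(["pandas", "duckdb>=1.0", "databricks_labs_ucx=0.1.0"])
```

Whether such a check belongs in the pipeline or is better left to pip's own error message is a design choice; the test in this PR relies on the latter.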

❌ test_run_pipeline: AssertionError: Step usage_2 failed with status SKIPPED (22.897s)

AssertionError: Step usage_2 failed with status SKIPPED
assert <StepExecutionStatus.SKIPPED: 'SKIPPED'> == <StepExecutionStatus.COMPLETE: 'COMPLETE'>
  
  - COMPLETE
  + SKIPPED
[gw0] linux -- Python 3.10.17 /home/runner/work/remorph/remorph/.venv/bin/python
07:38 INFO [databricks.labs.remorph.assessments.pipeline] Creating a virtual environment for Python script execution: $/tmp/tmp8t8998gx/venv

Running from acceptance #570

src/databricks/labs/remorph/assessments/profiler_config.py

src/databricks/labs/remorph/assessments/pipeline.py

goodwillpunning · 2025-04-16T16:27:33Z

tests/integration/assessments/test_pipeline.py

@@ -60,6 +71,12 @@ def test_run_python_failure_pipeline(extractor, python_failure_config, get_logge
        pipeline.execute()


+def test_run_python_dep_failure_pipeline(extractor, pipeline_dep_failure_config, get_logger):


I think it makes sense to fail the entire Step if one of the dependencies cannot be installed. Out of curiosity, do you think that this should fail the entire Pipeline execution run too?
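One way to frame the question above is as a policy flag on the pipeline. The following is only a sketch: `run_steps` and its `fail_fast` option are hypothetical and not part of this PR, and `StepExecutionStatus` and `StepExecutionResult` mirror the names seen in the test output but are redefined here so the example is self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class StepExecutionStatus(Enum):
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"
    SKIPPED = "SKIPPED"


@dataclass
class StepExecutionResult:
    step_name: str
    status: StepExecutionStatus
    error_message: Optional[str] = None


def run_steps(
    steps: dict[str, Callable[[], StepExecutionResult]],
    fail_fast: bool = False,
) -> list[StepExecutionResult]:
    """Run each step in order and collect results (hypothetical helper).

    With fail_fast=True, the first step that ends in ERROR aborts the run.
    Otherwise the error is recorded and the remaining steps still execute.
    """
    results = []
    for name, step in steps.items():
        result = step()
        results.append(result)
        if fail_fast and result.status is StepExecutionStatus.ERROR:
            raise RuntimeError(f"Pipeline aborted at step {name}: {result.error_message}")
    return results
```

With `fail_fast=False`, a failed dependency install fails only its own step and the caller inspects the collected results; with `fail_fast=True`, the whole run stops at the first error.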

goodwillpunning · 2025-04-16T16:28:41Z

tests/resources/assessments/pipeline_config.yml

@@ -26,3 +26,6 @@ steps:
    mode: overwrite
    frequency: daily
    flag: active
+    dependencies:
+      - pandas
+      - duckdb


Maybe we can add a test for a dependency with a version specified as well?

Let me check.
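A version-pinned dependency could also be covered without touching the network by checking the install command that would be issued. This is a sketch under assumptions: `build_pip_install_command` is a hypothetical helper, not a function from this PR, and the pin shown is illustrative only.

```python
from pathlib import Path


def build_pip_install_command(venv_python: Path, dependencies: list[str]) -> list[str]:
    """Build the argument list for installing dependencies with the venv's own pip.

    Hypothetical helper: requirement strings are passed through unchanged,
    so a pinned entry such as "duckdb==1.2.0" reaches pip exactly as written.
    """
    return [str(venv_python), "-m", "pip", "install", *dependencies]


# Illustrative dependency list, not taken from the PR's config.
command = build_pip_install_command(Path("venv/bin/python"), ["pandas", "duckdb==1.2.0"])
```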

goodwillpunning

PR looks great! I really like this design a lot. Added a few comments around runtime exceptions and also I think you may have left a debugging statement in. Other than that, I think this PR is ready to ship.

asnare

Looking good, I've left a few comments for consideration.

One thing I considered was whether we should skip the virtual environment if there aren't any dependencies, but it turns out that this also avoids a quirk prior to this PR where you don't know exactly what python refers to when executing a step. (And now we do.)
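To illustrate the point about knowing which interpreter runs a step: if a virtual environment is always created, the interpreter path can be derived from the environment directory instead of relying on whatever `python` resolves to on the PATH. A minimal sketch using the standard-library `venv` module; the function names are hypothetical and not the ones used in the PR.

```python
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def create_step_environment(venv_dir: Path, with_pip: bool = True) -> Path:
    """Create a fresh venv and return the interpreter that will run the step."""
    # clear=True removes any existing environment at this path first.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(venv_dir)
    return venv_python(venv_dir)
```

A step would then be launched with the returned path as the first element of the subprocess argument list, even when its dependency list is empty.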

src/databricks/labs/remorph/assessments/pipeline.py

asnare · 2025-04-17T10:05:48Z

src/databricks/labs/remorph/assessments/pipeline.py

+                except json.JSONDecodeError:
+                    logging.info(f"Python script output: {result.stdout}")
+
+            except CalledProcessError as e:


It looks like upon failure we drop anything that was written to stdout. Do you think it's useful to log that?
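For reference, `subprocess.CalledProcessError` keeps whatever the child process wrote before exiting, so the output can be logged on the failure path as well. A sketch, assuming the script is run with `check=True` and captured output; the `run_script` name and the returned dictionary shape are hypothetical and not taken from the PR.

```python
import json
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_script(args: list[str]) -> dict:
    """Run a script, keeping stdout and stderr in the logs even when it fails."""
    try:
        result = subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        # The exception carries the captured streams of the failed process.
        logger.error(f"Script failed with exit code {e.returncode}")
        logger.error(f"stdout: {e.stdout}")
        logger.error(f"stderr: {e.stderr}")
        return {"status": "error", "stdout": e.stdout, "stderr": e.stderr}
    try:
        return {"status": "ok", "output": json.loads(result.stdout)}
    except json.JSONDecodeError:
        # Not JSON: log the raw output instead of discarding it.
        logger.info(f"Python script output: {result.stdout}")
        return {"status": "ok", "output": result.stdout}


# A script that prints something and then exits with a non-zero code.
failed = run_script([sys.executable, "-c", "import sys; print('partial output'); sys.exit(3)"])
```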

asnare · 2025-04-17T10:07:55Z

tests/integration/assessments/test_pipeline.py

@@ -22,6 +22,17 @@ def pipeline_config():
    return config


+@pytest.fixture(scope="module")


I'm curious about why this is needed?

You mean the scope argument?

tests/resources/assessments/db_extract_dep.py


sundarshankar89 and others added 2 commits March 21, 2025 19:39

pipeline extension

e61dd3e

Merge branch 'main' into feature/pipeline_extensions

73b36c2

sundarshankar89 marked this pull request as ready for review April 9, 2025 03:57

sundarshankar89 requested a review from a team as a code owner April 9, 2025 03:57

Merge branch 'main' into feature/pipeline_extensions

440cc70

sundarshankar89 had a problem deploying to tool April 9, 2025 03:57 — with GitHub Actions Failure

sundarshankar89 temporarily deployed to tool April 9, 2025 04:19 — with GitHub Actions Inactive

fmt fixes

f0c87d6

sundarshankar89 had a problem deploying to tool April 10, 2025 10:36 — with GitHub Actions Failure

sundarshankar89 temporarily deployed to tool April 10, 2025 10:51 — with GitHub Actions Inactive

sundarshankar89 added 3 commits April 10, 2025 19:21

refactor

0764e01

Added tests for dependency

d188705

Added tests for dependency

68b0f83

sundarshankar89 temporarily deployed to tool April 14, 2025 16:01 — with GitHub Actions Inactive

Added Failure tests

5bd9408

sundarshankar89 had a problem deploying to tool April 14, 2025 16:09 — with GitHub Actions Failure

installer as venv

5899a3c

sundarshankar89 had a problem deploying to tool April 15, 2025 09:55 — with GitHub Actions Error

fmt fixes

3cdbf7e

sundarshankar89 temporarily deployed to tool April 15, 2025 09:56 — with GitHub Actions Inactive

sundarshankar89 requested a review from asnare April 15, 2025 10:02

gueniai added the feature/profiler label Apr 15, 2025

gueniai requested a review from goodwillpunning April 15, 2025 16:11