Update dask deps and xfail one test #1278

skrawcz · 2025-02-09T07:53:20Z

Updates dask dependencies.

Note: see #1279 for details. For now we'll skip fixing the "dask" key issue that shows up when a pandera series schema is used with a dask series object.

Changes

imports: gets the new dependencies right
migrates some code to reference the correct modules
marks one test as xfail, since it doesn't seem worth fixing right now
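
The xfail marking in the last item could look roughly like the sketch below. It uses the standard library's `unittest.expectedFailure` as a stand-in for `pytest.mark.xfail` so that it runs without third-party packages. The class name, test names, and the simulated failure are hypothetical and are not the code in this PR.

```python
import unittest


class PanderaDaskSeriesTest(unittest.TestCase):
    """Hypothetical stand-in for the pandera + dask series test."""

    @unittest.expectedFailure  # analogous to @pytest.mark.xfail(reason=...)
    def test_series_schema_on_dask_series(self):
        # Simulated failure; the real error behind #1279 is assumed here.
        raise KeyError("dask")

    def test_unrelated_behaviour_still_checked(self):
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run the tests above and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PanderaDaskSeriesTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    # The expected failure is recorded separately and does not fail the run.
    print(outcome.wasSuccessful(), len(outcome.expectedFailures))
```

The point of the marker is that the known-bad test keeps running and is reported, without turning the whole suite red while the fix is deferred.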

How I tested this

locally and via integration tests

Notes

squash commit this PR

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

dask expr was rolled into dask, but not for 3.9.
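
A minimal sketch of version gating along these lines, assuming (from the review notes on this PR) that `dask-expr` is installed as a separate package only on Python 3.9 and that the `dataframe.query-planning` setting is disabled below 3.10. The helper names are hypothetical.

```python
import sys
from typing import Dict, Tuple


def needs_separate_dask_expr(version: Tuple[int, int]) -> bool:
    """Assumption: only Python 3.9 needs dask-expr as its own package."""
    return version == (3, 9)


def dask_config_overrides(version: Tuple[int, int]) -> Dict[str, bool]:
    """Assumption: query planning is turned off for Python below 3.10."""
    if version < (3, 10):
        return {"dataframe.query-planning": False}
    return {}


if __name__ == "__main__":
    current = sys.version_info[:2]
    print("install dask-expr separately:", needs_separate_dask_expr(current))
    overrides = dask_config_overrides(current)
    try:
        import dask  # optional; only used when dask is installed
    except ImportError:
        print("dask not installed; would set:", overrides)
    else:
        if overrides:
            dask.config.set(overrides)
```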

Some weird interplay with dependencies. Punting on this for now.

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 1b02b76 in 2 minutes and 12 seconds

More details

Looked at 158 lines of code in 7 files
Skipped 0 files when reviewing.
Skipped posting 15 drafted comments based on config settings.

1. .ci/test.sh:29

Draft comment:
Using -e '.[pandera, test]' installs additional test dependencies. Consider documenting why both pandera and test extras are needed here in the docs/integrations section.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

2. examples/dask/hello_world/data_loaders.py:12

Draft comment:
Removal of the 'name' parameter from dataframe.from_pandas: update the function docstring to indicate that the Series name is now dropped.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

3. hamilton/plugins/h_dask.py:19

Draft comment:
Good use of try/except for dask_scalar import; consider adding a comment referencing which versions of dask are supported.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

4. hamilton/plugins/h_dask.py:265

Draft comment:
Changed from dask.dataframe.multi.concat to dask.dataframe.concat. Ensure this update is reflected in the documentation for dask adapter usage.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

5. pyproject.toml:54

Draft comment:
dask-expr is now conditionally installed for Python 3.9; please ensure that the docs/integrations include guidance on using dask-expr with Hamilton.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

6. tests/integrations/pandera/test_pandera_data_quality.py:201

Draft comment:
Marking the dask series test as xfail is acceptable for now, but consider adding more context in the xfail comment so users know what issue to track.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

7. .ci/test.sh:29

Draft comment:
Using '.[pandera, test]' and installing integration requirements from a file is a good move for consistency. However, note that the Python version check now only matches exactly 3.9 for installing dask-expr—ensure that this restriction is intentional and that behavior on other versions is as expected.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The first part about package installation changes is just informative and not actionable. The second part about Python version check asks for verification, which violates the rule "Do NOT ask the PR author to confirm their intention". The change in version check logic is clear and explicit in the code - if it was wrong, it would be caught during testing.
The version check change could potentially break functionality for Python versions > 3.9 that previously worked. Maybe this deserves attention?
Per the rules, we should not make speculative comments or ask for verification. If there was a real issue, it would be caught by the CI tests running on different Python versions.
The comment should be deleted as it's partly informative and partly asking for verification, both of which violate the review rules.

8. examples/dask/hello_world/data_loaders.py:12

Draft comment:
The explicit 'name' parameter has been removed from the call to dataframe.from_pandas in both spend() and signups() functions. Confirm that this change is intentional and update the function docstrings if the series name was previously relied upon downstream.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment asks to "confirm" the change is intentional and update docstrings - this violates our rule about not asking for confirmations or explanations. The functions are clearly marked as dummy examples. The docstrings don't mention the series names at all. This seems like an unnecessary verification request.
Maybe the series names were important for the example's functionality? Maybe downstream code depends on these specific names?
These are explicitly marked as dummy example functions. Even if downstream code existed, asking for confirmation violates our rules. If it was critical, it would cause obvious failures.
Delete this comment as it violates our rule about not asking for confirmations or explanations. It's asking the author to verify their intention and check downstream impacts.

9. hamilton/plugins/h_dask.py:18

Draft comment:
The try/except block for importing dask_scalar nicely provides backward compatibility with older dask versions. This is a clean and maintainable solution.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

10. hamilton/plugins/h_dask.py:227

Draft comment:
Replacing the hardcoded check for dask.dataframe.core.Scalar with the imported dask_scalar enhances compatibility across dask versions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

11. hamilton/plugins/h_dask.py:257

Draft comment:
Switching from dask.dataframe.multi.concat to dask.dataframe.concat follows the updated dask API. Verify that the subsequent column renaming continues to work as expected.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

12. plugin_tests/h_dask/conftest.py:9

Draft comment:
The conditional configuration (disabling 'dataframe.query-planning' for Python versions below 3.10) is clear. Ensure that this configuration stays valid with future versions of dask.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

13. pyproject.toml:54

Draft comment:
Limiting the 'dask-expr' dependency to Python 3.9 (using the conditional marker) appears intended. Please ensure this restriction aligns with CI and integration test expectations, especially as other parts of the codebase depend on dask-expr behavior.
Reason this comment was not posted:
Marked as duplicate.

14. tests/integrations/pandera/requirements.txt:2

Draft comment:
Including 'pandera[dask]' in the requirements ensures that the correct pandera-related extras are installed. This change appears appropriate.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

15. tests/integrations/pandera/test_pandera_data_quality.py:202

Draft comment:
Marking the test 'test_pandera_decorator_dask_series' as xfail due to a known import issue is pragmatic. Consider adding a reference to a tracking issue so it’s easier to revisit when the underlying problem is fixed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While tracking issues can be helpful, this seems more like a documentation suggestion than a required code change. The xfail already has a descriptive reason. Without knowing if there's even a tracking issue to reference, or if this is something that will be investigated further, the comment is speculative. The current xfail message provides sufficient information about the failure reason.
The comment could be valuable for long-term maintenance by making it easier to track when the issue is resolved. Having a tracking issue would provide more context about the problem.
However, creating tracking issues is an optional documentation practice, not a required code change. The xfail already communicates the essential information needed for the test.
Delete the comment as it's suggesting an optional documentation improvement rather than a required code change. The existing xfail message is sufficient.
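
Draft comments 3, 9, and 10 describe a try/except import that tolerates different dask versions. A generic sketch of that pattern follows, using a hypothetical helper `import_first_available` and standard-library names so that it runs anywhere. The only dask path named in this thread is `dask.dataframe.core.Scalar`, so no dask paths are assumed here.

```python
import importlib
from typing import Any, Iterable


def import_first_available(candidates: Iterable[str]) -> Any:
    """Return the first attribute that imports, given 'module:attr' strings."""
    errors = []
    for spec in candidates:
        module_name, _, attr = spec.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{spec}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Preferred location first, fallback location second (stdlib stand-ins).
ordered_dict = import_first_available(
    ["not_a_real_module:OrderedDict", "collections:OrderedDict"]
)
```

Resolving the name once at import time keeps later type checks free of version-specific module paths.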

Workflow ID: wflow_H56N5BdhOGQMx6Cz

skrawcz added 13 commits February 8, 2025 23:52

WIP fix dask changes

d50a5e2

Remove now incorrect config from dask test

4b9e2cd

WIP

50dab62

Making sure installed deps are updated for dask

a3a787a

WIP

32bd2a5

WIP

5bdd255

Add case for dask + py3.9 test

970f362

dask expr was rolled into dask, but not for 3.9.

Fix deps for pandera dask

b4487c3

WIP

455914c

WIP

cf552c9

Pandera dask series mark as xfail

3edb1b0

Some weird interplay with dependencies. Punting on this for now.

WIP

27d0aa5

WIP

1b02b76

skrawcz marked this pull request as ready for review February 10, 2025 21:52

skrawcz changed the title ~~WIP fix dask changes~~ Update dask deps and xfail one test Feb 10, 2025

ellipsis-dev bot reviewed Feb 10, 2025

skrawcz merged commit e3573a1 into main Feb 10, 2025
24 checks passed

skrawcz deleted the fix_dask branch February 10, 2025 22:08
