Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Update dask deps and xfail one test #1278

Merged
merged 13 commits into from
Feb 10, 2025
Merged

Update dask deps and xfail one test #1278

merged 13 commits into from
Feb 10, 2025

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Feb 9, 2025

Updates dask dependencies.

Note - see #1279 for details. We'll skip fixing the pandera series schema + dask series object "dask" key issue.

Changes

  • imports -- gets the new dependencies right
  • migrates some code to reference the right modules
  • marks test as xfail since it doesn't seem to be worth fixing for right now

How I tested this

  • locally and via integration tests

Notes

  • squash commit this PR

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz marked this pull request as ready for review February 10, 2025 21:52
@skrawcz skrawcz changed the title WIP fix dask changes Update dask deps and xfail one test Feb 10, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 1b02b76 in 2 minutes and 12 seconds

More details
  • Looked at 158 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 15 drafted comments based on config settings.
1. .ci/test.sh:29
  • Draft comment:
    Using -e '.[pandera, test]' installs additional test dependencies. Consider documenting why both pandera and test extras are needed here in the docs/integrations section.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
2. examples/dask/hello_world/data_loaders.py:12
  • Draft comment:
    Removal of the 'name' parameter from dataframe.from_pandas: update the function docstring to indicate that the Series name is now dropped.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
3. hamilton/plugins/h_dask.py:19
  • Draft comment:
    Good use of try/except for dask_scalar import; consider adding a comment referencing which versions of dask are supported.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. hamilton/plugins/h_dask.py:265
  • Draft comment:
    Changed from dask.dataframe.multi.concat to dask.dataframe.concat. Ensure this update is reflected in the documentation for dask adapter usage.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
5. pyproject.toml:54
  • Draft comment:
    dask-expr is now conditionally installed for Python 3.9; please ensure that the docs/integrations include guidance on using dask-expr with Hamilton.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
6. tests/integrations/pandera/test_pandera_data_quality.py:201
  • Draft comment:
    Marking the dask series test as xfail is acceptable for now, but consider adding more context in the xfail comment so users know what issue to track.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
7. .ci/test.sh:29
  • Draft comment:
    Using '.[pandera, test]' and installing integration requirements from a file is a good move for consistency. However, note that the Python version check now only matches exactly 3.9 for installing dask-expr—ensure that this restriction is intentional and that behavior on other versions is as expected.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The first part about package installation changes is just informative and not actionable. The second part about Python version check asks for verification, which violates the rule "Do NOT ask the PR author to confirm their intention". The change in version check logic is clear and explicit in the code - if it was wrong, it would be caught during testing.
    The version check change could potentially break functionality for Python versions > 3.9 that previously worked. Maybe this deserves attention?
    Per the rules, we should not make speculative comments or ask for verification. If there was a real issue, it would be caught by the CI tests running on different Python versions.
    The comment should be deleted as it's partly informative and partly asking for verification, both of which violate the review rules.
8. examples/dask/hello_world/data_loaders.py:12
  • Draft comment:
    The explicit 'name' parameter has been removed from the call to dataframe.from_pandas in both spend() and #s() functions. Confirm that this change is intentional and update the function docstrings if the series name was previously relied upon downstream.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment asks to "confirm" the change is intentional and update docstrings - this violates our rule about not asking for confirmations or explanations. The functions are clearly marked as dummy examples. The docstrings don't mention the series names at all. This seems like an unnecessary verification request.
    Maybe the series names were important for the example's functionality? Maybe downstream code depends on these specific names?
    These are explicitly marked as dummy example functions. Even if downstream code existed, asking for confirmation violates our rules. If it was critical, it would cause obvious failures.
    Delete this comment as it violates our rule about not asking for confirmations or explanations. It's asking the author to verify their intention and check downstream impacts.
9. hamilton/plugins/h_dask.py:18
  • Draft comment:
    The try/except block for importing dask_scalar nicely provides backward compatibility with older dask versions. This is a clean and maintainable solution.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
10. hamilton/plugins/h_dask.py:227
  • Draft comment:
    Replacing the hardcoded check for dask.dataframe.core.Scalar with the imported dask_scalar enhances compatibility across dask versions.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
11. hamilton/plugins/h_dask.py:257
  • Draft comment:
    Switching from dask.dataframe.multi.concat to dask.dataframe.concat follows the updated dask API. Verify that the subsequent column renaming continues to work as expected.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
12. plugin_tests/h_dask/conftest.py:9
  • Draft comment:
    The conditional configuration (disabling 'dataframe.query-planning' for Python versions below 3.10) is clear. Ensure that this configuration stays valid with future versions of dask.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
13. pyproject.toml:54
  • Draft comment:
    Limiting the 'dask-expr' dependency to Python 3.9 (using the conditional marker) appears intended. Please ensure this restriction aligns with CI and integration test expectations, especially as other parts of the codebase depend on dask-expr behavior.
  • Reason this comment was not posted:
    Marked as duplicate.
14. tests/integrations/pandera/requirements.txt:2
  • Draft comment:
    Including 'pandera[dask]' in the requirements ensures that the correct pandera-related extras are installed. This change appears appropriate.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
15. tests/integrations/pandera/test_pandera_data_quality.py:202
  • Draft comment:
    Marking the test 'test_pandera_decorator_dask_series' as xfail due to a known import issue is pragmatic. Consider adding a reference to a tracking issue so it’s easier to revisit when the underlying problem is fixed.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While tracking issues can be helpful, this seems more like a documentation suggestion than a required code change. The xfail already has a descriptive reason. Without knowing if there's even a tracking issue to reference, or if this is something that will be investigated further, the comment is speculative. The current xfail message provides sufficient information about the failure reason.
    The comment could be valuable for long-term maintenance by making it easier to track when the issue is resolved. Having a tracking issue would provide more context about the problem.
    However, creating tracking issues is an optional documentation practice, not a required code change. The xfail already communicates the essential information needed for the test.
    Delete the comment as it's suggesting an optional documentation improvement rather than a required code change. The existing xfail message is sufficient.

Workflow ID: wflow_H56N5BdhOGQMx6Cz


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@skrawcz skrawcz merged commit e3573a1 into main Feb 10, 2025
24 checks passed
@skrawcz skrawcz deleted the fix_dask branch February 10, 2025 22:08
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant