-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Update dask deps and xfail one test #1278
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dask expr was rolled into dask, but not for 3.9.
Some weird interplay with dependencies. Punting on this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 1b02b76 in 2 minutes and 12 seconds
More details
- Looked at
158
lines of code in7
files - Skipped
0
files when reviewing. - Skipped posting
15
drafted comments based on config settings.
1. .ci/test.sh:29
- Draft comment:
Using-e '.[pandera, test]'
installs additional test dependencies. Consider documenting why both pandera and test extras are needed here in the docs/integrations section. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
2. examples/dask/hello_world/data_loaders.py:12
- Draft comment:
Removal of the 'name' parameter from dataframe.from_pandas: update the function docstring to indicate that the Series name is now dropped. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
3. hamilton/plugins/h_dask.py:19
- Draft comment:
Good use of try/except for dask_scalar import; consider adding a comment referencing which versions of dask are supported. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
4. hamilton/plugins/h_dask.py:265
- Draft comment:
Changed from dask.dataframe.multi.concat to dask.dataframe.concat. Ensure this update is reflected in the documentation for dask adapter usage. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
5. pyproject.toml:54
- Draft comment:
dask-expr is now conditionally installed for Python 3.9; please ensure that the docs/integrations include guidance on using dask-expr with Hamilton. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
6. tests/integrations/pandera/test_pandera_data_quality.py:201
- Draft comment:
Marking the dask series test as xfail is acceptable for now, but consider adding more context in the xfail comment so users know what issue to track. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
7. .ci/test.sh:29
- Draft comment:
Using '.[pandera, test]' and installing integration requirements from a file is a good move for consistency. However, note that the Python version check now only matches exactly 3.9 for installing dask-expr—ensure that this restriction is intentional and that behavior on other versions is as expected. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The first part about package installation changes is just informative and not actionable. The second part about Python version check asks for verification, which violates the rule "Do NOT ask the PR author to confirm their intention". The change in version check logic is clear and explicit in the code - if it was wrong, it would be caught during testing.
The version check change could potentially break functionality for Python versions > 3.9 that previously worked. Maybe this deserves attention?
Per the rules, we should not make speculative comments or ask for verification. If there was a real issue, it would be caught by the CI tests running on different Python versions.
The comment should be deleted as it's partly informative and partly asking for verification, both of which violate the review rules.
8. examples/dask/hello_world/data_loaders.py:12
- Draft comment:
The explicit 'name' parameter has been removed from the call to dataframe.from_pandas in both spend() and #s() functions. Confirm that this change is intentional and update the function docstrings if the series name was previously relied upon downstream. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment asks to "confirm" the change is intentional and update docstrings - this violates our rule about not asking for confirmations or explanations. The functions are clearly marked as dummy examples. The docstrings don't mention the series names at all. This seems like an unnecessary verification request.
Maybe the series names were important for the example's functionality? Maybe downstream code depends on these specific names?
These are explicitly marked as dummy example functions. Even if downstream code existed, asking for confirmation violates our rules. If it was critical, it would cause obvious failures.
Delete this comment as it violates our rule about not asking for confirmations or explanations. It's asking the author to verify their intention and check downstream impacts.
9. hamilton/plugins/h_dask.py:18
- Draft comment:
The try/except block for importing dask_scalar nicely provides backward compatibility with older dask versions. This is a clean and maintainable solution. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
10. hamilton/plugins/h_dask.py:227
- Draft comment:
Replacing the hardcoded check for dask.dataframe.core.Scalar with the imported dask_scalar enhances compatibility across dask versions. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
11. hamilton/plugins/h_dask.py:257
- Draft comment:
Switching from dask.dataframe.multi.concat to dask.dataframe.concat follows the updated dask API. Verify that the subsequent column renaming continues to work as expected. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
12. plugin_tests/h_dask/conftest.py:9
- Draft comment:
The conditional configuration (disabling 'dataframe.query-planning' for Python versions below 3.10) is clear. Ensure that this configuration stays valid with future versions of dask. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
13. pyproject.toml:54
- Draft comment:
Limiting the 'dask-expr' dependency to Python 3.9 (using the conditional marker) appears intended. Please ensure this restriction aligns with CI and integration test expectations, especially as other parts of the codebase depend on dask-expr behavior. - Reason this comment was not posted:
Marked as duplicate.
14. tests/integrations/pandera/requirements.txt:2
- Draft comment:
Including 'pandera[dask]' in the requirements ensures that the correct pandera-related extras are installed. This change appears appropriate. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
15. tests/integrations/pandera/test_pandera_data_quality.py:202
- Draft comment:
Marking the test 'test_pandera_decorator_dask_series' as xfail due to a known import issue is pragmatic. Consider adding a reference to a tracking issue so it’s easier to revisit when the underlying problem is fixed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While tracking issues can be helpful, this seems more like a documentation suggestion than a required code change. The xfail already has a descriptive reason. Without knowing if there's even a tracking issue to reference, or if this is something that will be investigated further, the comment is speculative. The current xfail message provides sufficient information about the failure reason.
The comment could be valuable for long-term maintenance by making it easier to track when the issue is resolved. Having a tracking issue would provide more context about the problem.
However, creating tracking issues is an optional documentation practice, not a required code change. The xfail already communicates the essential information needed for the test.
Delete the comment as it's suggesting an optional documentation improvement rather than a required code change. The existing xfail message is sufficient.
Workflow ID: wflow_H56N5BdhOGQMx6Cz
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
# for free
to join this conversation on GitHub.
Already have an account?
# to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Updates dask dependencies.
Note - see #1279 for details. We'll skip fixing the pandera series schema + dask series object "dask" key issue.
Changes
How I tested this
Notes
Checklist