-
Notifications
You must be signed in to change notification settings - Fork 143
Add natural language filtering and sorting #85
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Arnav Agrawal <aa779@cornell.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 045cb8e in 2 minutes and 17 seconds
More details
- Looked at
1769
lines of code in11
files - Skipped
0
files when reviewing. - Skipped posting
13
drafted comments based on config settings.
1. sdks/python/morphik/sync.py:1460
- Draft comment:
Ensure proper exception handling when parsing responses from _request. Currently response.json() is used without catching JSONDecodeError, which could lead to uninformative exceptions if the server returns unexpected content. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. sdks/python/morphik/sync.py:1510
- Draft comment:
Similar file handling code appears repeatedly (e.g. ingest_file, ingest_files). Consider refactoring file opening/closing into a shared utility function or using a context manager (e.g. contextlib.closing) to ensure files are always properly closed. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. sdks/python/morphik/sync.py:2190
- Draft comment:
When updating document metadata via update_document_by_filename_metadata, the code makes a subsequent call to update the filename by calling the update_text endpoint. Ensure that this two-step update consistently preserves the metadata, or consider offering a dedicated endpoint for metadata+filename update. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. sdks/python/morphik/sync.py:1290
- Draft comment:
HTTP request header management: the code manually sets Content-Type header for JSON requests. Consider leveraging httpx's built-in JSON support (passing json=data) to avoid manual header management and potential mismatches. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. sdks/python/morphik/sync.py:1388
- Draft comment:
For file ingestion methods, consider adding more robust error handling around file I/O (e.g. file not found or permission errors) and ensuring that opened file objects are always closed, possibly using contextlib.ExitStack for multiple files. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. sdks/python/morphik/sync.py:1720
- Draft comment:
Using POST for listing documents and batch retrieval may not be RESTful. If this is intentional for performance reasons, ensure API documentation clearly explains these non-standard choices. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. sdks/python/morphik/sync.py:1385
- Draft comment:
Avoid duplicating folder and end-user scoping across multiple methods. Consider extracting a helper method that adds these parameters to every request payload, reducing repeated conditionals. - Reason this comment was not posted:
Comment looked like it was already resolved.
8. sdks/python/morphik/sync.py:1440
- Draft comment:
Repeated assignment ofdoc._client = self
after parsing document responses could be centralized in the parsing helper. This would ensure consistency and reduce boilerplate. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. sdks/python/morphik/sync.py:2502
- Draft comment:
Indelete_document_by_filename
, consider handling the case where the document isn’t found. Currently it proceeds to delete without checking if the parsed document is valid. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. sdks/python/morphik/sync.py:1290
- Draft comment:
Consider wrapping requests in try/except blocks or providing a custom exception for HTTP errors to offer friendlier error messages for client‐users. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. sdks/python/morphik/sync.py:521
- Draft comment:
Inlist_documents
, the folder is not passed as a parameter (end_user_id remains None). Verify whether this behavior is intentional or if folder scoping should be enforced. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. sdks/python/morphik/sync.py:1869
- Draft comment:
Several update methods use 'update_strategy' with only the 'add' option supported. Consider documenting limitations or planning extension for other strategies in a TODO. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. sdks/python/morphik/sync.py:1270
- Draft comment:
Overall, consider consolidating duplicate logic between async and sync clients to a common base (e.g. shared _request logic and helpers) to ease maintenance and avoid subtle discrepancies between implementations. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_kQhmXAWObclR3qqK
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mrge found 28 issues across 11 files. View them in mrge.io
log_prefix="Metadata filter evaluation response", | ||
log_suffix=f"for {document.external_id}", | ||
) | ||
return response.result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No validation of LLM result against expected range of values before returning
Returns: | ||
Self for method chaining | ||
""" | ||
self._filter_predicate = predicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter() method doesn't validate that the predicate parameter is not empty or None before assigning it
end_user_id=self._end_user_id | ||
) | ||
|
||
response = await self._client._request("POST", "smart_query", data=request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execute() method directly returns the API response without parsing or validating it matches the expected List[str] return type
|
Summary by mrge
Added natural language filtering and sorting capabilities for document queries. This feature enables users to search and organize documents using simple English expressions instead of complex query syntax.
New Features
Refactors