Describe the bug
When using unstructured-client v0.25.5 in VS Code to process PDFs of more than one page with the "hi_res" strategy, I consistently receive
INFO: Failed to process a request due to API server error with status code 504.
and consequently:
To Reproduce
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError

os.environ["UNSTRUCTURED_API_KEY"] = "<MY_API_KEY>"
os.environ["UNSTRUCTURED_API_URL"] = "<MY_API_URL>"

client_obj = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
    server_url=os.getenv("UNSTRUCTURED_API_URL"),
)

filename = "./data/kenwood_en.pdf"
# Read the PDF inside a context manager so the file handle is closed afterwards.
with open(filename, "rb") as f:
    req = shared.PartitionParameters(
        # Note that this currently only supports a single file
        files=shared.Files(
            content=f.read(),
            file_name=filename,
        ),
        chunking_strategy="by_title",
        max_characters=1024,
        split_pdf_page=True,  # split the PDF and send page sets concurrently
        split_pdf_allow_failed=True,  # keep going if some page sets fail
    )

try:
    res = client_obj.general.partition(request=req)
    print(res.elements[0])
except SDKError as e:
    print(e)
Expected behavior
After 2 minutes, it always fails with the following output:
INFO: Preparing to split document for partition.
INFO: Starting page number set to 1
INFO: Allow failed set to 1
INFO: Concurrency level set to 5
INFO: Splitting pages 1 to 40 (40 total)
INFO: Determined optimal split size of 8 pages.
INFO: Partitioning 5 files with 8 page(s) each.
INFO: Partitioning set #1 (pages 1-8).
INFO: Partitioning set #2 (pages 9-16).
INFO: Partitioning set #3 (pages 17-24).
INFO: Partitioning set #4 (pages 25-32).
INFO: Partitioning set #5 (pages 33-40).
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 25
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 17
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 9
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 1
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
WARNING: Failed to partition set #2, its elements will be omitted in the final result.
WARNING: Failed to partition set #3, its elements will be omitted in the final result.
WARNING: Failed to partition set #4, its elements will be omitted in the final result.
WARNING: Failed to partition set #5, its elements will be omitted in the final result.
INFO: Failed to process a request due to API server error with status code 504. Attempting retry number 1 after sleep.
INFO: Server message - <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
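The split plan in the log can be reproduced with a little arithmetic. This is a minimal sketch, assuming the split size is simply the page count divided by the concurrency level, rounded up; plan_page_splits is a hypothetical helper, not part of unstructured-client, and the library's own heuristic may apply additional minimum or maximum bounds.

```python
import math


def plan_page_splits(total_pages: int, concurrency: int) -> list[tuple[int, int]]:
    """Hypothetical sketch: divide pages 1..total_pages into page sets.

    Assumes split size = ceil(total_pages / concurrency); the real
    unstructured-client heuristic may clamp this to other bounds.
    """
    if total_pages < 1 or concurrency < 1:
        raise ValueError("total_pages and concurrency must be positive")
    split_size = math.ceil(total_pages / concurrency)
    splits = []
    # Each set starts split_size pages after the previous one; the last
    # set is truncated so it never runs past the final page.
    for start in range(1, total_pages + 1, split_size):
        end = min(start + split_size - 1, total_pages)
        splits.append((start, end))
    return splits


# Mirror the "Partitioning set" lines from the log for a 40-page PDF.
for i, (start, end) in enumerate(plan_page_splits(40, 5), start=1):
    print(f"Partitioning set #{i} (pages {start}-{end}).")
```

With 40 pages and a concurrency level of 5 this yields five 8-page sets, matching the log. Each set is sent as its own request, which is why each failed request is reported against the first page of its set (1, 9, 17, 25).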
After these failures, the client falls back to its retry strategy, which I presume is the one defined in general.py.
This cycle of 504s repeats again and again.
I have tried adjusting the RetryConfig on both my client and the general.partition call, but it makes no difference to when or how my program fails.
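To reason about how retry settings interact with requests that each hang for about 2 minutes before returning a 504, here is a minimal sketch of a generic exponential-backoff loop. It assumes the client waits initial_interval × exponent^retries (capped at max_interval) between attempts and gives up once max_elapsed_time is exceeded, and it ignores the random jitter real implementations usually add. backoff_schedule is a hypothetical helper; its parameters are only loosely modeled on the settings a RetryConfig backoff strategy takes, and the values used below are illustrative, not the SDK defaults.

```python
def backoff_schedule(
    initial_interval_ms: int,
    max_interval_ms: int,
    exponent: float,
    max_elapsed_ms: int,
    request_ms: int = 0,
) -> list:
    """Hypothetical sketch of an exponential-backoff retry loop (no jitter).

    Assumes every attempt fails after `request_ms` milliseconds and returns
    the sleeps (in ms) taken between attempts before the loop gives up.
    """
    if initial_interval_ms <= 0 or max_interval_ms <= 0 or request_ms < 0 or exponent < 1:
        raise ValueError("intervals must be positive, request_ms >= 0, exponent >= 1")
    sleeps = []
    elapsed = 0
    retries = 0
    while True:
        elapsed += request_ms  # time spent waiting on the failed attempt
        if elapsed > max_elapsed_ms:  # time budget exhausted: give up
            return sleeps
        # The sleep grows geometrically but never exceeds max_interval_ms.
        sleep = min(initial_interval_ms * exponent ** retries, max_interval_ms)
        sleeps.append(sleep)
        elapsed += sleep
        retries += 1


# Illustrative values only: each attempt hangs ~2 minutes before the 504.
sleeps = backoff_schedule(1_000, 4_000, 2, 300_000, request_ms=120_000)
print(len(sleeps) + 1, "attempts; sleeps in ms:", sleeps)  # 3 attempts; sleeps in ms: [1000, 2000]
```

Under these assumptions, when each attempt itself takes about 2 minutes to fail, the elapsed-time budget is consumed mostly by the hanging requests rather than by the sleeps, so tuning the backoff intervals alone changes little about when the program fails.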
Environment Info
I am running this in a Jupyter notebook in VSCode, within a venv.
Additional Info
The PDF I used to reproduce this issue is here.
Does anyone have a solution, or can anyone help me work out whether this is a problem on my end or a bug?