[Serve] incorrect/nondeterministic `__CALL__ OK [time]` logs with blocking concurrent requests
#48903
jecsand838 pushed a commit to jecsand838/ray that referenced this issue on Dec 4, 2024
## Why are these changes needed?

Adds a feature flag to run sync user-defined methods in a threadpool by default. This matches the existing behavior when using a FastAPI ingress.

This should address a lot of user confusion and make it easier to write performant code by default. For example, just sticking a torch model call in a sync method will now provide reasonable performance out of the box.

However, there may be some existing user code that is not thread safe, so we need to do a gentle migration. This PR introduces the behavior behind a feature flag and warns users about the upcoming change and how to opt into the new behavior or maintain existing behavior once it does (just adding `async def` will do it).

I've opted to set the max thread pool size to `max_ongoing_requests`, which seems like a reasonable policy. If needed we can add a user-facing API for this in the future.

TODO before merging:

- [x] Get it working for sync generators.
- [x] Add warning for default change (people can keep behavior by changing to async def).
- [x] Add/update UserCallableWrapper tests.
- [x] Add/update some integration tests (verify that request context is set correctly!).
- [x] Set maximum thread pool size.

## Related issue number

Closes ray-project#44354
Closes ray-project#44403
Closes ray-project#48903

---------

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Connor Sanders <connor@elastiflow.com>
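The threadpool idea described in this commit message can be illustrated with a standard-library sketch. This is not Ray Serve's implementation: `sync_handler`, `call_in_threadpool`, `run_concurrent`, and the constant `MAX_ONGOING_REQUESTS` are hypothetical names chosen for illustration, and the pool size simply mirrors the stated policy of capping threads at `max_ongoing_requests`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a deployment's max_ongoing_requests setting.
MAX_ONGOING_REQUESTS = 5


def sync_handler(delay):
    """A synchronous, blocking user method (for example a model call)."""
    time.sleep(delay)
    return delay


async def call_in_threadpool(executor, fn, *args):
    """Run a sync callable on a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, fn, *args)


async def run_concurrent(n, delay):
    """Issue n concurrent calls and return (results, total elapsed seconds)."""
    with ThreadPoolExecutor(max_workers=MAX_ONGOING_REQUESTS) as executor:
        start = time.perf_counter()
        results = await asyncio.gather(
            *(call_in_threadpool(executor, sync_handler, delay) for _ in range(n))
        )
        elapsed = time.perf_counter() - start
    return results, elapsed
```

Calling `asyncio.run(run_concurrent(5, 0.2))` should finish in roughly the time of a single call rather than five sequential ones, because each blocking call occupies a worker thread instead of the event loop. The trade-off noted in the commit message still applies: handlers run this way must be thread safe.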
dentiny pushed a commit to dentiny/ray that referenced this issue on Dec 7, 2024

Same commit message as the commit above, with Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com> and Signed-off-by: hjiang <dentinyhao@gmail.com>
What happened + What you expected to happen
Deployment logs of the form `__CALL__ OK [time]` can display the incorrect execution time if the Deployment has synchronous code that blocks the main `asyncio` event loop. A simple example is the following:

In this toy example, I send 5 concurrent requests to the `BlockingDeployment`, which calls `time.sleep(1.)` and blocks the event loop. I'd expect requests to queue up and take 1, 2, 3, 4, and 5 s. I see the following output:

The `">>>> GET"` logs show the requests are returning in the correct time, but clearly the `__CALL__` logs are mostly wrong: the first correctly says ~1s but all successive logs say ~5s. I'd expect something like:

Moreover, this output is nondeterministic and may change on successive runs.
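The general effect can be shown without Ray at all. The following is a minimal standard-library sketch, not the original reproduction script and not Ray Serve's logging code: `handle` and `main` are hypothetical names, and the assumption is only that a timestamp taken after an extra `await` on a blocked event loop can be much later than the moment the handler actually finished.

```python
import asyncio
import time


async def handle(delay, t0, finished, logged):
    """Simulated request handler whose body blocks the event loop."""
    time.sleep(delay)  # blocking call: no other task can run meanwhile
    finished.append(time.perf_counter() - t0)  # when the work really ended
    await asyncio.sleep(0)  # yield to the loop, as wrapper code might
    logged.append(time.perf_counter() - t0)  # when a "log line" would be timed


async def main(n, delay):
    """Start n concurrent handlers and collect both sets of timings."""
    finished, logged = [], []
    t0 = time.perf_counter()
    await asyncio.gather(*(handle(delay, t0, finished, logged) for _ in range(n)))
    return finished, logged
```

With `asyncio.run(main(5, 0.1))`, the `finished` values grow step by step as the blocking calls run one after another, while every `logged` value lands near the total, because no task gets to resume past its `await` until all of the blocking calls have run. Whether Ray Serve's `__CALL__` timing is skewed by this exact mechanism is an assumption, not something the issue confirms.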
Versions / Dependencies
Reproduction script
[see above.]
Issue Severity
Low: It annoys or frustrates me.