
Clarification on minWorkers and maxWorkers parameters #3339

Open
krzwaraksa opened this issue Oct 3, 2024 · 0 comments

Comments


📚 The doc issue

I have some questions related to model parameters:

  1. I know there is no autoscaling in TorchServe, and looking at the code, a model spins up minWorkers workers on startup. maxWorkers seems to be used only when downscaling a model: if currentWorkers > maxWorkers, it kills currentWorkers - maxWorkers workers (WorkloadManager.java:151). Given that the number of workers only changes on a scaleWorkers API call, is there any practical use case for setting minWorkers != maxWorkers? For example, in examples/cloud_storage_stream_inference/config.properties, minWorkers is set to 10 and maxWorkers to 1000; when would we want that?
  2. In docs/getting_started.md it reads: "If you specify model(s) when you run TorchServe, it automatically scales backend workers to the number equal to available vCPUs (if you run on a CPU instance) or to the number of available GPUs (if you run on a GPU instance)." I can't find any evidence of this behavior in the code; could somebody clarify whether this statement is true and, if so, how it works?
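
For context, here is a minimal sketch of the per-model worker settings in question, in config.properties form (the model name, URL, and batch size are hypothetical, not taken from the example above):

```properties
# Hypothetical per-model worker configuration in config.properties.
# minWorkers is the number of workers started when the model loads;
# whether maxWorkers ever takes effect outside scaleWorkers is the question above.
load_models=my_model.mar
models={\
  "my_model": {\
    "1.0": {\
        "minWorkers": 10,\
        "maxWorkers": 1000,\
        "batchSize": 1\
    }\
  }\
}
```

At runtime the worker count can be changed through the management API's scale-workers call, e.g. `PUT /models/my_model?min_worker=2&max_worker=4` against the management port (8081 by default).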

Thank you!

Suggest a potential alternative/fix

No response
