
Suggestion: Provide Option to use asyncio ProcessPool instead of ThreadPool #403


Closed
ericdrobinson opened this issue Apr 30, 2019 · 1 comment

@ericdrobinson

One of the major use cases for Python is data science computation, which typically involves complex, heavy math that can take a very long time to run. Workloads like this are a terrible fit for Python threads: because every thread shares the GIL, CPU-bound work in one thread blocks all the others. This is even mentioned in the azure-functions-python-worker source code.

Unfortunately, Azure Functions in Python is currently built to run synchronous functions on an asyncio event loop backed by a thread pool. What this means is that the default concurrency settings for function instances are effectively meaningless for compute-heavy applications. The HTTP trigger's maxConcurrentRequests value defaults to 100. The actual number of concurrent requests that can be handled with this approach? 1. The Queue Storage trigger's batchSize value defaults to 16. The actual number of concurrent requests that can be handled with this approach? 1.
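To make the problem concrete, here is a minimal sketch (not the worker's actual code) of the thread-pool dispatch pattern described above. The handler and payload names are mine; the point is that no matter how many pool threads you configure, CPU-bound Python handlers are serialized by the GIL:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def cpu_heavy(n: int) -> int:
    """Stand-in for a compute-bound data-science task (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

async def handle_requests(payloads):
    # Mirrors a thread-pool-based worker: each request runs on a pool
    # thread, but the GIL lets only one thread execute Python bytecode
    # at a time, so CPU-bound handlers run one after another regardless
    # of max_workers or maxConcurrentRequests.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        tasks = [loop.run_in_executor(pool, cpu_heavy, n) for n in payloads]
        return await asyncio.gather(*tasks)
```

The results come back correct; the concurrency just never materializes for CPU-bound work.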

Part of the attraction of Azure Functions is the simplicity of the programming model married with the promise of autoscale. That attraction disappears entirely when you get into the nitty-gritty and find that Azure Function instances are supposed to handle multiple concurrent requests but your function can only handle one at a time due to Python's intrinsic limitations (as employed by the azure-functions-python-worker). Each Function App instance is run on its own CPU [as I understand it] which has (at time of writing) at least 2 logical cores. With CPU-bound processing tasks, it would certainly be nice to utilize that potential and get at least a little concurrency back (say, 2 concurrent tasks).

Getting concurrency back (without resorting to raw subprocess calls, which basically defeats the purpose of using the language in the first place) is somewhat possible with the standard library's multiprocessing module. While using this module does indeed work (and enables some controllable measure of concurrency), it has serious issues with respect to testing. What's more, the default process start method on Linux uses fork under the hood, which has its own problems when used in an application that has already spawned threads. This issue is even called out in the Python documentation:
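The kind of option this issue asks for can be sketched with the standard library's ProcessPoolExecutor in place of the thread pool. This is my illustration of the idea, not the worker's implementation; each request runs in its own process with its own GIL, so CPU-bound work can actually use multiple cores:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Must be defined at module top level so it can be pickled and
    # shipped to worker processes.
    return sum(i * i for i in range(n))

async def handle_requests(payloads):
    # Same dispatch shape as before, but run_in_executor now hands
    # each request to a separate process, sidestepping the GIL.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        tasks = [loop.run_in_executor(pool, cpu_heavy, n) for n in payloads]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    # Guard is required when the spawn start method re-imports __main__.
    print(asyncio.run(handle_requests([1_000, 2_000])))
```

The trade-offs the issue mentions still apply: arguments and results must be picklable, and process startup is not free.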

> Note that safely forking a multithreaded process is problematic.

A suggested workaround for these issues is to use the spawn start method. Unfortunately, this doesn't work in Azure Functions with Python because we do not control application startup: by the time our function code runs, it is too late to call set_start_method.
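The "too late" constraint can be seen directly: the standard library allows set_start_method at most once per process, so a host that has already configured multiprocessing leaves function code no way to change it. A minimal illustration (the helper name is mine):

```python
import multiprocessing as mp

def try_set_start_method(method: str) -> str:
    # set_start_method may be called at most once per process; a second
    # call raises RuntimeError ("context has already been set").
    try:
        mp.set_start_method(method)
        return "ok"
    except RuntimeError:
        return "too late"
```

The first call in a fresh interpreter succeeds; any later call, such as one made from inside a function invocation after the host has started, fails.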

In summary, many desirable Python use cases are CPU-bound and run into significant issues under the current Azure Functions concurrency model. Changes (even if optional) to the default concurrency model used by the azure-functions-python-worker could significantly improve how much consumers can do with each function instance, and better deliver on the promise of the Azure Functions platform with Python.

@asavaritayal asavaritayal added this to the Backlog milestone May 1, 2019
@asavaritayal
Contributor

/cc @pragnagopa

The work for this is already in progress. Closing as a duplicate of Azure/azure-functions-host#4194. PR - Azure/azure-functions-host#4210
