
Suggestion: Provide Option to use asyncio ProcessPool instead of ThreadPool #403


Closed
ericdrobinson opened this issue Apr 30, 2019 · 1 comment

@ericdrobinson

One of the major use cases for Python is data science computation, which typically involves complex, heavy math that can take a very long time to run. Workloads like this are a terrible fit for Python threads: because every thread shares the GIL, CPU-bound work in one thread blocks all the others. This is even mentioned in the azure-functions-python-worker source code.

Unfortunately, Azure Functions in Python is currently built to run synchronous functions on an asyncio event loop backed by a thread pool. What this means is that the default concurrency settings for function instances are effectively meaningless for compute-heavy applications. The HTTP trigger's maxConcurrentRequests value defaults to 100. The actual number of concurrent requests that can be handled with this approach? 1. The Queue Storage trigger's batchSize value defaults to 16. The actual number of concurrent requests that can be handled with this approach? 1.
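To make the problem concrete, here is a minimal sketch (not the worker's actual code) of the thread-pool dispatch pattern described above. The handler and payload names are mine; the point is that no matter how many pool threads you configure, CPU-bound Python handlers are serialized by the GIL:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def cpu_heavy(n: int) -> int:
    """Stand-in for a compute-bound data-science task (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

async def handle_requests(payloads):
    # Mirrors a thread-pool-based worker: each request runs on a pool
    # thread, but the GIL lets only one thread execute Python bytecode
    # at a time, so CPU-bound handlers run one after another regardless
    # of max_workers or maxConcurrentRequests.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        tasks = [loop.run_in_executor(pool, cpu_heavy, n) for n in payloads]
        return await asyncio.gather(*tasks)
```

The results come back correct; the concurrency just never materializes for CPU-bound work.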

Part of the attraction of Azure Functions is the simplicity of the programming model married with the promise of autoscale. That attraction disappears entirely when you get into the nitty-gritty and find that Azure Function instances are supposed to handle multiple concurrent requests but your function can only handle one at a time due to Python's intrinsic limitations (as employed by the azure-functions-python-worker). Each Function App instance is run on its own CPU [as I understand it] which has (at time of writing) at least 2 logical cores. With CPU-bound processing tasks, it would certainly be nice to utilize that potential and get at least a little concurrency back (say, 2 concurrent tasks).

Getting concurrency back (without resorting to raw subprocess calls, which basically defeats the purpose of using the language in the first place) is somewhat possible with the standard library's multiprocessing module. While using this module does indeed work (and enables some controllable measure of concurrency), it has serious issues with respect to testing. What's more, the default process start method on Linux uses fork under the hood, which has its own problems when used in an application that has already spawned threads. This issue is even called out in the Python documentation:
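The kind of option this issue asks for can be sketched with the standard library's ProcessPoolExecutor in place of the thread pool. This is my illustration of the idea, not the worker's implementation; each request runs in its own process with its own GIL, so CPU-bound work can actually use multiple cores:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Must be defined at module top level so it can be pickled and
    # shipped to worker processes.
    return sum(i * i for i in range(n))

async def handle_requests(payloads):
    # Same dispatch shape as before, but run_in_executor now hands
    # each request to a separate process, sidestepping the GIL.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        tasks = [loop.run_in_executor(pool, cpu_heavy, n) for n in payloads]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    # Guard is required when the spawn start method re-imports __main__.
    print(asyncio.run(handle_requests([1_000, 2_000])))
```

The trade-offs the issue mentions still apply: arguments and results must be picklable, and process startup is not free.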

> Note that safely forking a multithreaded process is problematic.

A suggested workaround for these issues is to use the spawn start method. Unfortunately, this doesn't work in Azure Functions with Python because we do not control application startup: by the time our function code runs, it is too late to call set_start_method.
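The "too late" constraint can be seen directly: the standard library allows set_start_method at most once per process, so a host that has already configured multiprocessing leaves function code no way to change it. A minimal illustration (the helper name is mine):

```python
import multiprocessing as mp

def try_set_start_method(method: str) -> str:
    # set_start_method may be called at most once per process; a second
    # call raises RuntimeError ("context has already been set").
    try:
        mp.set_start_method(method)
        return "ok"
    except RuntimeError:
        return "too late"
```

The first call in a fresh interpreter succeeds; any later call, such as one made from inside a function invocation after the host has started, fails.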

In summary, many desirable Python use cases are CPU-bound and run into significant issues under the current Azure Functions concurrency model. Changes (even if optional) to the default concurrency model used by the azure-functions-python-worker could significantly improve how much consumers can do with each function instance, and better deliver on the promise of the Azure Functions platform with Python.

@asavaritayal asavaritayal added this to the Backlog milestone May 1, 2019
@asavaritayal
Contributor

/cc @pragnagopa

The work for this is already in progress. Closing as a duplicate of Azure/azure-functions-host#4194. PR - Azure/azure-functions-host#4210
