One of the major use cases for Python is data science computation. This kind of workload typically involves lots of complex, heavy math that can take a very long time to compute, and it is a terrible fit for Python threads: every thread shares the GIL, so a CPU-bound thread blocks all the others. This is even mentioned in the azure-functions-python-worker source code.
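To make the GIL problem concrete, here is a minimal sketch (not Azure-specific; the `busy_work` function is an illustrative stand-in, not anything from the worker's code) showing that two CPU-bound tasks on two threads take roughly as long as running them back to back:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy_work(n: int) -> int:
    # Pure-Python arithmetic holds the GIL for the entire loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Run twice sequentially.
start = time.perf_counter()
serial = [busy_work(N), busy_work(N)]
serial_time = time.perf_counter() - start

# Run twice on two threads; the GIL serializes the bytecode anyway.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = list(pool.map(busy_work, [N, N]))
threaded_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, threaded: {threaded_time:.2f}s")
# On CPython, threaded_time is typically about the same as serial_time:
# no speedup, because only one thread executes Python bytecode at a time.
```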
Unfortunately, Azure Functions in Python is currently built to use an asyncio thread pool. This means the default concurrency settings for function instances are effectively meaningless for compute-heavy applications. The HTTP trigger's maxConcurrentRequests value defaults to 100. The actual number of requests that can be handled concurrently with this approach? 1. The Queue Storage trigger's batchSize value defaults to 16. The actual concurrency? Still 1.
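For reference, these are the host.json knobs in question (sketched below with their documented defaults, assuming a v2+ host.json schema):

```json
{
  "version": "2.0",
  "extensions": {
    "http": {
      "maxConcurrentRequests": 100
    },
    "queues": {
      "batchSize": 16
    }
  }
}
```

With a single-threaded, GIL-bound worker, neither value translates into actual parallel execution of CPU-bound function invocations.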
Part of the attraction of Azure Functions is the simplicity of the programming model married with the promise of autoscale. That attraction disappears entirely when you get into the nitty-gritty and find that Azure Function instances are supposed to handle multiple concurrent requests but your function can only handle one at a time due to Python's intrinsic limitations (as employed by the azure-functions-python-worker). Each Function App instance is run on its own CPU [as I understand it] which has (at time of writing) at least 2 logical cores. With CPU-bound processing tasks, it would certainly be nice to utilize that potential and get at least a little concurrency back (say, 2 concurrent tasks).
Getting concurrency back (without resorting to subprocesses, which basically defeats the purpose of using the language in the first place) is somewhat possible with the multiprocessing module. While using this module does indeed work (and enables some controllable measure of concurrency), it has some serious issues with respect to testing. What's more, the default process start method uses fork under the hood, which has its own issues when run in an application that has already spawned threads. This issue is even called out in the Python documentation:
Note that safely forking a multithreaded process is problematic.
A suggested workaround for these issues is to use the spawn start method. Unfortunately, this doesn't work in Azure Functions with Python because we do not control the application startup process: by the time our function code runs, it is too late to call set_start_method.
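One avenue that may sidestep the global set_start_method call is multiprocessing.get_context, which scopes the start method to a single executor rather than setting it process-wide, so it does not require control over application startup. Whether this behaves well inside the Azure Functions Python worker is untested here; the sketch below (with a hypothetical cpu_bound stand-in) only shows the mechanism:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # Stand-in for a heavy data science computation.
    return sum(i * i for i in range(n))

def run_concurrently(jobs):
    # get_context("spawn") avoids the fork-after-threads hazard without
    # touching the global start method set at interpreter startup.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(cpu_bound, jobs))

if __name__ == "__main__":
    print(run_concurrently([1_000_000, 1_000_000]))
```

Note that spawn re-imports the worker module in each child and requires the submitted function to be picklable at module top level, which may interact badly with how the Functions worker loads user code.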
In summary, many desirable Python use cases are CPU-bound and cause significant headaches under the current Azure Functions concurrency model. Changes (even optional ones) to the default concurrency model used by the azure-functions-python-worker could significantly improve consumers' ability to do more with each function instance, and better deliver on the promise of the Azure Functions platform with Python.