
Multiple workers #4210


Merged
merged 1 commit into from
May 15, 2019


pragnagopa
Member

@pragnagopa pragnagopa commented Mar 19, 2019

  • Refactored LanguageWorker layer to maintain a list of language workers
  • Process count is configurable by AppSetting: FUNCTIONS_WORKER_PROCESS_COUNT
  • Language workers can be created at Jobhost level or Webhost level
  • Function invocations are evenly distributed among language workers
  • If FUNCTIONS_WORKER_PROCESS_COUNT is set to a value greater than 1, a new language worker is created every 10 seconds. This avoids impacting cold start
  • Shuts down the job host if a language worker cannot start
  • Added more logging throughout the language worker layer
  • Added unit tests
  • Verified E2E with and without placeholders on private stamp.
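
As a rough illustration of the configuration and distribution described above, the Python sketch below reads the process count from the environment and hands invocations to workers in turn. Round-robin is an assumption here (the description only says invocations are evenly distributed), as is starting the first worker immediately. `read_process_count`, `start_offsets`, and `RoundRobinDispatcher` are hypothetical names, not the host's actual types; the real implementation is in C#.

```python
import itertools
import os


def read_process_count(env=os.environ, default=1):
    """Read FUNCTIONS_WORKER_PROCESS_COUNT, falling back to a single worker."""
    raw = env.get("FUNCTIONS_WORKER_PROCESS_COUNT")
    try:
        count = int(raw) if raw is not None else default
    except ValueError:
        count = default
    return max(count, 1)


def start_offsets(count, interval_seconds=10):
    """Seconds after host start at which each worker would be launched.

    Assumes the first worker starts immediately and the rest follow at
    10-second intervals; the exact schedule is not stated in the description.
    """
    return [i * interval_seconds for i in range(count)]


class RoundRobinDispatcher:
    """Hands each invocation to the next worker in a fixed rotation."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("at least one worker is required")
        self._cycle = itertools.cycle(list(workers))

    def next_worker(self):
        return next(self._cycle)


if __name__ == "__main__":
    count = read_process_count({"FUNCTIONS_WORKER_PROCESS_COUNT": "3"})
    dispatcher = RoundRobinDispatcher([f"worker-{i}" for i in range(count)])
    print(start_offsets(count))
    print([dispatcher.next_worker() for _ in range(4)])
```

With three workers, the fourth invocation wraps around to the first worker again, which is what keeps the load even over time in this model.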

Fixes #4195 Fixes #4194 Fixes #4193 Fixes #4161 Fixes #3939 Fixes #4026 Fixes #4326 Fixes #4392

Added by mhoeger: Fixes #4328

@pragnagopa pragnagopa force-pushed the buffermanager branch 4 times, most recently from 18a3b41 to 3997b85 Compare April 4, 2019 20:27
@pragnagopa pragnagopa force-pushed the buffermanager branch 2 times, most recently from 594f54e to d3fe2b5 Compare April 5, 2019 22:11
@pragnagopa pragnagopa marked this pull request as ready for review April 5, 2019 22:47
Member

@brettsam brettsam left a comment


Another round of some minor comments...

Member

@fabiocav fabiocav left a comment


This is a partial review. Sending an initial round of feedback as I didn't want to block. Will continue, but wanted to get this in front of you.

@pragnagopa
Member Author

@maiqbal11 - for testing Python functions

@maiqbal11
Contributor

Adding notes from my testing. I'm using the app that was blocking on calls between functions in the same app from the following issue: Azure/azure-functions-python-worker#236. There is a queue trigger that triggers an http invocation.

  1. If I initially add a message to the queue before starting the host, I consistently receive the following error message: Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.QueueTriggerPython ---> System.InvalidOperationException: Did not find any initialized language workers
    This is likely related to the host trying to process the queue message before the worker process has come up. @pragnagopa mentioned that this shouldn't be the case since we have a timeout to account for the worker process starting before we actually process invocations. If I leave the function app alone for a few seconds after this error, the queue message gets picked up on its own and is processed. Subsequent queue messages are processed correctly - the http endpoint in the same app is called without blocking.

  2. I do not see the blocking behavior happen if I wait for the host to start up and then add messages to the queue. This is likely due to the latency of starting up a worker process initially.

There are still concerns that the blocking behavior will manifest if invocations land on the same process. For example, if the user configures two processes and the calls come in the following sequence:

(a) queue message 1 -> process 1
(b) queue message 2 -> process 2
(c) http trigger from queue message 1 -> process 1

(a) and (c) above would block each other, since (a) waits for (c) to finish while (c) needs the same Python thread that (a) is holding. The interim mitigation here would be to use async constructs. @pragnagopa mentioned that there is some upcoming work to allow users to control which functions are allocated to which process, so functions in contention don't block each other.
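
The async mitigation can be modelled without the Functions runtime. In this sketch a single asyncio event loop stands in for one worker process with one thread; `queue_trigger` and `http_trigger` are hypothetical handlers, the nested call is made directly instead of over HTTP, and `asyncio.sleep(0)` stands in for real I/O. Because the outer handler awaits rather than blocks, the nested call on the same "process" can run to completion.

```python
import asyncio


async def http_trigger(name):
    # Stand-in for the HTTP-triggered function; yields once to mimic I/O.
    await asyncio.sleep(0)
    return f"hello {name}"


async def queue_trigger(message):
    # Awaiting gives control back to the event loop, so the nested
    # invocation can run on the same thread instead of waiting behind us.
    response = await http_trigger(message)
    return f"processed {message}: {response}"


async def main():
    # Two queue messages handled by the same single-threaded loop.
    return await asyncio.gather(queue_trigger("m1"), queue_trigger("m2"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A synchronous handler that blocked on the nested call would hold the only thread, which is the contention described in the (a)/(c) sequence.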

@pragnagopa
Member Author

pragnagopa commented May 6, 2019

Thanks @maiqbal11 for testing. I found the root cause. Invocations on any trigger need to wait for the Host to be ready. Updated WorkerLanguageInvoker to wait for scripthost to be ready before accepting invocations.
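
One way to picture this fix: invocations wait on a readiness signal rather than failing when no worker is initialized yet. The sketch below uses `asyncio.Event` for that signal. `GatedInvoker`, `mark_ready`, and the timeout value are hypothetical and do not mirror WorkerLanguageInvoker's actual code.

```python
import asyncio


class GatedInvoker:
    """Holds invocations until the host signals that it is ready."""

    def __init__(self, ready_timeout=5.0):
        self._ready = asyncio.Event()
        self._ready_timeout = ready_timeout

    def mark_ready(self):
        self._ready.set()

    async def invoke(self, func, *args):
        # Wait for readiness instead of failing immediately; give up after
        # the timeout so a worker that never starts surfaces as an error.
        await asyncio.wait_for(self._ready.wait(), timeout=self._ready_timeout)
        return func(*args)


async def demo():
    invoker = GatedInvoker()
    # The invocation arrives before the host is ready, as in the queue test.
    pending = asyncio.create_task(invoker.invoke(str.upper, "queued message"))
    await asyncio.sleep(0.01)
    was_waiting = not pending.done()
    invoker.mark_ready()
    return was_waiting, await pending


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model a message that is already on the queue at startup is simply delayed until `mark_ready` is called, which matches the behavior reported after the fix.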

Member

@fabiocav fabiocav left a comment


Sending a round of feedback... will follow up on some things in person

@@ -164,7 +165,15 @@ private async Task StartHostAsync(CancellationToken cancellationToken, int attem
if (!startupMode.HasFlag(JobHostStartupMode.HandlingError))
{
LastError = null;

var functionDispatcher = _host.Services.GetService<IFunctionDispatcher>();
Member


It's not immediately clear what the purpose of this code is. Adding some comments will help. A clearer log message would also be helpful.
It also feels like this should be elsewhere, as it shouldn't be the responsibility of the Script Host service

Member Author


Will add comments. We need to wait for at least one language worker process to be initialized before setting ScriptHostState to running. Let me know if there is a better place to add this change.

@maiqbal11
Contributor

@pragnagopa, did a second round with the new code. Didn't see the issue where a queue message being present before host startup was causing an invocation before the language worker was up. However, even though I believe the message got handled correctly since I no longer see it on the queue, there were no logs to indicate this. The processing seemed to happen silently, without any success or failure logs.

// Need to wait for at least one language worker process to be initialized before setting ScriptHostState to running
// for non dotnet functions
if (_functionDispatcher != null && _functionDispatcher.State == FunctionDispatcherState.Initializing)
{
Member Author


@fabiocav - Moved the logic that checks for function dispatcher start to WorkerLanguageInvoker. Can you please take a final look?

@pragnagopa pragnagopa force-pushed the buffermanager branch 2 times, most recently from 1c4cc8a to 13a6a7d Compare May 13, 2019 17:38
@pragnagopa pragnagopa force-pushed the buffermanager branch 2 times, most recently from 6252edc to 829cc9c Compare May 14, 2019 18:03
@pragnagopa
Member Author

@fabiocav / @brettsam - finished e2e testing with the latest changes rebased on dev. Can you please take a final look? Thanks!

Member

@fabiocav fabiocav left a comment


Haven't looked at the last round in detail, but there should be no blockers based on the last iteration. We can address any additional comments later.
