How to increase timeout limit in batch transform jobs? #77

Closed
velociraptor111 opened this issue Nov 5, 2019 · 4 comments

Comments

@velociraptor111

velociraptor111 commented Nov 5, 2019

I have configured my batch transform job to download a video file from S3 and then process it frame by frame.

I am currently getting this error due to a timeout:

2019-11-05 14:56:03,201 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 60001
2019-11-05 14:56:03,201 [ERROR] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2019-11-05 14:56:03,202 [ERROR] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker error
com.amazonaws.ml.mms.wlm.WorkerInitializationException: Backend worker did not respond in given time
    at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:142)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-11-05 14:56:03,204 [INFO ] W-9000-model ACCESS_LOG - /169.254.255.130:52498 "POST /invocations HTTP/1.1" 500 60008

I have gone over the docs multiple times and only found https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html, which says that the timeout for InvokeEndpoint is 60 seconds. Also, https://docs.aws.amazon.com/en_pv/sagemaker/latest/dg/API_CreateTransformJob.html doesn't specify any parameter to increase the timeout for batch transform jobs.

Neither page tells me how to increase the timeout through a parameter. Please advise on how to achieve this.

EDIT
FYI, the approximate time I will need to process each input is around 2-3 minutes.

@velociraptor111
Author

Hi @laurenyu, any thoughts?

@laurenyu
Contributor

laurenyu commented Nov 6, 2019

I'm not sure if this is what you're looking for, and unfortunately I don't know off the top of my head if there's a limit for how high you can set the timeout, but you could try setting the environment variable SAGEMAKER_MODEL_SERVER_TIMEOUT through the env parameter for Transformer (docs).

Corresponding links of the Docker image code:

@velociraptor111
Author

This is perfect, thank you! Thanks for the code references as well, they're very helpful.

For anyone else who runs into this problem, this is how I passed in the parameter:

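# sagemaker_model and batch_output are assumed to be defined elsewhere.
transformer = sagemaker_model.transformer(instance_count=1,
                                          instance_type='ml.m4.xlarge',
                                          output_path=batch_output,
                                          # Raise the model server timeout (value in seconds).
                                          env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600'})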

@Ridhamz-nd

It might also be helpful to pass

model_client_config={'InvocationsTimeoutInSeconds': 3600}

to transformer.transform, because with only SAGEMAKER_MODEL_SERVER_TIMEOUT=3600 set I was still getting this timeout error:

Model server did not respond to /invocations request within 600 seconds
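
For reference, the two suggestions in this thread can be combined roughly as follows. This is only a sketch, not an official recipe: the helper names `build_timeout_settings` and `run_batch_transform` are made up for illustration, `sagemaker_model` is assumed to be an existing SageMaker Python SDK model object, and the S3 URIs are whatever your job uses. The 1 to 3600 range check is an assumption based on my reading of the documented limits for `InvocationsTimeoutInSeconds`, so check the current API reference before relying on it.

```python
def build_timeout_settings(timeout_seconds):
    """Return (env, model_client_config) for a given timeout in seconds.

    Hypothetical helper. The 1..3600 bound is an assumption about the
    allowed range of InvocationsTimeoutInSeconds.
    """
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("timeout_seconds must be between 1 and 3600")
    # Read by the model server inside the container; env values are strings.
    env = {"SAGEMAKER_MODEL_SERVER_TIMEOUT": str(timeout_seconds)}
    # Used by the batch transform client when it calls /invocations.
    model_client_config = {"InvocationsTimeoutInSeconds": timeout_seconds}
    return env, model_client_config


def run_batch_transform(sagemaker_model, input_s3_uri, output_s3_uri,
                        timeout_seconds=3600):
    """Start a batch transform job with both timeouts raised.

    Hypothetical wrapper; sagemaker_model is assumed to expose the
    SageMaker Python SDK's transformer() method.
    """
    env, model_client_config = build_timeout_settings(timeout_seconds)
    transformer = sagemaker_model.transformer(
        instance_count=1,
        instance_type="ml.m4.xlarge",
        output_path=output_s3_uri,
        env=env,
    )
    transformer.transform(input_s3_uri, model_client_config=model_client_config)
    return transformer
```

Keeping both values derived from one number avoids the situation described above, where the container allows a long request but the caller gives up earlier.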
