This repository has been archived by the owner on Dec 16, 2024. It is now read-only.

Large function contexts can cause lambda creation timeouts #468

Closed
pgavlin opened this issue Apr 27, 2018 · 14 comments

@pgavlin
Member

pgavlin commented Apr 27, 2018

See e.g. the relevant bits from these debug logs:

 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Creating Lambda Function topic_onTestEvent-37b7813 with role arn:aws:iam::153052954103:role/topic_onTestEvent-2276031
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Locking "aws_lambda_function"
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Locked "aws_lambda_function"
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Waiting for state to become: [success]
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: [aws-sdk-go] DEBUG: Request lambda/CreateFunction Details:
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: ---[ REQUEST POST-SIGN ]-----------------------------
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: POST /2015-03-31/functions HTTP/1.1
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Host: lambda.us-east-1.amazonaws.com
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: User-Agent: aws-sdk-go/1.13.32 (go1.9; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.5
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Content-Length: 49209146
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Authorization: AWS4-HMAC-SHA256 Credential=AKIAIUQHNEI6AHZNCGTQ/20180427/us-east-1/lambda/aws4_request, SignedHeaders=content-length;host;x-amz-date, Signature=368a6e1405d4878c7a10bb116489ef3b08ace3764cd787f1f8063b5cb37c8b13
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: X-Amz-Date: 20180427T000551Z
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running . debug: Accept-Encoding: gzip
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running... . debug: WaitForState timeout after 1m0s
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running... . debug: WaitForState starting 30s refresh grace period
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running. . debug: WaitForState exceeded refresh grace period
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running. . debug: Unlocking "aws_lambda_function"
 pulumi:pulumi:Stack stress-tester-stress-tester-testing running. . debug: Unlocked "aws_lambda_function"

(note that these lines are not consecutive in the logs; I've removed a bunch of intervening output)

Here we are sending a request that is nearly 50MB from Seattle to us-east-1. This fails to complete within the minute allowed by the TF AWS provider; manually editing the code to bump the timeout allows the create to succeed.
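
As a rough sanity check on the numbers above, the sketch below computes the sustained upload rate needed to push the request body through in a given time. The `Content-Length` value comes from the logs; the helper name `minUploadMbps` is hypothetical, and the calculation ignores TLS/HTTP overhead and latency, so treat it as a lower bound rather than a measurement.

```typescript
// Minimum sustained upload rate, in megabits per second, needed to send
// `bytes` within `seconds`. Ignores protocol overhead and latency.
function minUploadMbps(bytes: number, seconds: number): number {
    return (bytes * 8) / seconds / 1e6;
}

// Content-Length of the CreateFunction request, copied from the debug logs.
const contentLength = 49209146;

// Rate required to finish inside the 1-minute wait seen in the logs.
const neededForOneMinute = minUploadMbps(contentLength, 60);
```

For this request that works out to roughly 6.6 Mbit/s sustained for the full minute, which is easy to miss on an ordinary uplink.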

@pgavlin
Member Author

pgavlin commented Apr 27, 2018

Note that this is exacerbated by the large amount of data we automatically add to the lambda contents.

pgavlin added this to the 0.14 milestone Apr 27, 2018
@pgavlin
Member Author

pgavlin commented Apr 27, 2018

I've put this in M14 because it is a pretty critically bad user experience: AFAIK, there isn't anything that the end user can do to resolve this.

@lukehoban
Contributor

Possibly related:

In both of those cases, though, folks are seeing this due to failures contacting certain regions, and the symptoms seem related to VPNs or client machine DNS configuration.

@lukehoban
Contributor

I'm curious whether this is really an issue with the 1 min timeout being too short, or whether this will fail no matter what the timeout is.

Note that it looks like the timeout is set to 10 mins already: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_lambda_function.go#L394.

@pgavlin
Member Author

pgavlin commented Apr 27, 2018

I wonder if we don't have the latest changes... I was looking at this timeout: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_lambda_function.go#L367

@pgavlin
Member Author

pgavlin commented Apr 27, 2018

Ah, we do have those changes, but the 9 minute timeout only kicks in if the original request failed due to throttling: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_lambda_function.go#L390-L392
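
To make the distinction concrete, here is a minimal model of the behavior as described in this thread. It is not the provider's Go code: the `FirstAttempt` type, the constant names, and `maxWaitMinutes` are all hypothetical, and the only facts taken from the thread are the 1-minute initial wait and the 9-minute window that applies only after a throttling failure.

```typescript
// Hypothetical classification of how the first CreateFunction attempt ended.
type FirstAttempt = "timedOut" | "throttled" | "otherError";

const INITIAL_WAIT_MINUTES = 1;   // "WaitForState timeout after 1m0s" in the logs
const THROTTLE_RETRY_MINUTES = 9; // extra window described in this comment

// Total minutes the create may spend waiting, given how the first attempt
// failed. Only a throttling failure unlocks the longer retry window.
function maxWaitMinutes(first: FirstAttempt): number {
    if (first === "throttled") {
        return INITIAL_WAIT_MINUTES + THROTTLE_RETRY_MINUTES;
    }
    return INITIAL_WAIT_MINUTES;
}
```

Under this model a slow upload that simply runs out the first minute never reaches the 10-minute total, which matches the failure in the logs.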

@pgavlin
Member Author

pgavlin commented Apr 27, 2018

> I'm curious whether this is really an issue with the 1 min timeout being too short, or whether this will fail no matter what the timeout is.

Experiments indicate that the timeout is certainly too short: upping the 1-minute timeout to 10 minutes allowed my pulumi update to succeed where it had not before.

@pgavlin
Member Author

pgavlin commented Apr 28, 2018

Hit this again on a different machine on a different network, but with the same Pulumi program deploying into the same region.

@pgavlin
Member Author

pgavlin commented Apr 28, 2018

Interestingly, that machine is now seeing a new (but related) error:

error: Plan apply failed: creating urn:pulumi:stress-tester-staging::stress-tester::cloud:function:Function$aws:serverless:Function$aws:lambda/function:Function::topic_onTestEvent: Error creating Lambda function: InvalidSignatureException: Signature expired: 20180428T055133Z is now earlier than 20180428T055216Z (20180428T055716Z - 5 min.)

As per aws/aws-sdk-js#527 (comment), it would seem that Lambda effectively limits the max upload time to 5 minutes.
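
The timestamps in that error can be checked directly. The sketch below assumes the usual reading of the message: the first timestamp is when the request was signed, the last is the server's clock when it rejected the request, and the middle one is that clock minus the 5-minute allowance. The helper names are hypothetical.

```typescript
// Parse an AWS "basic" ISO 8601 timestamp such as 20180428T055133Z
// into milliseconds since the Unix epoch (UTC).
function parseAmzDate(stamp: string): number {
    const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(stamp);
    if (m === null) {
        throw new Error("unrecognized timestamp: " + stamp);
    }
    const [year, month, day, hour, minute, second] = m.slice(1).map(Number);
    return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Whole seconds from `earlier` to `later`.
function secondsBetween(earlier: string, later: string): number {
    return (parseAmzDate(later) - parseAmzDate(earlier)) / 1000;
}

const signedAt = "20180428T055133Z";  // signature timestamp from the error
const cutoff = "20180428T055216Z";    // oldest signature still accepted
const serverNow = "20180428T055716Z"; // server clock at rejection

const allowedWindow = secondsBetween(cutoff, serverNow); // 300 seconds
const elapsed = secondsBetween(signedAt, serverNow);     // 343 seconds
```

So the request was evaluated 5 minutes 43 seconds after it was signed, 43 seconds past the allowance.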

@pgavlin
Member Author

pgavlin commented Apr 28, 2018

See also the comments on https://github.com/apex/apex/issues/166, esp. at the bottom.

@pgavlin
Member Author

pgavlin commented Apr 28, 2018

FWIW, it seems that Lambda limits the size of the zip file for a function to 50MB: https://docs.aws.amazon.com/lambda/latest/dg/limits.html

Manually examining the spilled assets for the relevant lambda indicates that we're using about two-thirds of that space (34MB) for what is a rather simple application. In addition, the zip file expands to 151MB of data, which is about 60% of the space allowed for a function's uncompressed contents.

Removing the node modules for @pulumi/*, grpc, and typescript--none of which are needed by this function at runtime--brings the size of the zip file down to 5.4MB and the uncompressed size down to 26MB.
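
For reference, the before and after figures can be expressed as fractions of the two limits. This is only arithmetic on the numbers quoted above; the 250MB uncompressed limit is an assumption inferred from the "about 60%" figure, and `percentOfLimit` is a hypothetical helper.

```typescript
const ZIP_LIMIT_MB = 50;       // zip limit quoted in this comment
const UNZIPPED_LIMIT_MB = 250; // assumption: implied by 151MB being "about 60%"

// Percentage of a limit consumed, rounded to one decimal place.
function percentOfLimit(usedMb: number, limitMb: number): number {
    return Math.round((usedMb / limitMb) * 1000) / 10;
}

// Before trimming node_modules.
const zipBefore = percentOfLimit(34, ZIP_LIMIT_MB);            // 68
const unzippedBefore = percentOfLimit(151, UNZIPPED_LIMIT_MB); // 60.4

// After removing @pulumi/*, grpc, and typescript.
const zipAfter = percentOfLimit(5.4, ZIP_LIMIT_MB);            // 10.8
const unzippedAfter = percentOfLimit(26, UNZIPPED_LIMIT_MB);   // 10.4
```

Both measures drop from well over half of the limit to roughly a tenth of it.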

lukehoban modified the milestones: 0.14, 0.15 May 25, 2018
@lukehoban
Contributor

Both the overall slowness and the timeouts when targeting regions other than us-west-2 are significant issues - we will need to address this.

lukehoban pushed a commit to pulumi/pulumi-aws that referenced this issue Jun 3, 2018
By default, compute only the required package dependencies, and then include the transitive dependencies of these into the Lambda ZIP.

Also allow explicitly adding additional package dependencies (in addition to existing support for additional file paths).

Part of pulumi/pulumi-cloud#468.
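
The approach in this commit message (start from the required packages, then pull in their transitive dependencies) amounts to a reachability walk over the dependency graph. The sketch below illustrates that idea only; it is not the pulumi-aws implementation, and the `DependencyGraph` shape, the function name, and the example package names are hypothetical.

```typescript
// Hypothetical in-memory graph: package name -> its direct dependencies.
type DependencyGraph = { [name: string]: string[] };

// Return the required packages plus everything reachable from them, sorted.
function transitiveDependencies(graph: DependencyGraph, required: string[]): string[] {
    const included: { [name: string]: boolean } = Object.create(null);
    const pending = required.slice();
    while (pending.length > 0) {
        const name = pending.pop() as string;
        if (included[name]) {
            continue; // already visited; also guards against cycles
        }
        included[name] = true;
        const hasEntry = Object.prototype.hasOwnProperty.call(graph, name);
        const deps = hasEntry ? graph[name] : [];
        for (const dep of deps) {
            pending.push(dep);
        }
    }
    return Object.keys(included).sort();
}

// Illustrative data only: the function needs "request-lib" at runtime,
// while the other packages are installed but never required by it.
const exampleGraph: DependencyGraph = {
    "request-lib": ["mime-lib"],
    "mime-lib": [],
    "@pulumi/pulumi": ["grpc"],
    "grpc": [],
    "typescript": [],
};
const packaged = transitiveDependencies(exampleGraph, ["request-lib"]);
```

Here `packaged` contains only `mime-lib` and `request-lib`; packages that are installed but unreachable from the required set stay out of the ZIP.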
lukehoban added a commit to pulumi/pulumi-aws that referenced this issue Jun 5, 2018
@joeduffy
Member

joeduffy commented Jun 9, 2018

@pgavlin Is there anything remaining to do here pre-launch? I am assuming no, but if so, please bring back and add notes.

joeduffy modified the milestones: 0.15, 0.15.1 Jun 9, 2018
@lukehoban
Contributor

lukehoban commented Jun 9, 2018

Yes - this is done - I was just waiting on #506 merging to close it out. But we can track that separately.
