fix(lambda): configuring log retention fails on 70+ Lambdas #31340

Merged
merged 6 commits into from
Sep 25, 2024

@rix0rrr rix0rrr commented Sep 6, 2024

When the Log Retention Lambda runs massively parallel (on 70+ Lambdas at the same time), it can run into throttling problems and fail.

Raise the retry count and delays:

  • Raise the default number of retries from 5 to 10.
  • Raise the sleep base from 100ms to 1s.
  • Change the sleep calculation to apply the 60s limit after jitter instead of before (previously, we would take a fraction of 60s; now we take a fraction of the accumulated wait time, and then limit the result to 60s).

Fixes #31338.
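
To make the difference between the two orderings concrete, here is a minimal sketch. It assumes the accumulated wait time is `base * 2 ** attempt`; the function names and the injectable `rng` parameter are hypothetical and exist only for illustration, so this should not be read as the actual code in the Log Retention handler.

```typescript
// Illustrative sketch only. The function names and the `rng` parameter are
// hypothetical; the PR itself only states delayBase = 1_000 and delayCap = 60_000.
type Rng = () => number; // returns a value in [0, 1)

// Cap first, then jitter: the result is a random fraction of the capped value,
// so late attempts wait on average only half of the cap.
function delayCapThenJitter(attempt: number, base: number, cap: number, rng: Rng = Math.random): number {
  return Math.round(rng() * Math.min(cap, base * 2 ** attempt));
}

// Jitter first, then cap: the result is a random fraction of the accumulated
// wait time (assumed here to be base * 2^attempt), limited to the cap afterwards.
function delayJitterThenCap(attempt: number, base: number, cap: number, rng: Rng = Math.random): number {
  return Math.round(Math.min(cap, rng() * base * 2 ** attempt));
}

// Compare both orderings with a fixed "random" value of 0.5.
const fixedHalf: Rng = () => 0.5;
for (const attempt of [2, 9]) {
  console.log(
    attempt,
    delayCapThenJitter(attempt, 1_000, 60_000, fixedHalf),
    delayJitterThenCap(attempt, 1_000, 60_000, fixedHalf),
  );
}
```

With the fixed value of 0.5, both functions return 2000 ms for attempt 2, where the cap is not yet reached. For attempt 9 the cap-then-jitter variant returns 30000 ms, while the jitter-then-cap variant returns the full 60000 ms, which spreads late retries out further under heavy throttling.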

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

When the Log Retention Lambda runs massively parallel (on 70+ Lambdas
at the same time), it can run into throttling problems and fail.

Raise the retry count and delays:

- Raise the default amount of retries from 5 -> 10
- Raise the sleep base from 100ms to 1s.
- Change the sleep calculation to apply the 10s limit *after* jitter instead
  of before (previously, we would take a fraction of 10s; now we're
  taking a fraction of the accumulated wait time, and after calculating
  that limit it to 10s).

Fixes #31338.
@github-actions github-actions bot added bug This issue is a bug. p2 labels Sep 6, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team September 6, 2024 11:48
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Sep 6, 2024
@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

mrgrain
mrgrain previously requested changes Sep 6, 2024
@mrgrain mrgrain left a comment

PR not completed. Will review again once complete.

@rix0rrr rix0rrr added the pr-linter/exempt-integ-test The PR linter will not require integ test changes label Sep 6, 2024
@aws-cdk-automation aws-cdk-automation dismissed their stale review September 6, 2024 14:25

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@rix0rrr rix0rrr requested a review from mrgrain September 6, 2024 14:37
- delayBase: number = 100,
- delayCap = 10 * 1000, // 10s
+ delayBase: number = 1_000,
+ delayCap = 60_000, // 60s
Nit: PR description says you cap at 10s, but this says 60s.

@github-actions github-actions bot added the effort/medium Medium work item – several days of effort label Sep 25, 2024
mergify bot commented Sep 25, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: d603f7b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit a2d42d2 into main Sep 25, 2024
12 checks passed
@mergify mergify bot deleted the huijbers/raise-timeout branch September 25, 2024 10:16
Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2024
@rix0rrr rix0rrr self-assigned this Sep 25, 2024
Labels
bug This issue is a bug. contribution/core This is a PR that came from AWS. effort/medium Medium work item – several days of effort p2 pr-linter/exempt-integ-test The PR linter will not require integ test changes
Development

Successfully merging this pull request may close these issues.

aws-lambda: Log retention gives rate exceeded error
4 participants