Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

aws-pipes: CfnPipe does not wait for DDB Stream to be created before checking if it has permissions to read from it. #31211

Closed
1 task
7e11 opened this issue Aug 24, 2024 · 6 comments
Labels
@aws-cdk/aws-kinesis Related to Amazon Kinesis bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@7e11
Copy link
Contributor

7e11 commented Aug 24, 2024

Describe the bug

The CfnPipe does not wait for DDB Stream to be created before checking if it has permissions to read from it. If the DDB Stream does not exist, then the CfnPipe resource fails to deploy with an error claiming it doesn't have permissions to read from the table. However, if the stream does exist, it deploys fine.

Because of this, this issue is intermittent, and has a higher likelihood of failing the stack the more tables + pipes you deploy in it.

See this discussion in #eventbridge-interest: https://amzn-aws.slack.com/archives/C017SEP3GCE/p1723573294515119

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The stack should deploy successfully.

Current Behavior

Stack fails with an error on one of the CfnPipe constructs.

Resource handler returned message: "Resource of type 'AWS::Pipes::Pipe' with identifier 'Pipe3-9gTK3fUJ68Zz' did not stabilize. Status Reason is Input parameter is invalid from the request due to : Cannot access stream arn:aws:dynamodb:us-west-2:XXXXXXXXXXXX:table/EventbridgeIssuePocCdkStack-Table3D174A7C6-1F6BB9G0CKGH7/stream/2024-08-24T00:57:57.268. Please ensure the customer role can perform actions on source stream" (RequestToken: 7e6d975c-e919-6cdc-35f3-bf5015332e87, HandlerErrorCode: NotStabilized)

Reproduction Steps

I tried my best to create a minimal, consistently failing stack. This creates 10 tables, and 10 GSIs on each table. In my testing, having one of the pipes fail is common, which will cause the stack to fail.

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { aws_pipes as pipes } from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as kms from 'aws-cdk-lib/aws-kms';

export class EventbridgeIssuePocCdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const logGroup = new logs.LogGroup(this, 'LogGroup', { removalPolicy: cdk.RemovalPolicy.DESTROY });

    const pipeRole = new iam.Role(this, 'LoggingRole', {
      assumedBy: new iam.ServicePrincipal('pipes.amazonaws.com'),
    });

    for (let step = 0; step < 10; step++) {
      const kmsKey = new kms.Key(this, 'Key' + step, {});

      const table = new dynamodb.Table(this, 'Table' + step, {
        partitionKey: {
          name: 'id',
          type: dynamodb.AttributeType.STRING,
        },
        stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
        encryption: dynamodb.TableEncryption.CUSTOMER_MANAGED,
        encryptionKey: kmsKey,
        billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
        pointInTimeRecovery: true,
        removalPolicy: cdk.RemovalPolicy.DESTROY,
        timeToLiveAttribute: 'TTL',
      });

      // Create a bunch of secondary indicides to make the table take longer to deploy
      for (let j = 0; j < 10; j++) {
        table.addGlobalSecondaryIndex({
          indexName: 'Index'+j,
          partitionKey: { name: 'pk'+j, type: dynamodb.AttributeType.STRING },
          sortKey: { name: 'sk'+j, type: dynamodb.AttributeType.STRING },
        })
      }

      if (table.tableStreamArn == null) {
        throw new Error("tableStreamArn is null")
      }

      // setup permissions
      table.grantStreamRead(pipeRole);
      logGroup.grantWrite(pipeRole);
      kmsKey.grantDecrypt(pipeRole);

      // Creating the pipe in the same deployment as the Table causes an error.
      // You have to do sequenced deployments. First with the Table, and then afterwards with the pipes.
      const pipe = new pipes.CfnPipe(this, 'Pipe' + step, {
        roleArn: pipeRole.roleArn,
        source: table.tableStreamArn,
        target: logGroup.logGroupArn,
        sourceParameters: {
          dynamoDbStreamParameters: {
            startingPosition: 'LATEST',
            batchSize: 1,
            onPartialBatchItemFailure: 'AUTOMATIC_BISECT',
          }
        },
      });
    }
  }
}

To see this example of this being intermittent, reduce steps to 1, so only a single DDB table is created. Odds are good that the associated pipe will succeed in deploying.

Possible Solution

There are a couple of workarounds which should work for this.

  1. Doing two deployments. One with the tables and no pipes, and later doing a deployment with the pipes.
  • No good ways to automate sequenced deployments.
  1. Use a custom CDK construct which is able to add a wait of a couple minutes, to prevent the pipe from deploying too soon.
  • Of course, this will mean that every deployment of that stack will take +2 minutes for no real reason.

As Oliver mentioned in the attached slack thread, there are a couple of ways this could be fixed for real.

  • Have DDB vend an API which reports on stream status. right now DescribeTable doesn't provide enough info. (See slack thread)
  • Have the CfnPipe construct retry a few times.

Additional Information/Context

No response

CDK CLI Version

3.154.1 (build febce9d)

Framework Version

3.154.1

Node.js Version

v16.7.0

OS

macOS Sonoma Version 14.6.1

Language

TypeScript

Language Version

TypeScript (5.5.3)

Other information

No response

@7e11 7e11 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 24, 2024
@github-actions github-actions bot added the @aws-cdk/aws-kinesis Related to Amazon Kinesis label Aug 24, 2024
@pahud pahud self-assigned this Aug 26, 2024
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Aug 26, 2024
@pahud
Copy link
Contributor

pahud commented Aug 26, 2024

Looks like table.tableStreamArn is available when table returns but the stream is actually not available yet. This would be a bug or something CFN needs to improve.

I will create an internal ticket with relevant team for clarifying.

@pahud
Copy link
Contributor

pahud commented Aug 26, 2024

internal tracking: V1497473646

@pahud pahud added p2 and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Aug 26, 2024
@pahud pahud removed their assignment Aug 26, 2024
@pahud pahud added the effort/medium Medium work item – several days of effort label Aug 26, 2024
@pahud
Copy link
Contributor

pahud commented Aug 27, 2024

We need to explicitly ensure the Pipe depends on the role as well as the default policy

 // ensure the pipe depends on the logging role default policy as well as the role
    const cfndefaultPolicy = pipeRole.node.tryFindChild('DefaultPolicy') as iam.CfnPolicy
    pipe.node.addDependency(cfndefaultPolicy, pipeRole)

On cdk synth you should see two explicit dependencies in the pipe resource:

Type: AWS::Pipes::Pipe
    Properties:
      RoleArn:
        Fn::GetAtt:
          - LoggingRole0BCFB29B
          - Arn
      Source:
        Fn::GetAtt:
          - TableCD117FA1
          - StreamArn
      SourceParameters:
        DynamoDBStreamParameters:
          BatchSize: 1
          OnPartialBatchItemFailure: AUTOMATIC_BISECT
          StartingPosition: LATEST
      Target:
        Fn::GetAtt:
          - LogGroupF5B46931
          - Arn
    DependsOn:
      - LoggingRoleDefaultPolicy7C1A4368
      - LoggingRole0BCFB29B

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p3 and removed p2 labels Aug 27, 2024
@7e11
Copy link
Contributor Author

7e11 commented Aug 27, 2024

I can confirm that this worked. The problem was that the CfnPipe was checking if it had permissions on the source before the role policy was created. Because this is an L1 construct, we need to define this dependency explicitly.

It's worth noting that there is an L2 construct for CfnPipe in alpha, which may run into this issue -- however it doesn't have DDB Stream sources implemented yet: https://github.com/aws/aws-cdk/tree/main/packages/%40aws-cdk/aws-pipes-sources-alpha/lib

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 28, 2024
@pahud
Copy link
Contributor

pahud commented Aug 28, 2024

Resolving this issue now. When the DDB stream source is implemented in the Pipe L2 construct, it should take care of the dependence then.

@pahud pahud closed this as completed Aug 28, 2024
Copy link

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2024
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
@aws-cdk/aws-kinesis Related to Amazon Kinesis bug This issue is a bug. effort/medium Medium work item – several days of effort p3
Projects
None yet
Development

No branches or pull requests

2 participants