Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

(stepfunctions-tasks): AthenaStartQueryExecution construct creates insufficient IAM PolicyDocument #25875

Closed
svbfromnl opened this issue Jun 6, 2023 · 5 comments · Fixed by #25911
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort good first issue Related to contributions. See CONTRIBUTING.md p1

Comments

@svbfromnl
Copy link

svbfromnl commented Jun 6, 2023

Describe the bug

When synthesizing the following CDK/Python snippet, the resulting IAM configuration does not allow Athena to write the results to the designated output_bucket location:

from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as sfn_tasks

        start_query_execution_job = sfn_tasks.AthenaStartQueryExecution(
            self,
            "Start Athena Query",
            query_string=query,
            integration_pattern=sfn.IntegrationPattern.RUN_JOB,
            query_execution_context=sfn_tasks.QueryExecutionContext(database_name=database),
            result_configuration=sfn_tasks.ResultConfiguration(
                encryption_configuration=sfn_tasks.EncryptionConfiguration(
                    encryption_option=sfn_tasks.EncryptionOption.S3_MANAGED,
                ),
                output_location=s3.Location(bucket_name=output_bucket, object_key="results"),
            ),
        )

Expected Behavior

I expect the policy to end in a dash and an asterisk, to ensure that the required Actions can take place on the resources inside the folder:

   "Action": [
    "s3:AbortMultipartUpload",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:PutObject"
   ],
   "Effect": "Allow",
   "Resource": "arn:aws:s3:::S3_BUCKET_NAME/results/*"

Current Behavior

The current Policy does not end in an asterisk, causing permission issues when output is written to the location:

    "PolicyDocument": {
     "Statement": [
      {
       "Action": [
        "athena:getDataCatalog",
        "athena:getQueryExecution",
        "athena:startQueryExecution"
       ],
       "Effect": "Allow",
       "Resource": [
        "arn:aws:athena:us-east-1:317XXXXXX577:datacatalog/AwsDataCatalog",
        "arn:aws:athena:us-east-1:317XXXXXX577:workgroup/primary"
       ]
      },
      {
       "Action": [
        "athena:getQueryResults",
        "lakeformation:GetDataAccess",
        "s3:CreateBucket",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
       ],
       "Effect": "Allow",
       "Resource": "*"
      },
      {
       "Action": [
        "s3:AbortMultipartUpload",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
       ],
       "Effect": "Allow",
       "Resource": "arn:aws:s3:::S3_BUCKET_NAME/results"
      },

The result of the current behavior is that Athena is not able to execute the query, and errors out with the following:

Access denied when writing output to url: s3://S3_BUCKET_NAME/results/2d282a78-6b8b-48c3-aaf2-819e867ca3dc.csv . Please ensure you are allowed to access the S3 bucket. If specifying an expected bucket owner, confirm the bucket is owned by the expected account. If you are encrypting query results with KMS key, please ensure you are allowed to access your KMS key

Reproduction Steps

from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as sfn_tasks

        start_query_execution_job = sfn_tasks.AthenaStartQueryExecution(
            self,
            "Start Athena Query",
            query_string=query,
            integration_pattern=sfn.IntegrationPattern.RUN_JOB,
            query_execution_context=sfn_tasks.QueryExecutionContext(database_name=database),
            result_configuration=sfn_tasks.ResultConfiguration(
                encryption_configuration=sfn_tasks.EncryptionConfiguration(
                    encryption_option=sfn_tasks.EncryptionOption.S3_MANAGED,
                ),
                output_location=s3.Location(bucket_name=output_bucket, object_key="results"),
            ),
        )

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.82.0 (build 3a8648a)

Framework Version

No response

Node.js Version

Node.js v20.1.0

OS

macOS Ventura 13.4

Language

Python

Language Version

Python 3.11.0

Other information

Changing the object_key variable to "results/*" causes the correct IAM Policy to be created, but the addition of the /* causes the actual location of the files in the S3 bucket to become:
s3://S3_BUCKET_NAME/results/*/b7f41696-c586-49ec-a915-51e7e1379110.csv

@svbfromnl svbfromnl added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 6, 2023
@github-actions github-actions bot added the @aws-cdk/aws-iam Related to AWS Identity and Access Management label Jun 6, 2023
@peterwoodworth peterwoodworth changed the title CDK/Python: AthenaStartQueryExecution construct creates insufficient IAM PolicyDocument (stepfunctions-tasks): AthenaStartQueryExecution construct creates insufficient IAM PolicyDocument Jun 7, 2023
@peterwoodworth peterwoodworth removed the @aws-cdk/aws-iam Related to AWS Identity and Access Management label Jun 7, 2023
@peterwoodworth peterwoodworth added good first issue Related to contributions. See CONTRIBUTING.md p1 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. @aws-cdk/aws-stepfunctions-tasks labels Jun 7, 2023
@peterwoodworth
Copy link
Contributor

Hm, it looks like it should be adding an asterisk no matter what:

resources: [this.props.resultConfiguration?.outputLocation?.bucketName ? `arn:aws:s3:::${this.props.resultConfiguration?.outputLocation?.bucketName}/${this.props.resultConfiguration?.outputLocation?.objectKey}/*` : '*'], // Need S3 location where data is stored or Athena throws an Unable to verify/create output bucket https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html

How did you define output_bucket in your code?

@peterwoodworth peterwoodworth added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed good first issue Related to contributions. See CONTRIBUTING.md p1 labels Jun 7, 2023
@svbfromnl
Copy link
Author

svbfromnl commented Jun 7, 2023

How did you define output_bucket in your code?

output_bucket="S3_BUCKET_NAME"
No arn, no s3:// prefix, no slashes. Just the name of the bucket.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 7, 2023
@jamescorrenti
Copy link

@peterwoodworth - When I view the compiled TS for line 109, this is what comes out - I don't see it appending an asterisk after objectKey

policyStatements.push(new iam.PolicyStatement({
            actions: ['s3:AbortMultipartUpload',
                's3:ListBucketMultipartUploads',
                's3:ListMultipartUploadParts',
                's3:PutObject'],
            resources: [
                this.props.resultConfiguration?.outputLocation?.bucketName
                    ? cdk.Stack.of(this).formatArn({
                        region: '',
                        account: '',
                        service: 's3',
                        resource: this.props.resultConfiguration?.outputLocation?.bucketName,
                        resourceName: this.props.resultConfiguration?.outputLocation?.objectKey,
                    })
                    : '*',
            ],

@peterwoodworth
Copy link
Contributor

peterwoodworth commented Jun 8, 2023

Yeah @jamescorrenti I'm seeing the exact same thing in the compilation. It's because the link I provided earlier isn't from a recent commit, I must have accidentally been looking at v1.

The code was updated as of 2.45, this PR fixed an issue where these arns didn't account for partition. However, we didn't have any tests for these policies before this PR, so this regression with constructing the arn using formatArn() and not accounting for the final asterisk was not caught

Here is the current code, should only need to adjust it such that the resourceName is appended with an asterisk:

policyStatements.push(
new iam.PolicyStatement({
actions: ['s3:AbortMultipartUpload',
's3:ListBucketMultipartUploads',
's3:ListMultipartUploadParts',
's3:PutObject'],
resources: [
this.props.resultConfiguration?.outputLocation?.bucketName
? cdk.Stack.of(this).formatArn({
// S3 Bucket names are globally unique in a partition,
// and so their ARNs have empty region and account components
region: '',
account: '',
service: 's3',
resource: this.props.resultConfiguration?.outputLocation?.bucketName,
resourceName: this.props.resultConfiguration?.outputLocation?.objectKey,
})
: '*',
],
}),
);

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug This issue is a bug. effort/small Small work item – less than a day of effort good first issue Related to contributions. See CONTRIBUTING.md p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants