apigateway: caching remains enabled after removal, potentially returning wrong data #32342

Closed
apparentorder opened this issue Dec 1, 2024 · 2 comments
Labels
@aws-cdk/aws-apigateway Related to Amazon API Gateway bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@apparentorder

apparentorder commented Dec 1, 2024

Describe the bug

Removing the caching_enabled and cache_cluster_enabled settings from the stage deployment options does not disable caching, even though disabled is the default and the cdk diff output suggests both settings would be removed.

Side note: When trying to disable caching entirely, the cache key configuration might be removed at the same time. In that case, because of this bug, caching remains enabled but the cache keys are no longer respected, so clients can be served wrong cached data, causing havoc and mayhem.

Expected Behavior

Caching is disabled and the cache cluster is removed.

Current Behavior

Caching remains enabled and the cache cluster is still there, generating costs.

Reproduction Steps

CDK script:

from aws_cdk import (
    CfnOutput,
    Duration,
    Stack,
    aws_apigateway,
)
from constructs import Construct
import os

class ApiGwCacheStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        if os.environ.get("CACHE") == "yes":
            api = aws_apigateway.RestApi(self, "cdk-bug-api",
                rest_api_name="cdk-bug-api",
                description="This is my sample API for demonstration",
                deploy_options = {
                    "caching_enabled": True,
                    "cache_cluster_enabled": True,
                    "cache_ttl": Duration.minutes(60),
                    "throttling_burst_limit": 5,
                    "throttling_rate_limit": 10,

                }
            )
        else:
            api = aws_apigateway.RestApi(self, "cdk-bug-api",
                rest_api_name="cdk-bug-api",
                description="This is my sample API for demonstration",
                deploy_options = {
                    "throttling_burst_limit": 5,
                    "throttling_rate_limit": 10,

                }
            )

        api.root.add_resource("hello").add_method("GET") # dummy
        CfnOutput(self, "ApiGatewayId", value=api.rest_api_id)

Deploy with caching enabled:

$ export CACHE="yes"; cdk diff && cdk deploy --progress events
[...] Outputs:
ApiGwCacheStack.ApiGatewayId = 4hllumf8bk

Verify caching is enabled (as expected):

$ aws apigateway get-stage --rest-api-id 4hllumf8bk --stage-name prod --query 'methodSettings.*.cachingEnabled' --output text
True

Remove caching config from CDK definition:

$ export CACHE="no"; cdk diff && cdk deploy --progress events
[...]
[~] AWS::ApiGateway::Stage cdk-bug-api/DeploymentStage.prod cdkbugapiDeploymentStageprod3E594040
 ├─ [-] CacheClusterEnabled
 │   └─ true
 ├─ [-] CacheClusterSize
 │   └─ 0.5
 └─ [~] MethodSettings
     └─ @@ -1,7 +1,5 @@
        [ ] [
        [ ]   {
        [-]     "CacheTtlInSeconds": 3600,
        [-]     "CachingEnabled": true,
        [ ]     "DataTraceEnabled": false,
        [ ]     "HttpMethod": "*",
[...]
ApiGwCacheStack | 1/3 | 2:19:40 PM | UPDATE_COMPLETE      | AWS::ApiGateway::Stage      | cdk-bug-api/DeploymentStage.prod (cdkbugapiDeploymentStageprod3E594040) 

Verify that caching is still enabled (bad):

$ aws apigateway get-stage --rest-api-id 4hllumf8bk --stage-name prod --query '[cacheClusterEnabled, methodSettings.*.cachingEnabled]' --output text
True
True
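The check above can also be scripted. The following is a minimal sketch, assuming the JSON printed by aws apigateway get-stage has already been loaded into a dict. The helper name caching_still_enabled and the sample payload are hypothetical; only the cacheClusterEnabled and methodSettings.*.cachingEnabled fields are taken from the commands in this report.

```python
import json


def caching_still_enabled(stage: dict) -> bool:
    """Return True if the stage reports a cache cluster or any method-level caching."""
    cluster_on = bool(stage.get("cacheClusterEnabled", False))
    method_on = any(
        bool(settings.get("cachingEnabled", False))
        for settings in stage.get("methodSettings", {}).values()
    )
    return cluster_on or method_on


# Hypothetical sample, shaped like the state observed after the second deployment.
sample = json.loads(
    '{"cacheClusterEnabled": true,'
    ' "methodSettings": {"*/*": {"cachingEnabled": true}}}'
)
print(caching_still_enabled(sample))
```

A script like this could run after cdk deploy and fail the pipeline when the function returns True although caching was meant to be off.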

Additional Information/Context

This behavior may depend on the other deployment options. For example, I could not reproduce the bug for caching_enabled until I added cache_ttl to the options. For cache_cluster_enabled, however, it was reproducible even with otherwise empty deployment options.

The bug doesn't seem to be related to the project language: this test case is Python, but the bug originally hit in production with TypeScript.

Workaround: Explicitly setting those values to False works as expected.
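One way to apply the workaround is to build the deployment options in a single place so that both flags are always present. This is only a sketch: build_deploy_options is a hypothetical helper, not part of aws_cdk, and the throttling values are copied from the reproduction script.

```python
from typing import Any, Optional


def build_deploy_options(cache: bool, cache_ttl: Optional[Any] = None) -> dict:
    """Build deploy_options with both caching flags always set explicitly."""
    options = {
        "throttling_burst_limit": 5,
        "throttling_rate_limit": 10,
        # Always emit both flags. Per this report, omitting them leaves
        # caching enabled, while an explicit False disables it.
        "caching_enabled": cache,
        "cache_cluster_enabled": cache,
    }
    # Only pass a TTL when caching is on and the caller supplied one,
    # for example Duration.minutes(60) in the stack above.
    if cache and cache_ttl is not None:
        options["cache_ttl"] = cache_ttl
    return options
```

In the reproduction stack, both branches could then collapse into one RestApi call with deploy_options=build_deploy_options(os.environ.get("CACHE") == "yes", Duration.minutes(60)).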

CDK CLI Version

2.171.1 (build a95560c)

Node.js Version

Node.js v20.11.0

OS

AL2023

Language

Python

Language Version

3.9.16

@apparentorder apparentorder added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 1, 2024
@github-actions github-actions bot added the @aws-cdk/aws-apigateway Related to Amazon API Gateway label Dec 1, 2024
@ashishdhingra ashishdhingra self-assigned this Dec 2, 2024
@ashishdhingra ashishdhingra added p2 investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 2, 2024
@ashishdhingra
Contributor

@apparentorder Good morning. Thanks for reporting the issue. I noticed the note below in Cache settings for REST APIs in API Gateway:

Note
Creating or deleting a cache takes about 4 minutes for API Gateway to complete.

When a cache is created, the Cache cluster value changes from Create in progress to Active. When cache deletion is completed, the Cache cluster value changes from Delete in progress to Inactive.

When you turn on method-level caching for all methods on your stage, the Default method-level caching value changes to Active. If you turn off method-level caching for all methods on your stage, the Default method-level caching value changes to Inactive. If you have an existing setting for a method-level cache, changing the status of the cache doesn't affect that setting.

In particular, note the following:

  • Creating or deleting a cache takes about 4 minutes for API Gateway to complete.
  • If you have an existing setting for a method-level cache, changing the status of the cache doesn't affect that setting.

Could you wait 4 minutes (probably more) and then verify via the AWS CLI whether caching is still enabled after the CDK deployment? Also, check whether you have an existing setting for a method-level cache.
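To rule out the roughly 4-minute deletion delay, the stage can be polled instead of checked once. The sketch below is an assumption-laden illustration: wait_for_cache_removal and its fetch_stage argument are hypothetical, and fetch_stage is expected to return the parsed output of aws apigateway get-stage (for example via a subprocess call).

```python
import time
from typing import Callable


def wait_for_cache_removal(
    fetch_stage: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until cacheClusterEnabled is false; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        stage = fetch_stage()
        if not stage.get("cacheClusterEnabled", False):
            return True  # the cache cluster is reported as disabled
        if time.monotonic() >= deadline:
            return False  # still enabled after waiting, so timing is not the cause
        sleep(interval_s)
```

A False result after a generous timeout would indicate that the cache is not merely slow to delete.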

I tried to reproduce the issue using the CDK code below (in TypeScript), first enabling the cache and then disabling it by redeploying the stack:

import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

export class CdktestStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const api = new apigateway.RestApi(this, 'cdk-bug-api', {
      restApiName: "cdk-bug-api",
      description: "This is my sample API for demonstration",
      deployOptions: {
        //cachingEnabled: true,
        //cacheClusterEnabled: true,
        //cacheTtl: cdk.Duration.minutes(60),
        throttlingBurstLimit: 5,
        throttlingRateLimit: 10
      }
    });

    api.root.addResource("hello").addMethod("GET");
    new cdk.CfnOutput(this, "ApiGatewayId", {
      value: api.restApiId
    });
  }
}

Below are some observations:

  • After the first CDK deployment to enable the cache, the Cache cluster status in the API Gateway console stayed at Create in progress for a while, then changed to Provisioned. The AWS CLI command aws apigateway get-stage --rest-api-id <<API-ID>> --stage-name prod --query 'methodSettings.*.cachingEnabled' --output text returned True.
  • After the second CDK deployment to disable the cache, the Cache cluster status in the API Gateway console was still Provisioned, even after waiting for a while.
  • Setting cachingEnabled and cacheClusterEnabled explicitly to false changed the Cache cluster status to Delete in progress and Default method-level caching to Inactive (as you reported).

From the CDK perspective, it generated the correct CloudFormation template and submitted the change set to CloudFormation. From there on, CDK is out of the picture: the changes are deployed by CloudFormation to the API Gateway service. This appears to be a CloudFormation limitation; please open a new issue at https://github.com/aws-cloudformation/cloudformation-coverage-roadmap mentioning all the details.

Thanks,
Ashish

@ashishdhingra ashishdhingra added needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Dec 2, 2024
@ashishdhingra ashishdhingra removed their assignment Dec 2, 2024
@ashishdhingra ashishdhingra added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 2, 2024

github-actions bot commented Dec 4, 2024

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Dec 4, 2024
@github-actions github-actions bot closed this as completed Dec 9, 2024