pod/convoy-agent in CrashLoopBackOff without any logs #2251

Open
kardeepakcars24 opened this issue Feb 28, 2025 · 6 comments

kardeepakcars24 commented Feb 28, 2025

I have installed Convoy in EKS using the Helm chart, pointing it at my AWS RDS and ElastiCache Redis. The init container ran successfully:

  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
...
  Normal   Created    12m                   kubelet            Created container wait-for-migrate
  Normal   Started    12m                   kubelet            Started container wait-for-migrate
  Normal   Created    11m (x4 over 12m)     kubelet            Created container agent
  Normal   Started    11m (x4 over 12m)     kubelet            Started container agent
...
  Warning  BackOff    2m27s (x48 over 12m)  kubelet            Back-off restarting failed container agent in pod convoy-agent-85996fbfbb-lkjwd_dev-convoy-namespace

This is what kubectl logs shows:

Defaulted container "agent" out of: agent, wait-for-migrate (init)
Error: i/o timeout
time="2025-02-28T12:14:56Z" level=fatal msg="i/o timeout"
Usage:
  Convoy agent [flags]

Flags:
      --agent-port uint32      Agent port
...
      --tracer-type string                    Tracer backend, e.g. sentry, datadog or otel

The wait-for-migrate container shows no logs.

This is my Helm chart values.yaml:

  convoy:
    # -- Docker image tags for all convoy components
    image: &image "getconvoy/convoy"
    # -- Docker image tags for all convoy components
    tag: &tag "v24.8.2"
    # -- Logger Level for all convoy components
    log_level: &logLevel "info"
    # -- Convoy Environment
    environment: &environment "oss"
    # -- Tracing config for all convoy services
    tracer_enabled: &tracerEnabled false
    # -- Tracing provider type
    tracer_type: &tracerType "otel"
    # -- Open Telemetry auth header name
    otel_auth_header_name: &otelHeaderName ""
    # -- Open Telemetry auth header value
    otel_auth_header_value: &otelHeaderValue ""
    # -- Open Telemetry sample rate
    otel_sample_rate: &otelSampleRate 1
    # -- Open Telemetry collector url
    otel_collector_url: &otelCollectorUrl ""
    # -- Open Telemetry insecure skip verify
    otel_insecure_skip_verify: &otelInsecureSkipVerify true
    # -- Sentry DSN
    sentry_dsn: &sentryDsn ""
    # -- Retention policy duration
    retention_policy_duration: &retentionPolicyDuration 720h
    # -- Retention policy enabled
    retention_policy_enabled: &retentionPolicyEnabled false
    # -- Enable usage analytics
    enable_usage_analytics: &enabledUsageAnalytics true
    # -- API version
    api_version: &apiVersion "2024-01-01"
    # -- License Key
    license_key: &licenseKey ""

  externalDatabase:
    # -- Enable an external database, This will use postgresql chart, Change values if you use an external database
    enabled: true
    # -- Host for the external database
    host: "my-dev-rds.adsfdsafdsaf.ap-south-1.rds.amazonaws.com"
    # -- Password for the external database
    postgresPassword: &postgresPassword mypostgresPasswordHere
    # -- Database name for the external database
    database: &postgresDatabase convoy
    # -- Password for the external database, ignored in case of secret parameter with non-empty value
    password: &userPassword mypostgresPasswordHere
    # -- If this secret parameter is not empty, password value will be ignored. The password in the secret should be in the 'password' key
    secret: ""
    # -- Username for the external database
    username: &username cfcommon
    # -- Query params for the external database
    options: "sslmode=require&connect_timeout=30"
    # -- Port for the external database
    port: 5432

  nativeRedis:
    # -- Enable redis, This will use redis chart, Disable if you use an external redis
    enabled: &redisEnabled false
    # -- Host for the redis
    host: ""
    # -- password for the redis, ignored in case of secret parameter with non-empty value
    password: &redisPassword "convoy"
    # -- If this secret parameter is not empty, password value will be ignored. The password in the secret should be in the 'password' key
    secret: ""
    # -- Port for the redis
    port: 6379

  externalRedis:
    # -- Enable external redis, Enable this if you use an external redis and disable Native redis
    enabled: true
    # -- redis cluster addresses, if set the other values won't be used
    addresses: "redis://master.my-dev-redis.adsfdsaf.aps1.cache.amazonaws.com:6379"
    # -- Host for the external redis
    host: "master.my-dev-redis.adsfdsaf.aps1.cache.amazonaws.com"
    # -- Scheme for the external redis. This can be redis, rediss, redis-socket or redis-sentinel
    scheme: "redis"
    # -- username for the external redis.
    username: ""
    # -- password for the external redis, ignored in case of secret parameter with non-empty value
    password: "whateverRandomAuthString"
    # -- If this secret parameter is not empty, password value will be ignored. The password in the secret should be in the 'password' key
    secret: ""
    # -- Database name for the external redis.
    database: "0"
    # -- Port for the external redis
    port: "6379"


# @ignored, used in case of external chart
postgresql:
  # -- Set to false if you don't want to create a postgres instance
  enabled: false
  fullnameOverride: "postgresql"
  global:
    postgresql:
      auth:
        postgresPassword: *postgresPassword
        username: *username
        password: *userPassword
        database: *postgresDatabase


# @ignored, used in case of external chart
redis:
  # -- Set to false if you don't want to create a redis instance
  enabled: false
  architecture: standalone
  fullnameOverride: "redis"
  auth:
    enabled: *redisEnabled
    password: *redisPassword

agent:
  image:
    # -- Repository to be used by the agent. The latest tag is used by default
    repository: *image
    # -- Pull policy for the agent image
    pullPolicy: IfNotPresent
    # @ignored
    tag: *tag

  env:
    environment: *environment
    proxy: ""
    sign_up_enabled: false
    log_level: *logLevel
    smtp:
      enabled: false
      from: ""
      # -- Ignored in case of secret parameter with non-empty value
      password: ""
      # -- If this secret parameter is not empty, password value will be ignored. The password in the secret should be in the 'password' key
      secret: ""
      port: 0
      provider: ""
      url: ""
      username: ""
      ssl: false
      reply_to: ""
    # @ignored
    tracer:
      type: *tracerType
      enabled: *tracerEnabled
      otel:
        otel_auth:
          header_name: *otelHeaderName
          header_value: *otelHeaderValue
        sample_rate: *otelSampleRate
        collector_url: *otelCollectorUrl
        insecure_skip_verify: *otelInsecureSkipVerify
      sentry:
        dsn: *sentryDsn
    pyroscope:
      enabled: false
      url: ""
      username: ""
      password: ""
      profile_id: ""
    enable_feature_flag: []
    retention_policy:
      policy: *retentionPolicyDuration
      enabled: *retentionPolicyEnabled
    analytics_enabled: *enabledUsageAnalytics
    storage:
      enabled: false
      type: ""
      on_prem:
        path: ""
      s3:
        bucket: ""
        accessKey: ""
        # -- Ignored in case of secret parameter with non-empty value
        secretKey: ""
        # -- If this secret parameter is not empty, secretKey value will be ignored. The password in the secret should be in the 'secretKey' key
        secret: ""
        region: ""
        session_token: ""
        endpoint: ""
    consumer_pool_size: 100
    enable_profiling: false
    metrics:
      enabled: false
      metrics_backend: prometheus
      prometheus_metrics:
        sample_time: 5
    instance_ingest_rate: 100
    worker_execution_mode: default
    max_retry_seconds: 7200
    license_key: *licenseKey
    dispatcher:
      insecure_skip_verify: false
      allow_list: ["0.0.0.0/0"]
      deny_list: ["127.0.0.1/8", "169.254.169.254/32"]
  app:
    replicaCount: 1
    resources: {}
    # limits:
    #   cpu: 1000m
    #   memory: 2000Mi
    # requests:
    #   cpu: 1000m
    #   memory: 1000Mi

  service:
    # -- Type of service for the agent
    type: ClusterIP
    # -- Port for the agent service
    port: 80

  autoscaling:
    # -- Enable autoscaling for the agent
    enabled: false
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80

  podDisruptionBudget: { }
    # -- Pod disruption budget
#    maxUnavailable: 1
#    minAvailable: 1

server:
  image:
    # -- Repository to be used by the server. The latest tag is used by default
    repository: *image
    # -- Pull policy for the server image
    pullPolicy: IfNotPresent
    # @ignored
    tag: *tag

  env:
    environment: *environment
    log_level: *logLevel
    host: ""
    sign_up_enabled: false
    # -- Max response body when ingesting webhooks (might be renamed). Defaults to 50KB
    max_response_size: 50
    auth:
      jwt:
        enabled: true
      native:
        enabled: true
    # @ignored
    tracer:
      type: *tracerType
      enabled: *tracerEnabled
      otel:
        otel_auth:
          header_name: *otelHeaderName
          header_value: *otelHeaderValue
        sample_rate: *otelSampleRate
        collector_url: *otelCollectorUrl
        insecure_skip_verify: *otelInsecureSkipVerify
      sentry:
        dsn: *sentryDsn
    pyroscope:
      enabled: false
      url: ""
      username: ""
      password: ""
      profile_id: ""
    enable_feature_flag: []
    retention_policy:
      policy: *retentionPolicyDuration
      enabled: *retentionPolicyEnabled
    analytics_enabled: *enabledUsageAnalytics
    storage:
      enabled: false
      type: ""
      on_prem:
        path: ""
      s3:
        bucket: ""
        accessKey: ""
        # -- Ignored in case of secret parameter with non-empty value
        secretKey: ""
        # -- If this secret parameter is not empty, secretKey value will be ignored. The password in the secret should be in the 'secretKey' key
        secret: ""
        prefix: ""
        region: ""
        session_token: ""
        endpoint: ""
    api_version: *apiVersion
    analytics:
      enabled: true
    enable_profiling: false
    metrics:
      enabled: false
      metrics_backend: prometheus
      prometheus_metrics:
        sample_time: 5
    instance_ingest_rate: 100
    max_retry_seconds: 7200
    license_key: *licenseKey
  app:
    replicaCount: 1
    resources: {}
    # limits:
    #   cpu: 1000m
    #   memory: 2000Mi
    # requests:
    #   cpu: 1000m
    #   memory: 1000Mi

  ingress:
    # -- Enable ingress for the server
    enabled: true
    annotations:
      alb.ingress.kubernetes.io/certificate-arn: "arn:aws:acm:ap-south-1:myAccountId:certificate/fdadad-adsfdasf-adsfdafs-adsfdaf"
      alb.ingress.kubernetes.io/scheme: "internet-facing"
      alb.ingress.kubernetes.io/ip-address-type: "dualstack"
      alb.ingress.kubernetes.io/target-type: "ip"
    ingressClassName: "alb"
    tls:
      - hosts:
          - "dev-convoy.mydns.in"
    hosts:
      - host: "dev-convoy.mydns.in"
        http:
          paths:
            - path: /
              pathType: Prefix

  service:
    # -- Type of service for the server
    type: ClusterIP
    # -- Port for the server service
    port: 80

  autoscaling:
    # -- Enable autoscaling for the server
    enabled: false
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80

  podDisruptionBudget: { }
    # -- Pod disruption budget
#    maxUnavailable: 1
#    minAvailable: 1
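The externalRedis fields above (scheme, host, port, username, password, database, addresses) can be read as describing a single Redis connection URL, in which the scheme decides whether the client speaks TLS (rediss) or plaintext (redis). The sketch below composes such a URL, assuming the conventional redis://[user:password@]host:port/db layout. build_redis_url is a hypothetical helper that only mirrors the chart's field names; it is not Convoy's code.

```python
from __future__ import annotations

from urllib.parse import quote


def build_redis_url(scheme: str, host: str, port: int | str,
                    username: str = "", password: str = "",
                    database: int | str = "0", addresses: str = "") -> str:
    """Hypothetical helper: compose a Redis URL from externalRedis-style fields."""
    # Per the chart's own comment, a non-empty `addresses` overrides the other fields.
    if addresses:
        return addresses
    # Only the TCP schemes are covered; the chart also lists redis-socket and
    # redis-sentinel, which this sketch does not handle.
    if scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme in this sketch: {scheme!r}")
    auth = ""
    if password:
        # Percent-encode credentials so characters such as '@' or ':' survive.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}/{database}"
```

With the values posted above, addresses is non-empty and starts with redis://. If the chart really applies the precedence its comment describes (not verified here for this chart version), switching scheme alone would not change the URL, so it may be worth keeping the scheme in addresses consistent with scheme, or clearing addresses.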

linear bot commented Feb 28, 2025

Collaborator

jirevwe commented Feb 28, 2025

Hey @kardeepakcars24,

This looks like a connection error, since I see an "Error: i/o timeout" log line. Can the pods reach both the RDS and Redis instances?
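One way to answer the reachability question from inside the cluster (for example from a throwaway debug pod that has Python available) is a plain TCP connect with a timeout. The following is a minimal sketch using only the standard library; probe_tcp is a hypothetical name. Note that "ok" only shows the port accepts connections. It says nothing about TLS or authentication, so a check like this can pass even when the client's scheme is wrong.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"  # the port accepted a connection
    except socket.timeout:
        return "timeout"  # no answer in time, e.g. traffic silently dropped
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing listens on that port
    except OSError as exc:
        return f"error: {exc}"  # anything else, e.g. a DNS resolution failure
```

For example, probe_tcp("master.my-dev-redis.adsfdsaf.aps1.cache.amazonaws.com", 6379) for Redis, and the RDS host on port 5432 for Postgres.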

Can you try to run Convoy without the external DB and Redis so it spins up Postgres and Redis pods in the cluster?

Author

Thanks @jirevwe for the response.

Yes, the pods can reach RDS and Redis. Earlier there was an auth issue, which threw an error in the init container. Many other services in the cluster connect to this same RDS and Redis. I believe the init container would crash if there were a connection issue, wouldn't it?

Collaborator

jirevwe commented Feb 28, 2025

Based on the logs shared, the agent command failed. Can you show me the output of

kubectl pods

Author

@jirevwe Did you perhaps mean this?

ec2-user@ip___:~/helm$ kubectl get pods -n dev-convoy-namespace
NAME                            READY   STATUS             RESTARTS      AGE
convoy-agent-7bfd4bb69d-mjrvf   0/1     CrashLoopBackOff   4 (23s ago)   2m18s
convoy-server-8667bf6cb-g7pbl   0/1     CrashLoopBackOff   4 (11s ago)   2m18s

Author

Thanks @jirevwe for getting on a call. Changing scheme: "redis" to scheme: "rediss" fixed the issue. May I suggest that more explicit logging around the Redis connection step could help?
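The rediss scheme means the client talks to Redis over TLS. If in-transit encryption is enabled on the ElastiCache cluster, which the fix suggests, a plaintext client may simply get no usable reply and fail on a read deadline. That would be consistent with the bare "i/o timeout" above, although the exact server behaviour is an assumption here. The following is a minimal standard-library sketch of a diagnostic that sends a Redis PING either in plaintext (like redis) or over TLS (like rediss). redis_ping is a hypothetical helper, not Convoy's client, and it deliberately sends no password.

```python
import socket
import ssl


def redis_ping(host: str, port: int, use_tls: bool, timeout: float = 3.0) -> str:
    """Send a Redis PING and return the first reply line (hypothetical helper).

    use_tls=True mirrors scheme "rediss"; use_tls=False mirrors plain "redis".
    Raises socket.timeout if no complete reply line arrives within `timeout`.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        if use_tls:
            # Default context: verifies the server certificate and hostname.
            ctx = ssl.create_default_context()
            sock = ctx.wrap_socket(sock, server_hostname=host)
        # PING encoded as a one-element RESP array.
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")
        buf = b""
        while not buf.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("server closed the connection without replying")
            buf += chunk
        return buf.decode(errors="replace").strip()
    finally:
        sock.close()
```

Any reply at all, including an authentication error such as NOAUTH (the sketch never sends the password), suggests the transport matches the server. A timeout with use_tls=False combined with a reply with use_tls=True would point at the scheme setting.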
