Get diagnostic data

First, use this session to deploy a Kafka cluster on Kubernetes.

When debugging issues, you usually need to retrieve various artifacts from the environment, which can be a lot of effort. Fortunately, Strimzi provides a must-gather script that can be used to download all relevant artifacts and logs from a specific Kafka cluster.

Note

You can add the --secrets=all option to also get secret values.

$ curl -s https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/tools/report.sh \
  | bash -s -- --namespace=test --cluster=my-cluster --out-dir=~/Downloads
deployments
    deployment.apps/my-cluster-entity-operator
statefulsets
replicasets
    replicaset.apps/my-cluster-entity-operator-bb7c65dd4
configmaps
    configmap/my-cluster-broker-5
    configmap/my-cluster-broker-6
    configmap/my-cluster-broker-7
    configmap/my-cluster-controller-0
    configmap/my-cluster-controller-1
    configmap/my-cluster-controller-2
    configmap/my-cluster-entity-topic-operator-config
    configmap/my-cluster-entity-user-operator-config
secrets
    secret/my-cluster-clients-ca
    secret/my-cluster-clients-ca-cert
    secret/my-cluster-cluster-ca
    secret/my-cluster-cluster-ca-cert
    secret/my-cluster-cluster-operator-certs
    secret/my-cluster-entity-topic-operator-certs
    secret/my-cluster-entity-user-operator-certs
    secret/my-cluster-kafka-brokers
services
    service/my-cluster-kafka-bootstrap
    service/my-cluster-kafka-brokers
poddisruptionbudgets
    poddisruptionbudget.policy/my-cluster-kafka
roles
    role.rbac.authorization.k8s.io/my-cluster-entity-operator
rolebindings
    rolebinding.rbac.authorization.k8s.io/my-cluster-entity-topic-operator-role
    rolebinding.rbac.authorization.k8s.io/my-cluster-entity-user-operator-role
networkpolicies
    networkpolicy.networking.k8s.io/my-cluster-entity-operator
    networkpolicy.networking.k8s.io/my-cluster-network-policy-kafka
pods
    pod/my-cluster-broker-5
    pod/my-cluster-broker-6
    pod/my-cluster-broker-7
    pod/my-cluster-controller-0
    pod/my-cluster-controller-1
    pod/my-cluster-controller-2
    pod/my-cluster-entity-operator-bb7c65dd4-9zdmk
persistentvolumeclaims
    persistentvolumeclaim/data-my-cluster-broker-5
    persistentvolumeclaim/data-my-cluster-broker-6
    persistentvolumeclaim/data-my-cluster-broker-7
    persistentvolumeclaim/data-my-cluster-controller-0
    persistentvolumeclaim/data-my-cluster-controller-1
    persistentvolumeclaim/data-my-cluster-controller-2
ingresses
routes
clusterroles
    clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global
    clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-leader-election
    clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced
    clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-watched
    clusterrole.rbac.authorization.k8s.io/strimzi-entity-operator
    clusterrole.rbac.authorization.k8s.io/strimzi-kafka-broker
    clusterrole.rbac.authorization.k8s.io/strimzi-kafka-client
clusterrolebindings
    clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator
    clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-broker-delegation
    clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-client-delegation
clusteroperator
    deployment.apps/strimzi-cluster-operator
    replicaset.apps/strimzi-cluster-operator-6596f469c9
    pod/strimzi-cluster-operator-6596f469c9-smsw2
    configmap/strimzi-cluster-operator
draincleaner
customresources
    kafkanodepools.kafka.strimzi.io
        broker
        controller
    kafkas.kafka.strimzi.io
        my-cluster
    kafkatopics.kafka.strimzi.io
        my-topic
    strimzipodsets.core.strimzi.io
        my-cluster-broker
        my-cluster-controller
events
logs
    my-cluster-broker-5
    my-cluster-broker-6
    my-cluster-broker-7
    my-cluster-controller-0
    my-cluster-controller-1
    my-cluster-controller-2
    my-cluster-entity-operator-bb7c65dd4-9zdmk
Report file report-17-03-2025_12-26-05.zip created
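
Once the archive is unpacked, a quick first pass is to count error lines per log file. The following is a minimal sketch, not part of the Strimzi tooling: the function name `scan_report_logs` is hypothetical, and it assumes that pod logs inside the archive are stored in files matching `*.log`, which may differ between script versions.

```shell
# Hypothetical helper: list files under a directory that contain ERROR or FATAL
# lines, together with the number of matching lines, highest count first.
# Assumption: pod logs in the unpacked report match "*.log" (override with $2).
scan_report_logs() {
  local dir="$1" pattern="${2:-*.log}"
  # -r recurses, -c prints "file:count", -E enables extended regular expressions.
  grep -rcE --include="$pattern" 'ERROR|FATAL' "$dir" \
    | grep -v ':0$' \
    | sort -t: -k2,2nr
}

# Usage (paths are illustrative):
#   unzip -q report-17-03-2025_12-26-05.zip -d /tmp/report
#   scan_report_logs /tmp/report
```

Files with no matches are dropped from the output, so an empty result means that no error lines were found in the matching files.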

Get heap dumps

It is also possible to collect broker JVM heap dumps and other advanced diagnostic data (thread dumps, flame graphs, etc.).

Warning

Taking a heap dump is a heavy operation that can cause the Java application to hang. Avoid it in production unless the memory issue cannot be reproduced in a test environment.

Debugging locally can often be easier and faster. However, some issues only manifest in Kubernetes due to factors like networking, resource limits, or interactions with other components. Even if you try to match your local setup to the Kubernetes configuration, subtle differences (e.g. service discovery, security settings, or operator-managed logic) might lead to different behavior.

Create an additional volume of the desired size using a PVC.

$ echo -e "apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard" | kubectl create -f -
persistentvolumeclaim/my-pvc created
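
Before patching the cluster, it may help to check the state of the claim. The sketch below wraps a standard `kubectl get` call; the function name `pvc_is_bound` is hypothetical. Note that with a storage class that uses `WaitForFirstConsumer` volume binding, the claim is expected to stay `Pending` until a pod mounts it, so `Pending` is not necessarily an error at this stage.

```shell
# Hypothetical helper: print the phase of a PVC and succeed only if it is Bound.
pvc_is_bound() {
  local name="$1" phase
  # .status.phase is one of Pending, Bound, or Lost.
  phase="$(kubectl get pvc "$name" -o jsonpath='{.status.phase}')" || return 2
  echo "$name: ${phase:-unknown}"
  [ "$phase" = "Bound" ]
}

# Usage:
#   pvc_is_bound my-pvc
```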

Mount the new volume using the additional volumes feature of the Kafka pod template. This change triggers a rolling update. The mount path must be under /mnt.

Warning

Adding a custom volume triggers pod restarts, which can make it difficult to capture an issue that has already occurred. If the issue cannot be easily reproduced in a test environment, configure the volume in advance so that no restart is needed when the issue appears.

$ kubectl patch k my-cluster --type merge -p '
    spec:
      kafka:
        template:
            pod:
              volumes:
                - name: my-volume
                  persistentVolumeClaim:
                    claimName: my-pvc
            kafkaContainer:
              volumeMounts:
                - name: my-volume
                  mountPath: "/mnt/data"'
kafka.kafka.strimzi.io/my-cluster patched
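
One way to confirm that the rolling update has reached a pod is to check that the mount path exists inside it. This is a minimal sketch with a hypothetical function name, `check_mount`; it assumes that the `test` utility is available in the Kafka container image.

```shell
# Hypothetical helper: report, for each given pod, whether the path is present.
# Returns non-zero if at least one pod does not have it yet.
check_mount() {
  local path="$1"; shift
  local pod rc=0
  for pod in "$@"; do
    if kubectl exec "$pod" -- test -d "$path" 2>/dev/null; then
      echo "$pod: $path mounted"
    else
      echo "$pod: $path missing"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   # The Ready condition can still reflect the state before the patch,
#   # so treat this wait as a hint rather than a guarantee.
#   kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s
#   check_mount /mnt/data my-cluster-broker-5 my-cluster-broker-6
```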

When the rolling update completes, create a broker heap dump and copy the output file to your local machine.

$ PID="$(kubectl exec my-cluster-broker-5 -- jcmd | grep "kafka.Kafka" | awk '{print $1}')"

$ kubectl exec my-cluster-broker-5 -- jcmd "$PID" VM.flags
724:
-XX:CICompilerCount=4 -XX:ConcGCThreads=3 -XX:G1ConcRefinementThreads=10 -XX:G1EagerReclaimRemSetThreshold=32 -XX:G1HeapRegionSize=4194304
-XX:GCDrainStackTargetSize=64 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/data/oome.hprof -XX:InitialHeapSize=5368709120
-XX:+ManagementServer -XX:MarkStackSize=4194304 -XX:MaxHeapSize=5368709120 -XX:MaxNewSize=3221225472 -XX:MinHeapDeltaBytes=4194304
-XX:MinHeapSize=5368709120 -XX:NonNMethodCodeHeapSize=5839372 -XX:NonProfiledCodeHeapSize=122909434 -XX:ProfiledCodeHeapSize=122909434
-XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=5368709120 -XX:-THPStackMitigation
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC

$ kubectl exec my-cluster-broker-5 -- jcmd "$PID" GC.heap_dump /mnt/data/heap.hprof
724:
Dumping heap to /mnt/data/heap.hprof ...
Heap dump file created [179236580 bytes in 0.664 secs]

$ kubectl cp my-cluster-broker-5:/mnt/data/heap.hprof "$HOME"/Downloads/heap.hprof
tar: Removing leading `/' from member names
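
A thread dump can be collected in a similar way with the `Thread.print` command of `jcmd`. Because the output goes to standard output, it can be redirected to a local file without using the additional volume. The function name `thread_dump` below is hypothetical, and the sketch reuses the PID lookup shown above.

```shell
# Hypothetical helper: print a thread dump of the Kafka JVM running in a pod.
thread_dump() {
  local pod="$1" pid
  # jcmd without arguments lists the running JVMs as "<pid> <main class> ...".
  pid="$(kubectl exec "$pod" -- jcmd | awk '/kafka\.Kafka/ {print $1; exit}')"
  [ -n "$pid" ] || { echo "no Kafka JVM found in $pod" >&2; return 1; }
  kubectl exec "$pod" -- jcmd "$pid" Thread.print
}

# Usage:
#   thread_dump my-cluster-broker-5 > "$HOME"/Downloads/threads.txt
```

Unlike a heap dump, a thread dump is small and quick to take, so taking a few in a row several seconds apart can show whether the same threads stay blocked.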

If the pod is crash looping, the dump can still be recovered by spinning up a temporary pod and mounting the volume.

$ kubectl run my-pod --restart "Never" --image "foo" --overrides "{
  \"spec\": {
    \"containers\": [
      {
        \"name\": \"busybox\",
        \"image\": \"busybox\",
        \"imagePullPolicy\": \"IfNotPresent\",
        \"command\": [\"/bin/sh\", \"-c\", \"trap : TERM INT; sleep infinity & wait\"],
        \"volumeMounts\": [
          {\"name\": \"data\", \"mountPath\": \"/mnt/data\"}
        ]
      }
    ],
    \"volumes\": [
      {\"name\": \"data\", \"persistentVolumeClaim\": {\"claimName\": \"my-pvc\"}}
    ]
  }
}"

$ kubectl exec my-pod -- ls -lh /mnt/data
total 171M
-rw-------    1 1001     root      170.9M Mar 17 14:38 heap.hprof
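
From there, the file can be copied out of the temporary pod in the same way as from a broker. The sketch below uses a hypothetical function name, `recover_dump`, and assumes that `tar` is available in the pod image, which `kubectl cp` requires (the busybox image provides it). Because the claim is `ReadWriteOnce`, the temporary pod may need to run on the same node as the broker pod for as long as the broker pod still holds the volume.

```shell
# Hypothetical helper: copy a file out of the temporary pod, then delete the pod.
recover_dump() {
  local pod="$1" remote="$2" dest="$3"
  kubectl cp "$pod:$remote" "$dest" || return 1
  # Delete the pod only after the copy succeeded; the PVC and its data remain.
  kubectl delete pod "$pod" --wait=false
}

# Usage:
#   recover_dump my-pod /mnt/data/heap.hprof "$HOME"/Downloads/heap.hprof
```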

To analyze the heap dump, you can use a tool such as Eclipse Memory Analyzer.
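
Before loading the file into an analyzer, it can be worth checking that the local copy is intact. The following sketch, with a hypothetical function name `check_dump`, checks that the file starts with the `JAVA PROFILE` header that HPROF files begin with, and that its size matches the byte count printed by `jcmd` when the dump was created.

```shell
# Hypothetical helper: verify the HPROF header and the expected size in bytes.
check_dump() {
  local file="$1" expected="$2" actual
  # HPROF files start with the string "JAVA PROFILE" followed by a version.
  if [ "$(head -c 12 "$file" 2>/dev/null)" != "JAVA PROFILE" ]; then
    echo "not an HPROF file: $file" >&2
    return 1
  fi
  actual="$(wc -c < "$file" | tr -d ' ')"
  if [ "$actual" != "$expected" ]; then
    echo "size mismatch: expected $expected bytes, found $actual" >&2
    return 1
  fi
}

# Usage (the size is the one printed by GC.heap_dump above):
#   check_dump "$HOME"/Downloads/heap.hprof 179236580
```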
