Protobuf marshalling error when processing traffic over gRPC stream: size mismatch #3260

@inliquid

What happened?

This issue affects our production systems and appears consistently during load tests.

It was first discovered with our own gRPC agent, which consumes events from Tetragon directly, but it can be easily reproduced using tetra.

In a container that is being monitored, run:

while true; do cat /etc/pam.conf > /dev/null  && awk 'BEGIN {system("whoami")}' > /dev/null && sleep 0.25 || break; done

In the Tetragon container, run:

tetra getevents --pods test-pod -o compact

This fails after some time (~5-60 min) with the following error:

<...>
🚀 process default/test-pod-debian /usr/bin/whoami
💥 exit    default/test-pod-debian /usr/bin/whoami  0
💥 exit    default/test-pod-debian /bin/sh -c whoami 0
💥 exit    default/test-pod-debian /usr/bin/awk  "BEGIN {system("whoami")}" 0
🚀 process default/test-pod-debian /usr/bin/sleep 0.25
time="2024-12-26T14:17:58Z" level=fatal msg="Failed to receive events" error="rpc error: code = Internal desc = grpc: error while marshaling: marshaling tetragon.GetEventsResponse: size mismatch (see https://github.com/golang/protobuf/issues/1609): calculated=0, measured=134"
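
Because the time to failure varies so widely, it may help to record it for each run. The following is a minimal sketch, not part of tetra: time_until_exit is a hypothetical helper name. It assumes that tetra exits with a non-zero status after the fatal log line and that the log line goes to stderr, so discarding stdout keeps the error visible.

```shell
#!/bin/sh
# Hypothetical helper (not part of tetra): run the given command,
# discard its stdout, and report its exit status and run time in seconds.
time_until_exit() {
    start=$(date +%s)
    "$@" > /dev/null      # stderr is left untouched so errors stay visible
    rc=$?
    end=$(date +%s)
    echo "exit=$rc elapsed=$((end - start))s"
}
```

Usage in the Tetragon container would be: time_until_exit tetra getevents --pods test-pod -o compact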

This reproduces even without any Tracing Policy loaded.

Tetragon Version

v1.1.2

Kernel Version

5.14.0-284.30.1.el9_2.x86_64

Kubernetes Version

v1.27.6
