APM agent fails platform detection on cgroupv2 based systems #1691

Open
Eilyre opened this issue Nov 11, 2022 · 3 comments
Labels
8.8-candidate agent-python community Issues opened by the community triage Issues awaiting triage

Comments


Eilyre commented Nov 11, 2022

Hello!

Widespread cgroupv2 adoption is around the corner, as many popular distributions already come with cgroupv2 enabled by default.

The problem is that the APM agent cannot detect that it is running on a cgroupv2-based system: it tries to parse the cgroup file, which is nearly empty on a cgroupv2-enabled system. As a result, the agent reports "linux" as its platform and omits container and pod IDs, which causes multiple integrations inside the Elastic ecosystem to fail.

The current cgroup implementation uses the /proc/self/cgroup file to obtain the relevant container and pod IDs. On cgroupv2-based systems this file contains no such IDs (inside a container it is typically a single "0::/" line), and even on so-called cgroupv1 systems, relying on it is an undocumented "feature".
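To illustrate the distinction, here is a minimal sketch of how the contents of /proc/self/cgroup could be checked before trying to extract IDs from them. It assumes the documented line format "hierarchy-ID:controller-list:path", where cgroupv2 entries have hierarchy ID 0 and an empty controller list. The function name is hypothetical and is not part of the agent.

```python
def cgroup_file_lacks_v1_entries(cgroup_text: str) -> bool:
    """Return True if the text has no cgroupv1 entries.

    Hypothetical helper, not the agent's API. The input is the content of
    /proc/self/cgroup. An empty file, or one holding only cgroupv2 lines
    of the form "0::<path>", gives True, meaning there is nothing here to
    parse IDs from and another source would be needed.
    """
    lines = [line for line in cgroup_text.splitlines() if line.strip()]
    # cgroupv2 entries always start with "0::"; cgroupv1 entries have a
    # non-zero hierarchy ID and a non-empty controller list.
    return all(line.startswith("0::") for line in lines)


def read_self_cgroup(path: str = "/proc/self/cgroup") -> str:
    """Read the cgroup file, returning an empty string if it is unavailable."""
    try:
        with open(path, "r") as f:
            return f.read()
    except OSError:
        return ""
```

A caller could run cgroup_file_lacks_v1_entries(read_self_cgroup()) and only fall back to another source when it returns True, which leaves behavior on cgroupv1 systems unchanged.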

Currently there is no agreed-upon method for obtaining this information from inside the container, and it is still an open issue for the open container spec developers: opencontainers/runtime-spec#1105

My proposal is to use a workaround similar to the one here:

The /proc/self/mountinfo file still contains references to the necessary information (pod UID and container ID).

We could use this at least until a standard is agreed upon, and then switch it out.

An example of the relevant information from my latest stable Flatcar Linux system, with Containerd, Kubernetes 1.24.7 and cgroupv2 enabled:

5051 5044 259:8 /lib/kubelet/pods/68ee930a-2bd8-447b-8deb-426add7a2d09/etc-hosts /etc/hosts rw,relatime - xfs /dev/nvme0n1p3 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
5052 5046 259:8 /lib/kubelet/pods/68ee930a-2bd8-447b-8deb-426add7a2d09/containers/<pod_name>/2631d4a6 /dev/termination-log rw,relatime - xfs /dev/nvme0n1p3 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
5053 5044 259:8 /lib/containerd/io.containerd.grpc.v1.cri/sandboxes/199dafcfc5712cd1e9e49e94642e7df6cdf63356bbc3601e9115f26fd0d096e1/hostname /etc/hostname rw,relatime - xfs /dev/nvme0n1p3 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
5054 5044 259:8 /lib/containerd/io.containerd.grpc.v1.cri/sandboxes/199dafcfc5712cd1e9e49e94642e7df6cdf63356bbc3601e9115f26fd0d096e1/resolv.conf /etc/resolv.conf rw,relatime - xfs /dev/nvme0n1p3 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
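As a rough sketch of what such parsing could look like, the snippet below pulls a pod UID and a containerd sandbox ID out of mountinfo text. The two patterns are assumptions derived only from the sample lines above (a UUID after "/pods/" and a 64-character hex ID after "/sandboxes/"); other runtimes and distributions may lay these paths out differently. Note also that the sandbox ID in this sample appears to identify the pod sandbox, which is not necessarily the application container's ID. The function and constant names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed patterns, based only on the sample mountinfo lines in this issue.
POD_UID_RE = re.compile(r"/pods/([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})/")
SANDBOX_ID_RE = re.compile(r"/sandboxes/([0-9a-f]{64})/")


def parse_mountinfo(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (pod_uid, sandbox_id) found in mountinfo text, or None for each.

    Hypothetical helper, not the agent's API.
    """
    pod_uid: Optional[str] = None
    sandbox_id: Optional[str] = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        root = fields[3]  # fourth field: root of the mount within its filesystem
        if pod_uid is None:
            match = POD_UID_RE.search(root)
            if match:
                pod_uid = match.group(1)
        if sandbox_id is None:
            match = SANDBOX_ID_RE.search(root)
            if match:
                sandbox_id = match.group(1)
        if pod_uid and sandbox_id:
            break  # both values found, no need to scan further
    return pod_uid, sandbox_id
```

Only the fourth field is searched, so mount points and mount options that happen to contain similar-looking strings are ignored. Calling parse_mountinfo(open("/proc/self/mountinfo").read()) on the system above should yield the pod UID and sandbox ID visible in the sample lines.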

What prevents me from working on this issue is that I do not know which formats these lines can take on different systems.

What do you think? Would this method work?

@github-actions github-actions bot added agent-python community Issues opened by the community triage Issues awaiting triage labels Nov 11, 2022

Eilyre commented Nov 11, 2022

elastic/apm#523 tracks this as well, but I'm not hopeful for a central solution from there, as this proposal can be categorized as a "hack".

@Eilyre closed this as not planned Nov 11, 2022
@Eilyre reopened this Nov 11, 2022
@andandrej

Any updates on this?

Contributor

basepi commented Jan 3, 2023

@andandrej Unfortunately we haven't had a chance to address this yet.
