Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

No container id found by container id providers after AKS upgrade 1.23->1.25 #344

Closed
TTmaister opened this issue Mar 23, 2023 · 15 comments
Closed
Assignees
Labels
bug status:resolved Bug is fixed but not released yet. troubleshoot
Milestone

Comments

@TTmaister
Copy link
Contributor

After upgrade AKS 1.23.x to 1.25.x Application Insight Kubernetes addon can't get container Id if multiple container are used.

RBAC roles and binding has been same before upgrade.

ContainerIdProviders not

[Warning] [2023-03-23T14:50:26.2712637Z] [DockerEngineMountInfoContainerIdProvider] Can't figure out container id.
[Warning] [2023-03-23T14:50:26.2712637Z] [DockerEngineMountInfoContainerIdProvider] Can't figure out container id.
[Information] [2023-03-23T14:50:26.2712968Z] No container id found by container id providers.
[Information] [2023-03-23T14:50:26.2712968Z] No container id found by container id providers.

cat /proc/self/mountinfo

5504 4910 0:571 / / rw,relatime master:1369 - overlay overlay rw,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/870/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/727/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/722/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/721/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/720/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/719/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/642/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/871/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/871/work
5505 5504 0:573 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
5506 5504 0:574 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
5507 5506 0:575 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
5508 5506 0:431 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
5509 5504 0:436 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro
5510 5509 0:28 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw
5511 5506 0:428 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,size=65536k,inode64
5512 5504 8:1 /var/lib/kubelet/pods/39f4203d-ace9-4f64-954f-c30dfb6f473a/etc-hosts /etc/hosts rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro
5513 5506 8:1 /var/lib/kubelet/pods/39f4203d-ace9-4f64-954f-c30dfb6f473a/containers/managementgateway/acde57d1 /dev/termination-log rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro
5514 5504 8:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/6f3e199feb9dbf9b7d1b05199f9977876d4159b05761e6533324c2abcb1a67cf/hostname /etc/hostname rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro
5515 5504 8:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/6f3e199feb9dbf9b7d1b05199f9977876d4159b05761e6533324c2abcb1a67cf/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro
5516 5504 0:395 / /run/secrets/azure/tokens ro,relatime - tmpfs tmpfs rw,size=5486876k,inode64
5517 5504 0:427 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,size=5486876k,inode64
4911 5505 0:573 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
4912 5505 0:573 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
4913 5505 0:573 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
4914 5505 0:573 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
4915 5505 0:573 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
4916 5505 0:576 / /proc/acpi ro,relatime - tmpfs tmpfs ro,inode64
4917 5505 0:574 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
4918 5505 0:574 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
4919 5505 0:574 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
4920 5505 0:577 / /proc/scsi ro,relatime - tmpfs tmpfs ro,inode64
4921 5509 0:578 / /sys/firmware ro,relatime - tmpfs tmpfs ro,inode64

@xiaomi7732
Copy link
Member

xiaomi7732 commented Mar 23, 2023

Hi @TTmaister, this is a known issue because of the implementations of docker engines. There has been a related issue #290.

Please check out the design doc here for the container id providers: https://github.com/microsoft/ApplicationInsights-Kubernetes/wiki/Design-for-ContainerIdProviders

That said, 2 questions for you:

  1. Is the container id anywhere in the mountinfo you shared? If it is, we might be able to append another container id providers for your case.

  2. Is it possible for you to set up the environment variable for container id as mentioned in the design doc to workaround the issue? For example: ContainerId=my-container-id.

Any proposal for how to make this more adaptive is super-welcome!

BTW, just FYI, here's a lengthy stackoverflow discussion on the topic:
https://stackoverflow.com/questions/20995351/how-can-i-get-docker-linux-container-information-from-within-the-container-itsel/72565733#72565733

And an open standard issue:

opencontainers/runtime-spec#1105

@TTmaister
Copy link
Contributor Author

  1. Container ID was 34bc656c1739fe713a867e18cc307f96e6fcba3cf89755db31b3c837f9d137f5. The answer is that there is no container id visible in mountinfo.

  2. Possibly I can set the container id in the environment variables, but I don't know how to do it in the Kubernetes Deployment yaml so that each pod has its own ID. Does it have to be the same as what the Kubernetes Api provides.

@xiaomi7732
Copy link
Member

Hey @TTmaister, thanks for providing the information. Unfortunately, without container id anywhere in the container, we won't have the magic to somehow make it available.

Manually set the environment would probably be the only way that will provide full enhancement.
I don't know what will happen if you set container id to a mismatched value. That is an interesting idea. I assume you could set the environment variable to a random value for identification of the instances, but will it provide too much value? (pods comes and goes, and if the container id is random, why bother?)

That said, could you please help me understand what the telemetry looks like on your end? I assume container id missing but everything else, like podName, node info should be there on the events, does that align with what you see?

@xiaomi7732 xiaomi7732 self-assigned this Mar 24, 2023
@TTmaister
Copy link
Contributor Author

Application Insight is missing all the data brought by the plugin. I can see in Application Insight that the POD requests information about the POD from the Kubernetes API and gets the response http200.

@TTmaister
Copy link
Contributor Author

Hey @xiaomi7732

Does it make any sense to change the ContainerIdHolder class to be able to parse the ContainerName variable from the environment variables.

And then filtter ContainerStatus by ContainerName

 if (containerStatuses is not null && containerStatuses.Count > 1)
        {
            string? containerName = Environment.GetEnvironmentVariable("ContainerName");
            containerStatus = containerStatuses.FirstOrDefault(c => c.Name == containerName);
            _logger.LogInformation(FormattableString.Invariant($"Use the only container inside the pod for container id: {containerStatus.ContainerID}"));

            using (IServiceScope scope = _serviceScopeFactory.CreateScope())
            {
                IContainerIdNormalizer normalizer = scope.ServiceProvider.GetRequiredService<IContainerIdNormalizer>();
                if (normalizer.TryNormalize(containerStatus.ContainerID, out string? normalizedContainerId))
                {
                    _containerId = normalizedContainerId;
                    return true;
                }
            }

            _logger.LogError(FormattableString.Invariant($"Normalization failed for container id: {containerStatus.ContainerID}"));
        }

ContinerName can be defined as environment variable Kubernetes Deployment yaml.

env:
  - name: ContainerName
     valueFrom:
       fieldRef:
         fieldPath: metadata.labels['app']
         

Or some other label. We have set "app" label for all deployments.

@xiaomi7732
Copy link
Member

xiaomi7732 commented Mar 27, 2023

Hi @TTmaister, using ContainerName for matching seem like an approach. I think we should peruse it when time permits. Do you know is pod name always unique?

With regarding what is going on, I think I see it. Container id was supposed to be optional, but it is required - in between several iterations to support .NET 6 and the latest K8s SDK. See below for the details:

It was like this, there had been a note to allow empty container id.

// Notes: It is still possible for the optional container id to be empty at this point, the following method needs to handle the case.
if (!await SpinWaitContainerReadyAsync(timeoutAt, queryClient, myPod, containerId, cancellationToken).ConfigureAwait(false))
{
    _logger.LogError(Invariant($"Kubernetes info is not available before the timeout at {timeoutAt}."));
    return null;
}

And in the implementation, this is the logic to allow empty container id:

if (!string.IsNullOrEmpty(myContainerId))
{
    // Check targeted container status
    readyToGo = IsContainerReady(podInfo.GetContainerStatus(myContainerId));
}
else
{
    _logger.LogWarning("No container id available. Fallback to use the any container for status checking.");
    readyToGo = podInfo.GetAllContainerStatus().Any(s => IsContainerReady(s));
}

And it is like this now, meaning optional container id is not allowed:

public async Task<bool> IsContainerReadyAsync(CancellationToken cancellationToken)
{
    V1ContainerStatus? myContainerStatus = await GetMyContainerStatusAsync(cancellationToken).ConfigureAwait(false);
    if (myContainerStatus is not null)
    {
        return IsContainerStatusReady(myContainerStatus);
    }

    return false;
}

That doesn't agree with the design that I shared earlier and shall be treated as a bug. Will it unblock you if I make it optional again?

@TTmaister
Copy link
Contributor Author

Pod can be recreate with the same name, but there can only be one with the same name at a time. Object Names and IDs

If I tested it right... if Container Id missing then Application Insight not showing "Cloud Role Name".
But is show and populate this custom properties

Kubernetes.Node.Name
Kubernetes.Pod.Name
Kubernetes.Pod.Namespace
Kubernetes.Node.ID
Kubernetes.Deployment.Name
Kubernetes.Pod.ID
Kubernetes.ReplicaSet.Name
Kubernetes.Pod.Labels

@xiaomi7732
Copy link
Member

That is correct. FYI, the line of code:

.

@xiaomi7732
Copy link
Member

xiaomi7732 commented Mar 28, 2023

I think there could be 2 fixes:

  1. Make container id optional (PR Container id shall be optional #345 )
  2. Allows user the specify Container Name. (Investigation: Using a label to locate the current container #346)

And I inspected various options to back fill container id, and I think using container name is a very good idea. It might worth a bit documentation but could be useful as an alternative to container ids.

@TTmaister are you interested to submit a PR for it? Or I can prepare a PR if you don't have the time.

BTW, if you are interested, please fork this repository, and branching off your own fork since this repository has been locked down and nobody could PR into it directly. :-)

@xiaomi7732
Copy link
Member

@TTmaister your change has been released:

https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1-beta2

Let me know if that unblocks you. Thank you for your contribution!

@TTmaister
Copy link
Contributor Author

@TTmaister your change has been released:

https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1-beta2

Let me know if that unblocks you. Thank you for your contribution!

Works as expected.

  1. One container no need for environment variable to get "Cloud Role Name" visible.
  2. More than one container and ContainerName environment variable is not present "Cloud Role Name" not visible.
  3. Multiple containers and ContainerName environment variable is preset "Cloud Role Name" visible.

@xiaomi7732 Maybe this should be documented to project readme.

@xiaomi7732
Copy link
Member

xiaomi7732 commented Mar 31, 2023

@TTmaister, thanks for the verification! I will update the wiki. I'll keep this issue open until the stable version of 6.1.1 got released.

@xiaomi7732
Copy link
Member

@xiaomi7732 xiaomi7732 added status:resolved Bug is fixed but not released yet. and removed status:active Actively work on labels Apr 4, 2023
@Ismael-Pep
Copy link

Is there an Eta on when the version 6.1.1 will be released?, Thx!

@xiaomi7732
Copy link
Member

Hey @Ismael-Pep, thanks for the inquiry. Here's the new package is released today:
https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug status:resolved Bug is fixed but not released yet. troubleshoot
Projects
None yet
Development

No branches or pull requests

3 participants