Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Added CGroupv2 support into Docker Extensions #839

Merged
merged 18 commits into from
Jan 9, 2023

Conversation

anoopbabu29
Copy link
Contributor

@anoopbabu29 anoopbabu29 commented Dec 16, 2022

Which problem is this PR solving?

Adding a detector for the container id for cgroupv2 described in open-telemetry/opentelemetry-js-contrib#1173.

Short description of the changes

Generalized detector for docker v1, v2 and container id. Based on open-telemetry/opentelemetry-js-contrib#1181.

@anoopbabu29 anoopbabu29 requested a review from a team December 16, 2022 20:52
@utpilla utpilla added the comp:resources.container Things related to OpenTelemetry.Resources.Container label Dec 19, 2022
{
try
{
return this.ExtractContainerIdV1(path) ?? this.ExtractContainerIdV2(path);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking out loud.
If the cgroup path will match the Id V2 (e.g. "13:name=systemd:/pod/d86d75589bf6cc254f3e2cc29debdf85dde404998aa128997a819ff991827356/hostname") the only way ExtractContainerIdV1() won't return you "hostname" is (I assume, please correct me if I'm wrong) because of EncodingUtils.IsValidHexString() that won't treat this string valid.
So my thoughts:

  1. it's a bit too advanced in the V1 logic to finally decide that this is the wrong string (I mean it goes thru a lot of code before realizing this)
  2. it's a bit fragile and is very easy to make a bug because from a IsValidHexString() context it's not clear how important this code for this place here. I know it's covered by unit tests but still

So maybe since the V2 logic puts a very specific requirement on the format of the string we should run it first? It will early out much faster than V1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that logic. After reviewing it, I believe that the IsValidHexString() is the method that prevents the hostname from being accepted and switching the two would be helpful since, as you mentioned, it has a stricter requirement. One way to make IsValidHexString() less fragile is to perhaps make a more strict requirement like a character limit (64 characters)? But it is also unclear to me how crucial this method is so I definitely do not think that I should change it here.

@utpilla
Copy link
Contributor

utpilla commented Dec 19, 2022

Adding the component owner for this project. @iskiselev Could you please review this PR?

@@ -28,6 +28,8 @@ namespace OpenTelemetry.Extensions.Docker.Resources;
public class DockerResourceDetector : IResourceDetector
{
private const string FILEPATH = "/proc/self/cgroup";
private const string FILEPATHV2 = "/proc/self/mountinfo";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any usage of FILEPATHV2 in detector. I'd guess, that ExtractContainerIdV2 should use it, instead of FILEPATH. Proper fix for it will also solve problem mentioned by @alexeypukhov , as that functions should check different file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change has been made to check both versions in the Detect method and assuming the version being used in the BuildResource method. This way, the problem mentioned by @alexeypukhov would be solved as you mentioned since no container id would be able to be found when checking for cgroupv1 if cgroupv2 is being used.

/// <returns>Returns Resource with list of key-value pairs of container resource attributes if container id exists else empty resource.</returns>
internal Resource BuildResource(string path)
internal Resource BuildResource(string path, bool isCGroupV1 = true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already have two sets of methods: one for V1 and another for V2. I'd keep the same for this method too. That way you don't have to carry isCGroupV1 parameter from method to method and you can call either BuildResourceV1() or BuildResourceV2() method in the Detect()

using (TempFile tempFile = new TempFile())
{
tempFile.Write(CGROUPLINEV2);
Assert.Equal(CONTAINERIDV2, this.GetContainerId(dockerResourceDetector.BuildResource(tempFile.FilePath, false)));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use Detect() method for unit test instead of BuildResource(). Detect is part of the interface while BuildResource is just a part of internal implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now there is currently no reference to Detect in the tests, and I believe its due to considerations of writing to a specific file path on the machine. As such, the only aspect of the test that would be lost by not including this detect is the reading of the file at the designated filepath. I'm not sure if its a good idea to include Detect in the tests if this is the case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. If we don't want to stick to the FILEPATH and FILEPATH2 locations you cannot use Detect() method. In that case I think you made the right choice to test BuildResourceV1 and BuildResourceV2 separately. At the end if you know which version you are writing to a temp file you should know which method to call to read from the file

@@ -57,6 +57,13 @@ public class DockerResourceDetectorTests
private const string CONTAINERID =
"d86d75589bf6cc254f3e2cc29debdf85dde404998aa128997a819ff991827356";

// cgroupv2 line with container Id
private const string CGROUPLINEV2 =
Copy link

@alexeypukhov alexeypukhov Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you also need to check suffix or prefix for cgroup v2?
Also just a thought. Should we check prefix and suffix end to end, I mean with reading from file? If GetContainerId works with extracted cgroup string no matter V1 or V2, maybe we should have a separate test just for this function and do not mix it with reading the line from the file and parsing it depending on V1 or V2

Copy link
Contributor Author

@anoopbabu29 anoopbabu29 Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can check for them as they are separate implementations. I figured that most of the functionality was similar enough to warrant not having them but I believe you are correct that they should exist, although the suffix isn't a necessary test in my opinion since its not a real case I believe.

@codecov
Copy link

codecov bot commented Dec 23, 2022

Codecov Report

Merging #839 (d043c33) into main (70f9f0f) will increase coverage by 0.00%.
The diff coverage is 80.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #839   +/-   ##
=======================================
  Coverage   79.07%   79.07%           
=======================================
  Files         182      182           
  Lines        5614     5629   +15     
=======================================
+ Hits         4439     4451   +12     
- Misses       1175     1178    +3     
Impacted Files Coverage Δ
...r.Stackdriver/Implementation/ActivityExtensions.cs 75.00% <ø> (ø)
...ons/Internal/ActivityEventAttachingLogProcessor.cs 93.33% <0.00%> (ø)
...on.EventCounters/MeterProviderBuilderExtensions.cs 100.00% <ø> (ø)
...elemetry.Instrumentation.Process/ProcessMetrics.cs 100.00% <ø> (ø)
...ensions.Docker/Resources/DockerResourceDetector.cs 82.60% <77.77%> (-1.27%) ⬇️
...ry.Extensions/Internal/DefaultLogStateConverter.cs 87.50% <100.00%> (ø)
...nsions/Logs/LogToActivityEventConversionOptions.cs 100.00% <100.00%> (ø)
...try.Extensions/Trace/AutoFlushActivityProcessor.cs 81.48% <100.00%> (ø)

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 31, 2022
@Kielek
Copy link
Contributor

Kielek commented Jan 2, 2023

@iskiselev based on the PR description package will support not only Docker, but also everything based on container id.
I suppose the package name should be changed from OpenTelemetry.Extensions.Docker to OpenTelemetry.Extensions.Container. Also all public API should be adjusted. I would keep it as separate PR.

BTW PR was marked as Stale, please let us know if you want to proceed with it.

@pellared
Copy link
Member

pellared commented Jan 2, 2023

I suggest to take a look at open-telemetry/opentelemetry-go#3508 for reference (especially the comments and code connected with testing)

@github-actions github-actions bot removed the Stale label Jan 3, 2023
using (TempFile tempFile = new TempFile())
{
tempFile.Write(INVALIDHEXCGROUPLINEV2);
Assert.Equal(dockerResourceDetector.BuildResourceV1(tempFile.FilePath), Resource.Empty);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it call BuildResourceV2, same as on line 202?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or probably both. In that case we may need to add this tests to V1 also - as incorrect V2 lines should be still incorrect for V1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my mistake, I'll make that change. And yes, I'll make a test checking that V1 fails for V2 and vice versa.

/// </summary>
/// <param name="path">File path where container id exists.</param>
/// <returns>Returns Resource with list of key-value pairs of container resource attributes if container id exists else empty resource.</returns>
internal Resource BuildResourceV1(string path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source for BuildResourceV1 / BuildResourceV2 are very similar, same as ExtractContainerIdV1 / ExtractContainerIdV2.
To have less duplicated code, I'd probably create enum ParseMode {V1, V2} and pass it as second argument to BuildResource / ExtractContainerId with different logic called when you fetch string containerId = ... in ExtractContainerId.

private const string CONTAINERIDV2WITHPODMAN = "1a2de27e7157106568f7e081e42a8c14858c02bd9df30d6e352b298178b46809";

// Invalid cgroup line (contains a z)"
private const string INVALIDHEXCGROUPLINEV2 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have so many const defined in that file now. I like @pellared suggestion to look into structure of tests in open-telemetry/opentelemetry-go#3508 - they looks nice with a dictionary of named cased with input / expected result pairs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also turn it into a theory with InlineData if that works better in this context

@iskiselev
Copy link
Contributor

@iskiselev based on the PR description package will support not only Docker, but also everything based on container id. I suppose the package name should be changed from OpenTelemetry.Extensions.Docker to OpenTelemetry.Extensions.Container. Also all public API should be adjusted. I would keep it as separate PR.

@Kielek, the name can be changed, as more generic name OpenTelemetry.Extensions.Container - but we already have first release of it at https://www.nuget.org/packages/OpenTelemetry.Extensions.Docker - we will need to deprecate old package in NuGet and add link to new package name there. I don't know how to do it, but hope it won't be too hard. We can create separate issue and discuss it there. Looks like nodejs have already renamed it to container.

BTW PR was marked as Stale, please let us know if you want to proceed with it.
I've added review comments - was on holiday break before. Hope we should be able to merge it soon.

@Kielek
Copy link
Contributor

Kielek commented Jan 5, 2023

@iskiselev based on the PR description package will support not only Docker, but also everything based on container id. I suppose the package name should be changed from OpenTelemetry.Extensions.Docker to OpenTelemetry.Extensions.Container. Also all public API should be adjusted. I would keep it as separate PR.

@Kielek, the name can be changed, as more generic name OpenTelemetry.Extensions.Container - but we already have first release of it at https://www.nuget.org/packages/OpenTelemetry.Extensions.Docker - we will need to deprecate old package in NuGet and add link to new package name there. I don't know how to do it, but hope it won't be too hard. We can create separate issue and discuss it there. Looks like nodejs have already renamed it to container.

Created #869. Most of the packages was previously released as OpenTelemetry.Contrib.Something.Something2, then renames to OpenTelemetry.Something.Something2. I think that @utpilla know how to handle such scenario.

{
Name = "Invalid hex cgroupv2 line (contains a z)",
Line = "13:name=systemd:/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/fb5916a02feca96bdeecd8e062df9e5e51d6617c8214b5e1f3fz9320f4402ae6/hostname",
CgroupVersion = DockerResourceDetector.ParseMode.V1,
Copy link
Contributor

@iskiselev iskiselev Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it test for DockerResourceDetector.ParseMode.V2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let me fix that.

Copy link
Contributor

@iskiselev iskiselev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good now.

@iskiselev
Copy link
Contributor

@Kielek , @utpilla - the tests that failed looks not connected with this change:
OpenTelemetry.Instrumentation.EventCounters.Tests.EventCountersMetricsTests.EventCounterSameNameUsesNewestCreated
OpenTelemetry.Instrumentation.EventCounters.Tests.EventCountersMetricsTests.EventSourceNameShortening

Will somebody be able to restart tests (in case if it will pass on second run) or advice what can be done with them?

@Kielek
Copy link
Contributor

Kielek commented Jan 9, 2023

Unfortunately, code coverego pipeline is unstable. Please do not bother if all Docker related tests passed. It is not flagged as mandatory, so @utpilla can merge after review.

@utpilla utpilla merged commit 7bd2fc9 into open-telemetry:main Jan 9, 2023
mmanciop pushed a commit to lumigo-io/opentelemetry-dotnet-contrib that referenced this pull request Jan 11, 2023
iskiselev added a commit to iskiselev/opentelemetry-dotnet-contrib that referenced this pull request Jan 12, 2023
* Changes
Update CHANGELOG for 1.1.0-beta.1 release, that will include CGroup V2 support (open-telemetry#839, open-telemetry#693). It still have `-beta.1` suffix to make it prerelease until decision on package name will be done (open-telemetry#881, open-telemetry#869)
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
comp:resources.container Things related to OpenTelemetry.Resources.Container
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants