Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Detect whether Delegate is available on both slices and scopes #1776

Merged
merged 1 commit into from
Apr 12, 2018

Conversation

filbranden
Copy link
Contributor

Starting with systemd 237, in preparation for cgroup v2, delegation is only now available for scopes, not slices.

Update libcontainer code to detect whether delegation is available on both and use that information when creating new slices.

Fixes kubernetes/kubernetes#61474.

/cc @adelton, @jmontleon, @derekwaynecarr.

@derekwaynecarr
Copy link
Contributor

FYI @mrunalp and @sjenning

Copy link
Contributor

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crosbymichael
Copy link
Member

You need to sign your commit with the DCO. Please look at the contributing docs for information on the process.

@adelton
Copy link

adelton commented Apr 9, 2018

I've copied libcontainer/ from this pull request to vendor/github.com/opencontainers/runc/libcontainer/ of Kubernetes master, recompiled and run ./hack/local-up-cluster.sh but I did not see any improvement -- the 127.0.0.1 node still remains NotReady. This is with systemd-238-7.fc29.x86_64.

@filbranden
Copy link
Contributor Author

@crosbymichael Will do, I'll update the commit description and force-push it here.

@adelton The problem you reported in kubernetes/kubernetes#61474 is:

E0321 09:23:23.527391   27717 node_container_manager.go:52] Failed to create "/kubepods" cgroup
F0321 09:23:23.527441   27717 kubelet.go:1363] Failed to start ContainerManager Delegation not available for unit type

Which this commit should fix.

The problem you reported in the comment above (kubelet startup hanging) is actually fixed by #1772.

So I think you need both :-)

While #1772 by fixes the problem of kubelet hanging on systemd v236 and below, that commit by itself won't fix the problem you're seeing starting kubelet on systemd v238, then this commit will fix that too. Can you try with both of them cherry-picked?

Thanks!
Filipe

@adelton
Copy link

adelton commented Apr 10, 2018

Thank you. Now I took #1776 and cherry-picked 8ab251f from #1772 on top of that, and copied libcontainer/* content to Kubernetes' vendor/github.com/opencontainers/runc/libcontainer/. I still see the

E0410 02:40:53.328962   15638 node_container_manager.go:53] Failed to create "/kubepods" cgroup
F0410 02:40:53.329024   15638 kubelet.go:1302] Failed to start ContainerManager Delegation not available for unit type

message and node not getting ready.

@filbranden
Copy link
Contributor Author

Thanks for testing @adelton

I'm working with the hypothesis that systemd v238 will not return an error that Delegate is read-only (which is what the code is checking) but that the attribute does not exist (which is effectively the change that went in on v237.)

Will perform some additional tests using busctl and update this code to do that instead. (In fact, I think I used to have a "go run" test case I could use on v238 too, let me dig that up.)

I'll let you know when I have a new version you can test.

Cheers,
Filipe

Starting with systemd 237, in preparation for cgroup v2, delegation is
only now available for scopes, not slices.

Update libcontainer code to detect whether delegation is available on
both and use that information when creating new slices.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
@filbranden
Copy link
Contributor Author

Yes that was it... I just added "org.freedesktop.DBus.Error.InvalidArgs" to the list of errors to check and now my mini test case of creating a single slice works...

@adelton Can you please test this again in your environment?

Note that #1772 is now merged and I rebased this branch on top of that one, so if you just sync the vendor/ tree in Kubernetes to where this latest commit of mine is, you should be good to go... It would be nice to confirm that that's indeed fixed before we merge this!

Cheers,
Filipe

@cyphar
Copy link
Member

cyphar commented Apr 11, 2018

Starting with systemd 237, in preparation for cgroup v2, delegation is only now available for scopes, not slices.

Why on earth... cgroupv2 still isn't ready yet and they're already removing features from systemd and breaking things "just for fun" because "everyone should just switch to cgroupv2 -- ignore that important controllers are totally absent from cgroupv2, and systemd doesn't support setting up different paths for different controllers because of cgroupv2". Not to mention that the scope/slice distinction is entirely an invention of systemd.

All of that being said, if they've broken it then we will need this patch (once tested). Although I believe that adding Delegate=true will just result in a warning right -- it won't cause the slice creation to fail?

@filbranden
Copy link
Contributor Author

it won't cause the slice creation to fail?

I tested it with a small 15-line Go program creating a slice. It fails with "org.freedesktop.DBus.Error.InvalidArgs", you'll see my latest version of the commit includes that... So in a way it's already tested in that I created a slice on systemd v238 with it...

I'd still prefer to wait for @adelton to confirm that it does fix the issue in Kubernetes, just to make sure there's nothing else lurking there...

Cheers,
Filipe

@mrunalp
Copy link
Contributor

mrunalp commented Apr 11, 2018

@umohnani is also testing this out. Thanks!

@cgwalters
Copy link
Contributor

@filbranden
Copy link
Contributor Author

Awesome, thanks for testing! I think we're good to go... Merge?

@filbranden
Copy link
Contributor Author

Once this one is in, I'll update kubernetes/kubernetes#61926 to include everything including this PR to fix the issue in Kubernetes code base too.

Cheers!
Filipe

@umohnani8
Copy link

Works for me also!

@mrunalp
Copy link
Contributor

mrunalp commented Apr 11, 2018

LGTM

Approved with PullApprove

@runcom
Copy link
Member

runcom commented Apr 12, 2018

works here as well 👍

@crosbymichael
Copy link
Member

crosbymichael commented Apr 12, 2018

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit 59c60d1 into opencontainers:master Apr 12, 2018
filbranden added a commit to filbranden/kubernetes that referenced this pull request Apr 24, 2018
PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that
makes Kubelet startup hang when using systemd cgroup driver (by adding a
timeout) and further PR opencontainers/runc#1772 fixes that bug by
checking the proper error status before waiting on the channel.

PR opencontainers/runc#1776 checks whether Delegate works in slices,
which keeps libcontainer systemd cgroup driver working on systemd v237+.

PR opencontainers/runc#1781 makes the channel buffered, so if we time
out waiting on the channel, the updater will not block trying to it
since there are no longer any consumers.
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Apr 25, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update libcontainer to include PRs with fixes to systemd cgroup driver

**What this PR does / why we need it**:

PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that makes Kubelet startup hang when using systemd cgroup driver (by adding a timeout) and further PR opencontainers/runc#1772 fixes that bug by checking the proper error status before waiting on the channel.
    
PR opencontainers/runc#1776 checks whether Delegate works in slices, which keeps libcontainer systemd cgroup driver working on systemd v237+.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61474

**Special notes for your reviewer**:
/assign @derekwaynecarr
cc @vikaschoudhary16 @sjenning @adelton @mrunalp 

**Release note**:

```release-note
NONE
```
filbranden added a commit to filbranden/kubernetes that referenced this pull request Apr 25, 2018
PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that
makes Kubelet startup hang when using systemd cgroup driver (by adding a
timeout) and further PR opencontainers/runc#1772 fixes that bug by
checking the proper error status before waiting on the channel.

PR opencontainers/runc#1776 checks whether Delegate works in slices,
which keeps libcontainer systemd cgroup driver working on systemd v237+.

PR opencontainers/runc#1781 makes the channel buffered, so if we time
out waiting on the channel, the updater will not block trying to it
since there are no longer any consumers.
@filbranden filbranden deleted the systemd2 branch February 7, 2019 01:46
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants