[WIP] Refactor cgroup handling #416

Conversation
This PR is re-submitted according to @sboeuf's comments at #405. @sboeuf @jodh-intel @grahamwhaley @devimc, although this PR is still WIP, could you spend some time on it? Many thanks~ cc @WeiZhang555 @jshachm
CI is failing, let me fix it first...
```
@@ -120,8 +120,11 @@
  name = "github.com/opencontainers/runc"
  packages = [
    "libcontainer/configs",
    "libcontainer/cgroups",
    "libcontainer/cgroups/fs",
```
Why is "libcontainer/cgroups/systemd" not involved? Is it unnecessary?
Should we support systemd containers? If yes, I'll add "libcontainer/cgroups/systemd"~
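For what it's worth, here is a minimal sketch of how the manager choice could look if systemd support were added, assuming the vendored (legacy) runc API (`fs.Manager` and `systemd.Manager` with `Cgroups`/`Paths` fields, plus `systemd.UseSystemd()`); the wrapper function below is illustrative only and not part of this PR:

```go
import (
	"github.com/opencontainers/runc/libcontainer/cgroups"
	"github.com/opencontainers/runc/libcontainer/cgroups/fs"
	"github.com/opencontainers/runc/libcontainer/cgroups/systemd"
	"github.com/opencontainers/runc/libcontainer/configs"
)

// newCgroupManager is a hypothetical helper: it returns a systemd-backed
// manager when the host uses systemd-managed cgroups, and falls back to the
// plain cgroupfs manager otherwise.
func newCgroupManager(cg *configs.Cgroup, paths map[string]string) cgroups.Manager {
	if systemd.UseSystemd() {
		return &systemd.Manager{Cgroups: cg, Paths: paths}
	}
	return &fs.Manager{Cgroups: cg, Paths: paths}
}
```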
```
@@ -0,0 +1,111 @@
// +build linux
```
Could you remove this comment and just rename the file to be virtcontainers/cgroups_linux.go
as that's much clearer imho.
Also, you need a new file named "virtcontainers/cgroups_unsupported.go" that contains exactly the same interface as the current implementation (the old code) for platforms other than Linux. Otherwise it won't compile on ppc64le.
You can imitate vendor/github.com/opencontainers/runc/libcontainer/cgroups/cgroups_unsupported.go
```
@@ -0,0 +1,3 @@
+// +build !linux
+
+package cgroups
```
Accepted 👍
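As a reference for what the non-Linux counterpart could look like once the file is renamed as suggested above, here is a minimal sketch of no-op stubs. The package name follows the `virtcontainers/cgroups_unsupported.go` naming proposed in this thread, and the method set is illustrative only; it would have to mirror whatever `cgroups_linux.go` actually exports:

```go
// +build !linux

package virtcontainers

// Hypothetical no-op stubs mirroring the Linux cgroups manager discussed in
// this PR, so the package still builds where the cgroupfs backend is unavailable.
type cgroupsManager struct{}

func (cgm *cgroupsManager) newManager(s *Sandbox) error     { return nil }
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error     { return nil }
func (cgm *cgroupsManager) addContainer(c *Container) error { return nil }
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox)        {}
```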
```go
}

state, _ := s.storage.fetchSandboxState(s.id)
if state.CgroupPaths == nil {
```
I don't really understand this code. Why can't it be just...

```go
state, err := s.storage.fetchSandboxState(s.id)
if err != nil {
    return err
}

cgm.libcontainerManager = &fs.Manager{
    Cgroups: cgm.libcontainerConfig.Cgroups,
    Paths:   state.CgroupPaths,
}
```

Maybe a comment in the code would help here.
Accepted~ Code is more beautiful now~

But I think we can ignore the err of `fetchSandboxState()` here, because if `state.CgroupPaths` is `nil`, it means we are in the first container creation. That's why I check `if state.CgroupPaths == nil`.

I'll update it like this, so that on the first container creation `state.CgroupPaths` is `nil`:

```go
state, _ := s.storage.fetchSandboxState(s.id)

cgm.libcontainerManager = &fs.Manager{
    Cgroups: cgm.libcontainerConfig.Cgroups,
    Paths:   state.CgroupPaths,
}
```
```go
	return fmt.Errorf("apply %d to host cgroups of sandbox %s failed with %s", shimPid, s.id, err)
}

if cgm.libcontainerConfig == nil {
```
Shouldn't this test be the first one in the function (fail fast)?
accepted~ 👍
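A sketch of the fail-fast ordering being suggested; the surrounding function is whichever one carries the `cgm.libcontainerConfig` check, and the error text below is illustrative:

```go
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
	// Fail fast: without a libcontainer cgroups config there is nothing to apply.
	if cgm.libcontainerConfig == nil {
		return fmt.Errorf("cgroups config of sandbox %s is nil", s.id)
	}

	// ... the rest of the host cgroups setup follows ...
	return nil
}
```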
```go
}

// deleteSandbox cleanup cgroup folders
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
```
It looks like this function should return an `error` as there are error scenarios it has to deal with?
I think during deletion we shouldn't return an `error` that breaks the shutdown procedure; that's why I just report a warning here~
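A minimal sketch of that warn-but-continue behaviour, assuming the `fs.Manager` held in `cgm.libcontainerManager`; the logging call is illustrative and not the package's actual logging helper:

```go
import "github.com/sirupsen/logrus"

// deleteSandbox cleans up the host cgroup folders. Errors are only logged so
// that a failed cleanup never aborts the sandbox shutdown path.
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
	if cgm.libcontainerManager == nil {
		return
	}

	if err := cgm.libcontainerManager.Destroy(); err != nil {
		// Illustrative logging call; the real code would use the package's logger.
		logrus.WithError(err).WithField("sandbox", s.id).Warn("failed to destroy host cgroups")
	}
}
```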
```go
// newManager setup cgroup manager for sandbox
func (cgm *cgroupsManager) newManager(s *Sandbox) error {
	ociConfigStr, err := s.Annotations(annotations.ConfigJSONKey)
```
Where is `s.Annotations()` defined? I can't find it in the PR. Ideally we should only unmarshal the OCI json once for each container, since it is really slow... And for an empty sandbox w/o containers (e.g. in the CRI case), there is no such container OCI spec. You need to handle that case as well. So IMO the cgroup manager needs to be created upon first container creation instead.
So we should create this "cgroupManager" when the first container is created, not when the sandbox is created?~
Yes, because you rely on a container OCI spec to create the cgroup manager and we won't have a container spec until the first container is to be created.
Currently the only place unmarshaling the OCI json is at the kata-agent level, but I think `newManager()` should be called at the sandbox level.

I would like to:

- add a pointer named `ociSpec` to the `Sandbox` struct;
- unmarshal in `newSandbox()` and assign it to `ociSpec`;
- when `createContainer()` is called at the kata-agent level, get it from `ociSpec`;

Please share your comments, thanks~
- Yes, a pointer to the first ocispec in `Sandbox` makes sense since it avoids unmarshalling the same json twice.
- As I stated above, one issue with `newManager()` in `newSandbox()` is that there may be no containers in sandboxConfig. Then you do not have the OCI spec you need to create the cgroup manager. The right place to do it is in `createContainer()`.
- Yes, it makes sense.
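To make the agreed direction concrete, here is a small, self-contained sketch of caching the unmarshalled spec the way the `ociSpec` pointer above would, assuming the standard runtime-spec types; the helper and type names are illustrative, not the PR's final API:

```go
package virtcontainers

import (
	"encoding/json"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// specCache stands in for the ociSpec pointer proposed for the Sandbox struct.
type specCache struct {
	ociSpec *specs.Spec
}

// specFromConfigJSON unmarshals the container OCI config once and caches it,
// so later callers (e.g. cgroup manager creation in createContainer) reuse
// the same *specs.Spec instead of parsing the JSON again.
func (c *specCache) specFromConfigJSON(ociConfigStr string) (*specs.Spec, error) {
	if c.ociSpec != nil {
		return c.ociSpec, nil
	}

	var spec specs.Spec
	if err := json.Unmarshal([]byte(ociConfigStr), &spec); err != nil {
		return nil, err
	}

	c.ociSpec = &spec
	return c.ociSpec, nil
}
```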
```go
// addContainer adding shim pid of container to sandbox's host cgroups
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
```
Need to check for `shimPid > 0` to exclude the builtin shim case.
accepted~ 👍 When `shimPid == 0`, should we just return `nil` or report an `error`?
`shimPid == 0` happens in the noop_shim case. You can just return nil IMO.
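So the guard in `addContainer()` would look roughly like this (a sketch against this PR's types; `Apply()` is the vendored runc `fs.Manager` method for attaching a pid to the cgroups):

```go
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
	if shimPid <= 0 {
		// Builtin/noop shim: there is no host shim process to place in cgroups.
		return nil
	}

	// Attach the container's shim process to the sandbox's host cgroups.
	return cgm.libcontainerManager.Apply(shimPid)
}
```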
```go
// addSandbox adding shim pid to host cgroups and set the resource limitation with cgroups
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
	shimPid := s.state.Pid
```
What is `s.state.Pid`? A sandbox does not have a corresponding shim. All shims are associated with containers instead.
`sandbox.state.Pid` is the shim pid of the first container in the pod, in other words the `pause` container.
First, a sandbox can be empty, in which case there is no container in it at all. Secondly, you cannot assume the first container in a sandbox is always a `pause` container that never quits. Such an assumption breaks in the docker and frakti cases.
@bergwolf reasonable~ I'm modifying~
Sorry for missing the frakti case~
```go
@@ -664,6 +664,10 @@ func createContainer(sandbox *Sandbox, contConfig ContainerConfig) (c *Container
		sandbox.setSandboxPid(c.process.Pid)
	}

	if ann[annotations.ContainerTypeKey] == string(PodContainer) {
```
PodSandbox and PodContainer are both annotations in the kata CLI to get the missing sandbox abstraction from runc compatible command lines. There is no need to use such annotation in virtcontainers, where we know clearly about sandbox vs. containers.
Which means: once `createContainer()` is called, we're clearly creating a container, so I don't need to check for `PodContainer` or anything else, just add the container's pid to the cgroup. Am I clear?~
Yes, your understanding is correct.
```go
@@ -463,6 +466,8 @@ type Sandbox struct {
	wg *sync.WaitGroup

	shmSize uint64

	cgroups cgroupsManager
```
`cgroups *cgroupsManager`?
Yes, I added this to enclose the implementation and data of cgroups handling in `cgroups.go` (which will become `cgroups_linux.go` and `cgroups_unsupported.go`).
@jingxiaolu is currently too busy and can't get enough time for this. I'll carry on his work and take over the process.

Thanks @WeiZhang555 - the branch needs updating due to conflicts too, btw.

@WeiZhang555 Any updates on this one?

@bergwolf I'll update it soon, I didn't get enough time for it in recent days.

Ping @WeiZhang555 :)

@WeiZhang555 I bet you're very busy, but just checking if this is something you're planning to look at?

@sboeuf Sorry for the delay, let me try to finish it in the next several days. It has truly been blocked for so many months!

Closing this. The new implementation is #734.
According to #344, this PR is trying to refactor cgroups handling in `runtime`.

What I've done in this PR:

- `libcontainer/cgroups` package for cgroups handling;
- `libcontainer/cgroups`;

Work to be continued:

- `UpdateContainer()`;

Fixes: #344

Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>