Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

[WIP] Refactor cgroup handling #416

Closed
wants to merge 2 commits into from
Closed

[WIP] Refactor cgroup handling #416

wants to merge 2 commits into from

Conversation

jingxiaolu
Copy link
Contributor

According to #344, this PR is trying to refactor cgroups handling in runtime.

What I've done in this PR:

  1. introduce libcontainer/cgroups package for cgroups handling;
  2. move cgroups handlings from CLI to sandbox level; refactor the handlings by the help of libcontainer/cgroups;

Works to be continue:

  1. Add tests as @jodh-intel mentioned;
  2. Add handling to UpdateContainer();
  3. Remove old cgroups handling codes

Fixes: #344

Signed-off-by: Jingxiao Lu lujingxiao@huawei.com

@jingxiaolu
Copy link
Contributor Author

This PR is re-submitted according to @sboeuf's comments at #405 .

@sboeuf @jodh-intel @grahamwhaley @devimc although is PR is still WIP, could you help to pay some time on it? Many thanks~

cc\ @WeiZhang555 @jshachm

@jingxiaolu
Copy link
Contributor Author

CI is failing, let me fix it first...

Introduce libcontainer/cgroups package for further cgroups handling refactor

Fixes: #344

Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
Refactoring cgroups handling with runc/libcontainer/cgroups

Fixes: #344

Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
@@ -120,8 +120,11 @@
name = "github.com/opencontainers/runc"
packages = [
"libcontainer/configs",
"libcontainer/cgroups",
"libcontainer/cgroups/fs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "libcontainer/cgroups/systemd" not involved? Is it unnecessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we support systemd containers? If yes, I'll add "libcontainer/cgroups/systemd"~

@@ -0,0 +1,111 @@
// +build linux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove this comment and just rename the file to be virtcontainers/cgroups_linux.go as that's much clearer imho.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, you need a new file named "virtcontainers/cgroups_unsupported.go" contains exactly same interface with current implementation(old codes) for other platforms other than linux. Or it won't compile on ppc64le

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can imitating

 vendor/github.com/opencontainers/runc/libcontainer/cgroups/cgroups_unsupported.go
@@ -0,0 +1,3 @@
 +// +build !linux
 +
 +package cgroups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepted 👍

}

state, _ := s.storage.fetchSandboxState(s.id)
if state.CgroupPaths == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this code. Why can't it be just...

state, err := s.storage.fetchSandboxState(s.id)
if err != nil {
    return err
}

cgm.libcontainerManager = &fs.Manager{
    Cgroups:    cgm.libcontainerConfig.Cgroups,
    Paths:      state.CgroupPaths,
}

Maybe a comment in the code would help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepted~ Code is more beautiful now~

But I think we can ignore the err of fetchSandboxState() here, because if state.CgroupPaths is nil, it means we are in the first container creation. That's why I check if state.CgroupPaths == nil.
I'll update as this, so the first container creation, state.CgroupsPaths is nil:

	state, _ := s.storage.fetchSandboxState(s.id)
	cgm.libcontainerManager = &fs.Manager{
		Cgroups: cgm.libcontainerConfig.Cgroups,
		Paths:   state.CgroupPaths,
	}

return fmt.Errorf("apply %d to host cgroups of sandbox %s failed with %s", shimPid, s.id, err)
}

if cgm.libcontainerConfig == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this test be the first one in the function (fail fast)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted~ 👍

}

// deleteSandbox cleanup cgroup folders
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function should return an error as there are error scenarios it has to deal with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when deletion, we shouldn't return error to break the shutdown procedure, that's why I just report warning here~

@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 144409 KB
Proxy: 4602 KB
Shim: 9112 KB

Memory inside container:
Total Memory: 2045972 KB
Free Memory: 2006316 KB


// newManager setup cgroup manager for sandbox
func (cgm *cgroupsManager) newManager(s *Sandbox) error {
ociConfigStr, err := s.Annotations(annotations.ConfigJSONKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is s.Annotations() defined? I can't find it in the PR. Ideally we should only unmarshal the OCI json once for each container, since it is really slow... And for an empty sandbox w/o containers (e.g. in the CRI case), there is no such container OCI spec. You need to handle that case as well. So IMO the cgroup manager needs to be created upon first container creation instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should create this "cgroupManager" when the first container is created, but not the first sandbox is created?~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because you rely on a container OCI spec to create the cgroup manager and we won't have a container spec until the first container is to be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the only place unmarshaling the OCI json is in kata-agent level, but I think newManager() should be called at sandbox level.

I would like to:

  1. add a pointer named ociSpec at Sandbox struct;
  2. unmarshal in newSandbox() and assign it to ociSpec;
  3. when createContainer() in kata-agent level, get it from ociSpec;

Please share your comments, thanks~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxiaolu

  1. Yes, a pointer to the first ocispec in Sandbox makes sense since it avoids unmarshalling the same json twice.

  2. As I stated above, one issue with newManager() in newSandbox() is that there may be no containers in sandboxConfig. Then you do not have the OCI spec you need to create the cgroup manager. The right place to do it is in createContainer().

  3. Yes, it makes sense.


// addContainer adding shim pid of container to sandbox's host cgroups
func (cgm *cgroupsManager) addContainer(c *Container) error {
shimPid := c.process.Pid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check for shimPid > 0 to exclude the builtin shim case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted~ 👍

when shimPid == 0, should we just return with nil or report error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shimPid == 0 happens in noop_shim case. You can just return nil IMO.


// addSandbox adding shim pid to host cgroups and set the resource limitation with cgroups
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
shimPid := s.state.Pid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is s.state.Pid? A sandbox does not have a corresponding shim. All shims are associated with containers instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sandbox.state.pid is shim pid of the first container in the pod, in other word means pause container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, a sandbox can be empty in which case there is no container in it at all. Secondly you cannot assume the first container in a sandbox is always a pause container that never quits. Such assumption breaks with docker and frakti case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf reasonable~ I'm modifying~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry for missing for frakti case~

@@ -664,6 +664,10 @@ func createContainer(sandbox *Sandbox, contConfig ContainerConfig) (c *Container
sandbox.setSandboxPid(c.process.Pid)
}

if ann[annotations.ContainerTypeKey] == string(PodContainer) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodSandbox and PodContainer are both annotations in the kata CLI to get the missing sandbox abstraction from runc compatible command lines. There is no need to use such annotation in virtcontainers, where we know clearly about sandbox vs. containers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which means: once createContainer() is called, we're clearly creating a container, I don't need to check PodContainer or what, just add the container's pid to cgroup.
Am I clear?~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your understanding is correct.

@@ -463,6 +466,8 @@ type Sandbox struct {
wg *sync.WaitGroup

shmSize uint64

cgroups cgroupsManager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cgroups *cgroupsManager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I adding this to enclose the implementations and data of cgroups handlings in cgroups.go (will be cgroups_linux.go and cgroups_unsupported.go)

@WeiZhang555
Copy link
Member

@jingxiaolu is currently too busy and can't get enough time on this. I'll carry his work and take over the process.

@jodh-intel
Copy link
Contributor

Thanks @WeiZhang555 - branch needs updating due to conflicts too btw.

@WeiZhang555 WeiZhang555 self-assigned this Aug 6, 2018
@bergwolf
Copy link
Member

@WeiZhang555 Any updates on this one?

@WeiZhang555
Copy link
Member

@bergwolf I'll update it soon, didn't get enough time on it in recent days.

@jodh-intel
Copy link
Contributor

Ping @WeiZhang555 :)

@sboeuf
Copy link

sboeuf commented Sep 10, 2018

@WeiZhang555 I bet you're very busy, but just checking if this is something you're planning to look at?
I'm asking because we could reassign this to someone else that might have more bandwidth if someone's interested.

@WeiZhang555
Copy link
Member

@sboeuf Sorry for delay, let me try to finish it in several days. It is truly blocked for so many months!
I'll give some update in this week. But if anyone is so interested in implementing this, welcome!

@sboeuf sboeuf added the enhancement Improvement to an existing feature label Sep 12, 2018
@WeiZhang555
Copy link
Member

Closing this. New implementation is #734

@WeiZhang555 WeiZhang555 mentioned this pull request Sep 15, 2018
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
lifupan pushed a commit to lifupan/kata-runtime that referenced this pull request Aug 5, 2020
runtime: consolidate network types definition
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
enhancement Improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants