[WIP] Refactor cgroup handling #416

Conversation
This PR is re-submitted according to @sboeuf's comments at #405. @sboeuf @jodh-intel @grahamwhaley @devimc, although this PR is still WIP, could you spend some time on it? Many thanks~ cc @WeiZhang555 @jshachm
CI is failing, let me fix it first...
```
@@ -120,8 +120,11 @@
  name = "github.com/opencontainers/runc"
  packages = [
    "libcontainer/configs",
    "libcontainer/cgroups",
    "libcontainer/cgroups/fs",
```
Why is "libcontainer/cgroups/systemd" not involved? Is it unnecessary?
Should we support systemd containers? If yes, I'll add "libcontainer/cgroups/systemd"~
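For what it's worth, here is a minimal sketch of how the manager choice could look if systemd support were added, assuming the vendored (legacy) runc API (`fs.Manager` and `systemd.Manager` with `Cgroups`/`Paths` fields, plus `systemd.UseSystemd()`); the wrapper function below is illustrative only and not part of this PR:

```go
import (
	"github.com/opencontainers/runc/libcontainer/cgroups"
	"github.com/opencontainers/runc/libcontainer/cgroups/fs"
	"github.com/opencontainers/runc/libcontainer/cgroups/systemd"
	"github.com/opencontainers/runc/libcontainer/configs"
)

// newCgroupManager is a hypothetical helper: it returns a systemd-backed
// manager when the host uses systemd-managed cgroups, and falls back to the
// plain cgroupfs manager otherwise.
func newCgroupManager(cg *configs.Cgroup, paths map[string]string) cgroups.Manager {
	if systemd.UseSystemd() {
		return &systemd.Manager{Cgroups: cg, Paths: paths}
	}
	return &fs.Manager{Cgroups: cg, Paths: paths}
}
```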
```
@@ -0,0 +1,111 @@
// +build linux
```
Could you remove this comment and just rename the file to be virtcontainers/cgroups_linux.go
as that's much clearer imho.
Also, you need a new file named "virtcontainers/cgroups_unsupported.go" that contains exactly the same interface as the current implementation (the old code) for platforms other than Linux. Otherwise it won't compile on ppc64le.
You can imitate vendor/github.com/opencontainers/runc/libcontainer/cgroups/cgroups_unsupported.go
```
@@ -0,0 +1,3 @@
+// +build !linux
+
+package cgroups
```
Accepted 👍
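As a reference for what the non-Linux counterpart could look like once the file is renamed as suggested above, here is a minimal sketch of no-op stubs. The package name follows the `virtcontainers/cgroups_unsupported.go` naming proposed in this thread, and the method set is illustrative only; it would have to mirror whatever `cgroups_linux.go` actually exports:

```go
// +build !linux

package virtcontainers

// Hypothetical no-op stubs mirroring the Linux cgroups manager discussed in
// this PR, so the package still builds where the cgroupfs backend is unavailable.
type cgroupsManager struct{}

func (cgm *cgroupsManager) newManager(s *Sandbox) error     { return nil }
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error     { return nil }
func (cgm *cgroupsManager) addContainer(c *Container) error { return nil }
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox)        {}
```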
```go
}

state, _ := s.storage.fetchSandboxState(s.id)
if state.CgroupPaths == nil {
```
I don't really understand this code. Why can't it be just...

```go
state, err := s.storage.fetchSandboxState(s.id)
if err != nil {
    return err
}

cgm.libcontainerManager = &fs.Manager{
    Cgroups: cgm.libcontainerConfig.Cgroups,
    Paths:   state.CgroupPaths,
}
```

Maybe a comment in the code would help here.
Accepted~ Code is more beautiful now~

But I think we can ignore the err of `fetchSandboxState()` here, because if `state.CgroupPaths` is `nil`, it means we are in the first container creation. That's why I check `if state.CgroupPaths == nil`.

I'll update it like this, so that on the first container creation `state.CgroupPaths` is `nil`:

```go
state, _ := s.storage.fetchSandboxState(s.id)

cgm.libcontainerManager = &fs.Manager{
    Cgroups: cgm.libcontainerConfig.Cgroups,
    Paths:   state.CgroupPaths,
}
```
```go
	return fmt.Errorf("apply %d to host cgroups of sandbox %s failed with %s", shimPid, s.id, err)
}

if cgm.libcontainerConfig == nil {
```
Shouldn't this test be the first one in the function (fail fast)?
accepted~ 👍
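A sketch of the fail-fast ordering being suggested; the surrounding function is whichever one carries the `cgm.libcontainerConfig` check, and the error text below is illustrative:

```go
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
	// Fail fast: without a libcontainer cgroups config there is nothing to apply.
	if cgm.libcontainerConfig == nil {
		return fmt.Errorf("cgroups config of sandbox %s is nil", s.id)
	}

	// ... the rest of the host cgroups setup follows ...
	return nil
}
```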
```go
}

// deleteSandbox cleanup cgroup folders
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
```
It looks like this function should return an `error` as there are error scenarios it has to deal with?
I think during deletion we shouldn't return an `error` that breaks the shutdown procedure; that's why I just report a warning here~
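A minimal sketch of that warn-but-continue behaviour, assuming the `fs.Manager` held in `cgm.libcontainerManager`; the logging call is illustrative and not the package's actual logging helper:

```go
import "github.com/sirupsen/logrus"

// deleteSandbox cleans up the host cgroup folders. Errors are only logged so
// that a failed cleanup never aborts the sandbox shutdown path.
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
	if cgm.libcontainerManager == nil {
		return
	}

	if err := cgm.libcontainerManager.Destroy(); err != nil {
		// Illustrative logging call; the real code would use the package's logger.
		logrus.WithError(err).WithField("sandbox", s.id).Warn("failed to destroy host cgroups")
	}
}
```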
```go
// newManager setup cgroup manager for sandbox
func (cgm *cgroupsManager) newManager(s *Sandbox) error {
	ociConfigStr, err := s.Annotations(annotations.ConfigJSONKey)
```
Where is `s.Annotations()` defined? I can't find it in the PR. Ideally we should only unmarshal the OCI json once for each container, since it is really slow... And for an empty sandbox w/o containers (e.g. in the CRI case), there is no such container OCI spec. You need to handle that case as well. So IMO the cgroup manager needs to be created upon first container creation instead.
So we should create this "cgroupManager" when the first container is created, not when the sandbox is created?~
Yes, because you rely on a container OCI spec to create the cgroup manager and we won't have a container spec until the first container is to be created.
Currently the only place unmarshaling the OCI json is at the kata-agent level, but I think `newManager()` should be called at the sandbox level.

I would like to:

- add a pointer named `ociSpec` to the `Sandbox` struct;
- unmarshal in `newSandbox()` and assign it to `ociSpec`;
- when `createContainer()` is called at the kata-agent level, get it from `ociSpec`;

Please share your comments, thanks~
- Yes, a pointer to the first ocispec in `Sandbox` makes sense since it avoids unmarshalling the same json twice.
- As I stated above, one issue with `newManager()` in `newSandbox()` is that there may be no containers in sandboxConfig. Then you do not have the OCI spec you need to create the cgroup manager. The right place to do it is in `createContainer()`.
- Yes, it makes sense.
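To make the agreed direction concrete, here is a small, self-contained sketch of caching the unmarshalled spec the way the `ociSpec` pointer above would, assuming the standard runtime-spec types; the helper and type names are illustrative, not the PR's final API:

```go
package virtcontainers

import (
	"encoding/json"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// specCache stands in for the ociSpec pointer proposed for the Sandbox struct.
type specCache struct {
	ociSpec *specs.Spec
}

// specFromConfigJSON unmarshals the container OCI config once and caches it,
// so later callers (e.g. cgroup manager creation in createContainer) reuse
// the same *specs.Spec instead of parsing the JSON again.
func (c *specCache) specFromConfigJSON(ociConfigStr string) (*specs.Spec, error) {
	if c.ociSpec != nil {
		return c.ociSpec, nil
	}

	var spec specs.Spec
	if err := json.Unmarshal([]byte(ociConfigStr), &spec); err != nil {
		return nil, err
	}

	c.ociSpec = &spec
	return c.ociSpec, nil
}
```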
```go
// addContainer adding shim pid of container to sandbox's host cgroups
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
```
Need to check for `shimPid > 0` to exclude the builtin shim case.
accepted~ 👍 When `shimPid == 0`, should we just return `nil` or report an `error`?
`shimPid == 0` happens in the noop_shim case. You can just return nil IMO.
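So the guard in `addContainer()` would look roughly like this (a sketch against this PR's types; `Apply()` is the vendored runc `fs.Manager` method for attaching a pid to the cgroups):

```go
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
	if shimPid <= 0 {
		// Builtin/noop shim: there is no host shim process to place in cgroups.
		return nil
	}

	// Attach the container's shim process to the sandbox's host cgroups.
	return cgm.libcontainerManager.Apply(shimPid)
}
```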
```go
// addSandbox adding shim pid to host cgroups and set the resource limitation with cgroups
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
	shimPid := s.state.Pid
```
What is `s.state.Pid`? A sandbox does not have a corresponding shim. All shims are associated with containers instead.
`sandbox.state.Pid` is the shim pid of the first container in the pod, in other words the `pause` container.
First, a sandbox can be empty, in which case there is no container in it at all. Secondly, you cannot assume the first container in a sandbox is always a `pause` container that never quits. Such an assumption breaks in the docker and frakti cases.
@bergwolf reasonable~ I'm modifying~
Sorry for missing the frakti case~
```go
@@ -664,6 +664,10 @@ func createContainer(sandbox *Sandbox, contConfig ContainerConfig) (c *Container
		sandbox.setSandboxPid(c.process.Pid)
	}

	if ann[annotations.ContainerTypeKey] == string(PodContainer) {
```
PodSandbox and PodContainer are both annotations in the kata CLI to get the missing sandbox abstraction from runc compatible command lines. There is no need to use such annotation in virtcontainers, where we know clearly about sandbox vs. containers.
Which means: once `createContainer()` is called, we're clearly creating a container, so I don't need to check for `PodContainer` or anything else, just add the container's pid to the cgroup. Am I clear?~
Yes, your understanding is correct.
```go
@@ -463,6 +466,8 @@ type Sandbox struct {
	wg *sync.WaitGroup

	shmSize uint64

	cgroups cgroupsManager
```
`cgroups *cgroupsManager`?
Yes, I added this to enclose the implementation and data of cgroups handling in `cgroups.go` (which will become `cgroups_linux.go` and `cgroups_unsupported.go`).
@jingxiaolu is currently too busy and can't get enough time for this. I'll carry on his work and take over the process.

Thanks @WeiZhang555 - the branch needs updating due to conflicts too, btw.

@WeiZhang555 Any updates on this one?

@bergwolf I'll update it soon, I didn't get enough time for it in recent days.

Ping @WeiZhang555 :)

@WeiZhang555 I bet you're very busy, but just checking if this is something you're planning to look at?

@sboeuf Sorry for the delay, let me try to finish it in the next several days. It has truly been blocked for so many months!

Closing this. The new implementation is #734.
According to #344, this PR is trying to refactor cgroups handling in `runtime`.

What I've done in this PR:

- `libcontainer/cgroups` package for cgroups handling;
- `libcontainer/cgroups`;

Work to be continued:

- `UpdateContainer()`;

Fixes: #344

Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>