Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Add cleanup task for failed creates #404

Conversation

jamiehannaford
Copy link

Addresses a comment left by @sboeuf #396 (comment).

Not sure whether this is what you had in mind, but I've verified locally that dirs are now being cleaned up. I think a big reason kata-runtime list was hanging was due to the large amount of junk folders left around in /run/vc/sbs

Signed-off-by: Jamie Hannaford <jamie@limetree.org>
s.stop()
// unmount and delete mount dirs
bindUnmountAllRootfs(kataHostSharedDir, s)
os.RemoveAll(filepath.Join(kataHostSharedDir, s.id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kataHostSharedDir is specific to the kata_agent.go implementation, and should never be used by the generic sandbox implementation. And about this, maybe something has been solved by #292

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamiehannaford I agree with @jshachm that you cannot use Kata specific paths from sandbox.go, this is specific to the agent implementation.
Talking about #292, I have just merged it !

@grahamwhaley
Copy link
Contributor

Notes then:

  • dir cleanup is good - I think we are missing these bits. But, the hangup (also such as I see over at integration: soak: add rm soak test tests#414), I suspect is not because we left dirs laying around - that might slow things down (as it walks lots of dirs maybe), but it should not hang the runtime. I suspect there is a deeper rooted issue as well, which we may well be able to treat separately.

  • As this PR is asking for some input, and afaict does not quite conform to the PR and patch details found at https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md, I'm going to mark this RFC/WIP/DNM for the minute (@jamiehannaford - hope that is OK, and feel free to fix as/when ready).

  • possibly due to the above, the CIs are not happy (probably expected).

// Cleanup will clean up any dangling mounts created during a startup
// procedure, as well as terminate the VM. This is used when a sandbox
// failed during create.
func (s *Sandbox) Cleanup() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function does not need to be exported.

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jamiehannaford for the PR, I have some comments.

s.stop()
// unmount and delete mount dirs
bindUnmountAllRootfs(kataHostSharedDir, s)
os.RemoveAll(filepath.Join(kataHostSharedDir, s.id))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamiehannaford I agree with @jshachm that you cannot use Kata specific paths from sandbox.go, this is specific to the agent implementation.
Talking about #292, I have just merged it !

defer func() {
if err != nil {
s.Logger().WithError(err).WithField("sandboxid", s.id).Error("Creating sandbox failed")
s.Cleanup()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we can simply apply one big cleanup function because the failure could be caused by a bunch of different reasons, and using a hammer to cleanup might end up with some conflicts.
Trying to be more concrete here, let's say this function fails when creating the network, we don't want to call into s.stop() because this will try to communicate with the agent to stop the sandbox inside the VM, but the VM does not even exist at this point. That's why I think something with finer granularity would be more appropriate.

@grahamwhaley
Copy link
Contributor

I was just reading though this area of code (trying to figure out where my errant QEMU went...), and saw the teardown code. It occurred to me that we probably need some sort of 'teardown stack', and if we fail at some point we then call all the relevant teardown funcs in the reverse order they were added to the stack (so, yes, a real stack). I dunno if there is a nice idiom for doing this in golang?

@sboeuf
Copy link

sboeuf commented Jun 19, 2018

@grahamwhaley the way to do that in Golang is by using defer. Each defer is stacked up and at the end the stack is read.

@grahamwhaley
Copy link
Contributor

:-) I guess that works well if we have a single call chain for the whole setup. And we could use the if err != nil idiom inside the defer'd teardown funcs. defer didn't occur to me as I wasn't thinking this was a teardown through a single callchain, but maybe it is...

@egernst
Copy link
Member

egernst commented Jul 23, 2018

@jamiehannaford - I appreciate the contribution here - more graceful failing is an important area for improvement imo. I'm trying to scrub our backlog and reduce the number of open PRs in the project and see that this PR has gone quiet and is close to getting stale.

I see #351 similarly handles the error case. That's close to merge now -- can we re-evaluate the behavior once this merges? What do you think?

@jodh-intel
Copy link
Contributor

@jamiehannaford - #351 has now landed so would be good to get some input to @egernst's query above.

@jamiehannaford
Copy link
Author

Based on the feedback and other PR, I think we can close this. Thanks!

@jodh-intel
Copy link
Contributor

Thanks @jamiehannaford!

@jamiehannaford jamiehannaford deleted the add-sandbox-cleanup branch July 30, 2018 12:03
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
network: Handle default route where gateway is empty
lifupan pushed a commit to lifupan/kata-runtime that referenced this pull request Aug 5, 2020
…itor_address-to-gitignore

runtime: add monitor_address to .gitignore
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants