Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implemented clone fallback when clone3 returns ENOSYS #2121

Closed
wants to merge 21 commits into from

Conversation

yihuaf
Copy link
Collaborator

@yihuaf yihuaf commented Jul 4, 2023

Fix #2030

Apologize for the large change. This PR implements the clone fallback when clone3 returns ENOSYS. clone interface is messier to use compared to the clone3 kernel syscall, so a large chunk of the changes is to deal with that. For example, the container_args now implements Clone so we can clearly pass the ownership of the memory to the new process.

A few other changes to make this happen:

  • Delay the creation of the syscall struct and only pass around a enum signaling the type of syscall.
  • Remove the use the executor and use a function pointer instead. Now the executor is much more explicit about the order of how each executor is tried. Instead of a list of Executor, now executors are composable to create new executor.
  • Delay the creation of cgroup manager until the intermediate process.
  • Changed some of the cgroup functions to take &Path instead of PathBuf.

@yihuaf yihuaf force-pushed the yihuaf/clone_syscall branch from 9f6f7ab to 04c4800 Compare July 4, 2023 07:31
@yihuaf yihuaf changed the title Implemented clone syscall Implemented clone fallback when clone3 returns ENOSYS Jul 4, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jul 4, 2023

Codecov Report

Merging #2121 (c10b27e) into main (71cb81a) will increase coverage by 0.05%.
The diff coverage is 46.84%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2121      +/-   ##
==========================================
+ Coverage   64.82%   64.87%   +0.05%     
==========================================
  Files         129      129              
  Lines       14768    14954     +186     
==========================================
+ Hits         9573     9702     +129     
- Misses       5195     5252      +57     

@yihuaf yihuaf marked this pull request as ready for review July 4, 2023 09:02
@yihuaf yihuaf requested review from utam0k and a team July 4, 2023 09:02
@yihuaf yihuaf self-assigned this Jul 4, 2023
Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, haven't gone through the all changes or conceptual changes, just had a couple of specific questions. If any changes are required, I'd suggest to keep them and bundle with any other changes from other review.

Comment on lines 255 to 256
pub fn with_executor(mut self, executor: Executor) -> Result<Self, LibcontainerError> {
self.executor = executor;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still needs to return result? we are simply setting the executor and returning, so can simply return self instead of a result

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Good catch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// namespace. We also need to create to socket before entering into the
// user namespace in the case that the path is located in paths only
// root can access.
let notify_listener =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor nitpick, but we can directly name this notify_socket , so we don't have to specify when building ContainerArgs below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +21 to 30
#[derive(Clone)]
pub struct Receiver<T> {
receiver: RawFd,
phantom: PhantomData<T>,
}

#[derive(Clone)]
pub struct Sender<T> {
sender: RawFd,
phantom: PhantomData<T>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this contains RawFd, is it fine to allow clone on this? Also, should we impose Clone on T as well (even if we are not storing that) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clone RawFd

This is just a c_int, so this is safe to do. It is like pass the RawFd by value with copy. When across the fork/clone boundary, the file descriptor table is duplicated so existing fds works exactly the same.

T for Clone

I don't think we have to impose Clone to T. In fact, it is possible that T does not have Clone trait. As long as T is serializable, we can use the channel to send and receive. So T can be serializable but not clone-able. The PhantomData does not store T as you pointed out and it is simply there to signal ownership of T.

@yihuaf yihuaf force-pushed the yihuaf/clone_syscall branch from 29e0bfc to b220267 Compare July 4, 2023 17:52
@yihuaf yihuaf requested a review from YJDoc2 July 5, 2023 04:46
@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 5, 2023

I am leaving the commits for easier time to review, but I should probably squash before we merge.

@utam0k
Copy link
Member

utam0k commented Jul 5, 2023

I can't find time for this PR unless it's on a Saturday or Sunday, so wait a minute 🙏

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 5, 2023

Also, some of the rust clone can be mitigated using Rc, especially to manage the cost of duplicating structures. I am working on a follow up PR next to address the issue in case there is a concern on this.

yihuaf added 15 commits July 7, 2023 22:28
implements a clone fallback path when `clone3` is blocked with ENOSYS

Removed the use of clone3 crate. Our usecase for the clone3 crate is
very specific and do not need to support everything. I can easily
replace the clone3 crate with a few lines of code, so I did it in this
PR. We usually favors safe syscall wrappers like nix over libc, but I
believe the save is worth it here.  We use the raw clone syscall instead
of glibc wrapper. Details are in the comments.  Using raw syscalls for
both clone and clone3 actually makes the code well organized. For our
usecase, they share the input types.  Unit tests using seccomp profile
to simulate clone3 being blocked are added.

Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
yihuaf added 6 commits July 7, 2023 22:28
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
@yihuaf yihuaf force-pushed the yihuaf/clone_syscall branch from c10b27e to 64c8133 Compare July 8, 2023 05:29
Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too big a change to mix in with this change. libcontainer has people using it. We need to contact them. Please leave this out of this PR 🙏

Remove the use the executor and use a function pointer instead. Now the executor is much more explicit about the order of how each executor is tried. Instead of a list of Executor, now executors are composable to create new executor.

/// Id of the container
pub(super) container_id: String,
/// Root directory for container state
pub(super) root_path: PathBuf,
/// Interface to operating system primitives
pub(super) syscall: &'a dyn Syscall,
pub(super) syscall: SyscallType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a breaking change for the users of libcontainer, please create another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make these changes into a series of PR so we can these changes out.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 10, 2023

@utam0k maintaining backward compatibility is going to be hard and I don't think we have thought enough about the libcontainer api to support it. I hope there is a way to can make breaking changes to the interface. With that being said, I will break the change into smaller pieces and let see how to work through this. I do believe the importance of making this PR to go in as I have great concerns with us using clone3 syscall without libc directly as the only path.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 10, 2023

This is too big a change to mix in with this change. libcontainer has people using it. We need to contact them. Please leave this out of this PR 🙏

Remove the use the executor and use a function pointer instead. Now the executor is much more explicit about the order of how each executor is tried. Instead of a list of Executor, now executors are composable to create new executor.

Can you give me a sense of who are the big consumers? I can help take a look and gauge the amount of effort to fix.

@yihuaf yihuaf marked this pull request as draft July 10, 2023 20:42
@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 10, 2023

Converting this PR to draft. I will break this PR into smaller piece and use this as a reference for the eventual intent.

@utam0k
Copy link
Member

utam0k commented Jul 10, 2023

@yihuaf I don't mean to deny breaking change. If they are necessary, let's include them. However, I do think that breakrng changes that are not related to 1PR will cause confusion among users. Runwasi is probably the biggest user. Thanks for understanding🙇🙇

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 10, 2023

@yihuaf I don't mean to deny breaking change. If they are necessary, let's include them. However, I do think that breakrng changes that are not related to 1PR will cause confusion among users. Runwasi is probably the biggest user. Thanks for understanding🙇🙇

Understand. I will parse into multiple PRs and the destructive changes should be in their own PR. I will also fix runwasi when I introduce the executor change.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 10, 2023

Actually, since I will break this PR into multiple PRs, we can close this to avoid confusion.

@yihuaf yihuaf closed this Jul 10, 2023
@utam0k
Copy link
Member

utam0k commented Jul 10, 2023

@yihuaf Thanks a lot🙏

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Implement a fallback when clone3 returns ENOSYS
4 participants