-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Implemented clone fallback when clone3 returns ENOSYS #2121
Conversation
9f6f7ab
to
04c4800
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #2121 +/- ##
==========================================
+ Coverage 64.82% 64.87% +0.05%
==========================================
Files 129 129
Lines 14768 14954 +186
==========================================
+ Hits 9573 9702 +129
- Misses 5195 5252 +57 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, haven't gone through the all changes or conceptual changes, just had a couple of specific questions. If any changes are required, I'd suggest to keep them and bundle with any other changes from other review.
pub fn with_executor(mut self, executor: Executor) -> Result<Self, LibcontainerError> { | ||
self.executor = executor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still needs to return result? we are simply setting the executor and returning, so can simply return self instead of a result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// namespace. We also need to create to socket before entering into the | ||
// user namespace in the case that the path is located in paths only | ||
// root can access. | ||
let notify_listener = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor nitpick, but we can directly name this notify_socket
, so we don't have to specify when building ContainerArgs below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Clone)] | ||
pub struct Receiver<T> { | ||
receiver: RawFd, | ||
phantom: PhantomData<T>, | ||
} | ||
|
||
#[derive(Clone)] | ||
pub struct Sender<T> { | ||
sender: RawFd, | ||
phantom: PhantomData<T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this contains RawFd, is it fine to allow clone on this? Also, should we impose Clone on T as well (even if we are not storing that) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clone RawFd
This is just a c_int, so this is safe to do. It is like pass the RawFd by value with copy. When across the fork/clone boundary, the file descriptor table is duplicated so existing fds works exactly the same.
T for Clone
I don't think we have to impose Clone
to T. In fact, it is possible that T does not have Clone trait. As long as T is serializable, we can use the channel to send and receive. So T can be serializable but not clone-able. The PhantomData
does not store T as you pointed out and it is simply there to signal ownership of T.
29e0bfc
to
b220267
Compare
I am leaving the commits for easier time to review, but I should probably squash before we merge. |
I can't find time for this PR unless it's on a Saturday or Sunday, so wait a minute 🙏 |
Also, some of the rust clone can be mitigated using |
implements a clone fallback path when `clone3` is blocked with ENOSYS Removed the use of clone3 crate. Our usecase for the clone3 crate is very specific and do not need to support everything. I can easily replace the clone3 crate with a few lines of code, so I did it in this PR. We usually favors safe syscall wrappers like nix over libc, but I believe the save is worth it here. We use the raw clone syscall instead of glibc wrapper. Details are in the comments. Using raw syscalls for both clone and clone3 actually makes the code well organized. For our usecase, they share the input types. Unit tests using seccomp profile to simulate clone3 being blocked are added. Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
Signed-off-by: yihuaf <yihuaf@unkies.org>
c10b27e
to
64c8133
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too big a change to mix in with this change. libcontainer has people using it. We need to contact them. Please leave this out of this PR 🙏
Remove the use the executor and use a function pointer instead. Now the executor is much more explicit about the order of how each executor is tried. Instead of a list of Executor, now executors are composable to create new executor.
/// Id of the container | ||
pub(super) container_id: String, | ||
/// Root directory for container state | ||
pub(super) root_path: PathBuf, | ||
/// Interface to operating system primitives | ||
pub(super) syscall: &'a dyn Syscall, | ||
pub(super) syscall: SyscallType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a breaking change for the users of libcontainer, please create another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make these changes into a series of PR so we can these changes out.
@utam0k maintaining backward compatibility is going to be hard and I don't think we have thought enough about the libcontainer api to support it. I hope there is a way to can make breaking changes to the interface. With that being said, I will break the change into smaller pieces and let see how to work through this. I do believe the importance of making this PR to go in as I have great concerns with us using clone3 syscall without libc directly as the only path. |
Can you give me a sense of who are the big consumers? I can help take a look and gauge the amount of effort to fix. |
Converting this PR to draft. I will break this PR into smaller piece and use this as a reference for the eventual intent. |
@yihuaf I don't mean to deny breaking change. If they are necessary, let's include them. However, I do think that breakrng changes that are not related to 1PR will cause confusion among users. Runwasi is probably the biggest user. Thanks for understanding🙇🙇 |
Understand. I will parse into multiple PRs and the destructive changes should be in their own PR. I will also fix |
Actually, since I will break this PR into multiple PRs, we can close this to avoid confusion. |
@yihuaf Thanks a lot🙏 |
Fix #2030
Apologize for the large change. This PR implements the clone fallback when clone3 returns ENOSYS.
clone
interface is messier to use compared to the clone3 kernel syscall, so a large chunk of the changes is to deal with that. For example, thecontainer_args
now implementsClone
so we can clearly pass the ownership of the memory to the new process.A few other changes to make this happen: