-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Fully support OwnedFd #2417
Comments
Some ideas |
Hi, can i try this issue?? |
Don't worry ;)
|
@anti-entropy123 what should happen if say, when manually implementing the Clone trait for impl Clone for ContainerType {
fn clone(&self) -> Self {
match self {
Self::InitContainer => Self::InitContainer,
Self::TenantContainer { exec_notify_fd } => Self::TenantContainer { exec_notify_fd: exec_notify_fd.try_clone().unwrap() },
}
}
} is unwrapping fine? |
i have a question. lets say i let a = f.try_clone().unwrap();
let b = f.try_clone().unwrap(); what if one of those variables is dropped? will the file close? |
does it make sense to do this maybe? pub enum ContainerType {
InitContainer,
TenantContainer { exec_notify_fd: Rc<OwnedFd> },
} |
Directly unwrapping indeed doesn't seem quite appropriate... Could we just implement a This suggestion might not be fully matured, but I hope it helps you. @zahash |
@anti-entropy123 what about this issue then? |
I think closing one of the file descriptors won't affect the usability of the other one. |
but the #[stable(feature = "io_safety", since = "1.63.0")]
impl Drop for OwnedFd {
#[inline]
fn drop(&mut self) {
unsafe {
// Note that errors are ignored when closing a file descriptor. The
// reason for this is that if an error occurs we don't actually know if
// the file descriptor was closed or not, and if we retried (for
// something like EINTR), we might close another valid file descriptor
// opened after we closed ours.
#[cfg(not(target_os = "hermit"))]
let _ = libc::close(self.fd);
#[cfg(target_os = "hermit")]
let _ = hermit_abi::close(self.fd);
}
}
} |
@zahash Sorry, I don't have a suitable Linux machine with me right now. I can try verifying the issue tomorrow morning.
I've discussed this idea before; you can refer to this link for more information: #2369 (comment) |
@anti-entropy123 @utam0k does |
also, mmap no longer accepts but now, i can't find a way to create a invalid Fd mman::mmap(
None,
NonZeroUsize::new(default_stack_size).ok_or(CloneError::ZeroStackSize)?,
mman::ProtFlags::PROT_READ | mman::ProtFlags::PROT_WRITE,
mman::MapFlags::MAP_PRIVATE | mman::MapFlags::MAP_ANONYMOUS | mman::MapFlags::MAP_STACK,
OwnedFd::from_raw_fd(-1), // used to be None
0,
)
.map_err(CloneError::StackAllocation)? so i decided to be a little naughty and use is this fine?
|
@zahash I think we can try using |
I tested this code snippet, and it ran fine. fn verify_try_clone(fd: OwnedFd) {
let new_fd = fd.try_clone().unwrap();
drop(fd);
let mut new_file = unsafe { File::from_raw_fd(new_fd.as_raw_fd()) };
new_file.write_all("hello world".as_bytes()).unwrap();
} |
@zahash The try_clone method of OwnedFd ultimately calls a syscall similar to dup2, so in the end, there are indeed two separate file descriptors. https://man7.org/linux/man-pages/man2/dup.2.html |
i was also thinking about having a ref datatype pub enum ContainerTypeRef<'fd> {
InitContainer,
TenantContainer { exec_notify_id: BorrowedFd<'fd> }
} that way, we may not have to necessarily clone all the time. but the let cb: CloneCb = {
let args = args.clone();
...
}; |
i have made changes related to issues #2417 and #2723 all the tests pass. Please checkout the patch file. if you like where this is going and satisfied with most of the changes, i will create a pull request for further review. diff --git a/crates/libcgroups/src/v1/memory.rs b/crates/libcgroups/src/v1/memory.rs
index 98fa6b1e..38b0dd07 100644
--- a/crates/libcgroups/src/v1/memory.rs
+++ b/crates/libcgroups/src/v1/memory.rs
@@ -332,7 +332,7 @@ impl Memory {
Err(e) => {
// we need to look into the raw OS error for an EBUSY status
match e.inner().raw_os_error() {
- Some(code) => match Errno::from_i32(code) {
+ Some(code) => match Errno::from_raw(code) {
Errno::EBUSY => {
let usage = Self::get_memory_usage(cgroup_root)?;
let max_usage = Self::get_memory_max_usage(cgroup_root)?;
diff --git a/crates/libcontainer/src/container/builder_impl.rs b/crates/libcontainer/src/container/builder_impl.rs
index 38f2f1cc..59d7a062 100644
--- a/crates/libcontainer/src/container/builder_impl.rs
+++ b/crates/libcontainer/src/container/builder_impl.rs
@@ -132,7 +132,7 @@ impl ContainerBuilderImpl {
prctl::set_dumpable(false).map_err(|e| {
LibcontainerError::Other(format!(
"error in setting dumpable to false : {}",
- nix::errno::from_i32(e)
+ nix::errno::Errno::from_raw(e)
))
})?;
}
@@ -141,7 +141,7 @@ impl ContainerBuilderImpl {
// therefore we will have to move all the variable by value. Since self
// is a shared reference, we have to clone these variables here.
let container_args = ContainerArgs {
- container_type: self.container_type,
+ container_type: self.container_type.clone(),
syscall: self.syscall,
spec: Rc::clone(&self.spec),
rootfs: self.rootfs.to_owned(),
diff --git a/crates/libcontainer/src/container/tenant_builder.rs b/crates/libcontainer/src/container/tenant_builder.rs
index 0d805a0d..2c2dbc88 100644
--- a/crates/libcontainer/src/container/tenant_builder.rs
+++ b/crates/libcontainer/src/container/tenant_builder.rs
@@ -1,6 +1,6 @@
use caps::Capability;
use nix::fcntl::OFlag;
-use nix::unistd::{self, close, pipe2, read, Pid};
+use nix::unistd::{self, pipe2, read, Pid};
use oci_spec::runtime::{
Capabilities as SpecCapabilities, Capability as SpecCapability, LinuxBuilder,
LinuxCapabilities, LinuxCapabilitiesBuilder, LinuxNamespace, LinuxNamespaceBuilder,
@@ -8,6 +8,7 @@ use oci_spec::runtime::{
};
use procfs::process::Namespace;
+use std::os::fd::AsRawFd;
use std::rc::Rc;
use std::{
collections::HashMap,
@@ -148,13 +149,13 @@ impl TenantContainerBuilder {
let mut notify_socket = NotifySocket::new(notify_path);
notify_socket.notify_container_start()?;
- close(write_end).map_err(LibcontainerError::OtherSyscall)?;
+ drop(builder_impl); // this will close write_end fd
let mut err_str_buf = Vec::new();
loop {
let mut buf = [0; 3];
- match read(read_end, &mut buf).map_err(LibcontainerError::OtherSyscall)? {
+ match read(read_end.as_raw_fd(), &mut buf).map_err(LibcontainerError::OtherSyscall)? {
0 => {
if err_str_buf.is_empty() {
return Ok(pid);
diff --git a/crates/libcontainer/src/process/args.rs b/crates/libcontainer/src/process/args.rs
index ad37c07b..11c5d7f2 100644
--- a/crates/libcontainer/src/process/args.rs
+++ b/crates/libcontainer/src/process/args.rs
@@ -1,5 +1,6 @@
use libcgroups::common::CgroupConfig;
use oci_spec::runtime::Spec;
+use std::os::fd::OwnedFd;
use std::os::unix::prelude::RawFd;
use std::path::PathBuf;
use std::rc::Rc;
@@ -9,10 +10,21 @@ use crate::notify_socket::NotifyListener;
use crate::syscall::syscall::SyscallType;
use crate::user_ns::UserNamespaceConfig;
use crate::workload::Executor;
-#[derive(Debug, Copy, Clone)]
+#[derive(Debug)]
pub enum ContainerType {
InitContainer,
- TenantContainer { exec_notify_fd: RawFd },
+ TenantContainer { exec_notify_fd: OwnedFd },
+}
+
+impl Clone for ContainerType {
+ fn clone(&self) -> Self {
+ match self {
+ Self::InitContainer => Self::InitContainer,
+ Self::TenantContainer { exec_notify_fd } => Self::TenantContainer {
+ exec_notify_fd: exec_notify_fd.try_clone().unwrap(/* is unwrap ok? */),
+ },
+ }
+ }
}
#[derive(Clone)]
diff --git a/crates/libcontainer/src/process/container_intermediate_process.rs b/crates/libcontainer/src/process/container_intermediate_process.rs
index 0085416b..7e5ac9c0 100644
--- a/crates/libcontainer/src/process/container_intermediate_process.rs
+++ b/crates/libcontainer/src/process/container_intermediate_process.rs
@@ -1,3 +1,5 @@
+use std::os::fd::AsRawFd;
+
use crate::error::MissingSpecError;
use crate::{namespaces::Namespaces, process::channel, process::fork};
use libcgroups::common::CgroupManager;
@@ -128,12 +130,12 @@ pub fn container_intermediate_process(
if let Err(err) = main_sender.exec_failed(e.to_string()) {
tracing::error!(?err, "failed sending error to main sender");
}
- if let ContainerType::TenantContainer { exec_notify_fd } = args.container_type {
+ if let ContainerType::TenantContainer { exec_notify_fd } = &args.container_type {
let buf = format!("{e}");
- if let Err(err) = write(exec_notify_fd, buf.as_bytes()) {
+ if let Err(err) = write(&exec_notify_fd, buf.as_bytes()) {
tracing::error!(?err, "failed to write to exec notify fd");
}
- if let Err(err) = close(exec_notify_fd) {
+ if let Err(err) = close(exec_notify_fd.as_raw_fd()) {
tracing::error!(?err, "failed to close exec notify fd");
}
}
@@ -157,8 +159,8 @@ pub fn container_intermediate_process(
})?;
// Close the exec_notify_fd in this process
- if let ContainerType::TenantContainer { exec_notify_fd } = args.container_type {
- close(exec_notify_fd).map_err(|err| {
+ if let ContainerType::TenantContainer { exec_notify_fd } = &args.container_type {
+ close(exec_notify_fd.as_raw_fd()).map_err(|err| {
tracing::error!("failed to close exec notify fd: {}", err);
IntermediateProcessError::ExecNotify(err)
})?;
@@ -206,7 +208,7 @@ fn setup_userns(
prctl::set_dumpable(true).map_err(|e| {
IntermediateProcessError::Other(format!(
"error in setting dumpable to true : {}",
- nix::errno::from_i32(e)
+ nix::errno::Errno::from_raw(e)
))
})?;
sender.identifier_mapping_request().map_err(|err| {
@@ -220,7 +222,7 @@ fn setup_userns(
prctl::set_dumpable(false).map_err(|e| {
IntermediateProcessError::Other(format!(
"error in setting dumplable to false : {}",
- nix::errno::from_i32(e)
+ nix::errno::Errno::from_raw(e)
))
})?;
Ok(())
diff --git a/crates/libcontainer/src/process/fork.rs b/crates/libcontainer/src/process/fork.rs
index c0e05539..60a33f46 100644
--- a/crates/libcontainer/src/process/fork.rs
+++ b/crates/libcontainer/src/process/fork.rs
@@ -1,4 +1,4 @@
-use std::{ffi::c_int, fs::File, num::NonZeroUsize};
+use std::{ffi::c_int, num::NonZeroUsize};
use libc::SIGCHLD;
use nix::{
@@ -164,15 +164,17 @@ fn clone(cb: CloneCb, flags: u64, exit_signal: Option<u64>) -> Result<Pid, Clone
// do not use MAP_GROWSDOWN since it is not well supported.
// Ref: https://man7.org/linux/man-pages/man2/mmap.2.html
let child_stack = unsafe {
- // Since nix = "0.27.1", `mmap()` requires a generic type `F: AsFd`.
- // `::<File>` doesn't have any meaning because we won't use it.
- mman::mmap::<File>(
+ // Since nix = "0.27.1", `mmap()` requires a generic type `F: AsFd` and takes f: Option<F>.
+ // if f: None, then fd = -1 is used; (-1 is just an invalid fd)
+ // let fd = f.map(|f| f.as_fd().as_raw_fd()).unwrap_or(-1);
+ // Since nix = "0.28.0" mmap function accepts f: F instead of Option<F>
+ // but it is not possible to manually create OwnedFd::from_raw_fd(-1) because rust will panic due to assert
+ // So, mmap_anonymous is used which achieves the exact same behaviour
+ mman::mmap_anonymous(
None,
NonZeroUsize::new(default_stack_size).ok_or(CloneError::ZeroStackSize)?,
mman::ProtFlags::PROT_READ | mman::ProtFlags::PROT_WRITE,
mman::MapFlags::MAP_PRIVATE | mman::MapFlags::MAP_ANONYMOUS | mman::MapFlags::MAP_STACK,
- None,
- 0,
)
.map_err(CloneError::StackAllocation)?
};
@@ -187,7 +189,7 @@ fn clone(cb: CloneCb, flags: u64, exit_signal: Option<u64>) -> Result<Pid, Clone
// Since the child stack for clone grows downward, we need to pass in
// the top of the stack address.
- let child_stack_top = unsafe { child_stack.add(default_stack_size) };
+ let child_stack_top = unsafe { child_stack.as_ptr().add(default_stack_size) };
// Combine the clone flags with exit signals.
let combined_flags = (flags | exit_signal.unwrap_or(0)) as c_int;
diff --git a/crates/libcontainer/src/seccomp/mod.rs b/crates/libcontainer/src/seccomp/mod.rs
index 2bfce4cb..a1acb131 100644
--- a/crates/libcontainer/src/seccomp/mod.rs
+++ b/crates/libcontainer/src/seccomp/mod.rs
@@ -332,7 +332,7 @@ mod tests {
}
if let Some(errno) = ret.err() {
- if errno != nix::errno::from_i32(expect_error) {
+ if errno != nix::errno::Errno::from_raw(expect_error) {
Err(TestCallbackError::Custom(format!(
"getcwd failed but we didn't get the expected error from seccomp profile: {}",
errno
diff --git a/crates/libcontainer/src/syscall/linux.rs b/crates/libcontainer/src/syscall/linux.rs
index d01c4a54..35e5430c 100644
--- a/crates/libcontainer/src/syscall/linux.rs
+++ b/crates/libcontainer/src/syscall/linux.rs
@@ -314,7 +314,7 @@ impl Syscall for LinuxSyscall {
fn set_id(&self, uid: Uid, gid: Gid) -> Result<()> {
prctl::set_keep_capabilities(true).map_err(|errno| {
tracing::error!(?errno, "failed to set keep capabilities to true");
- nix::errno::from_i32(errno)
+ nix::errno::Errno::from_raw(errno)
})?;
// args : real *id, effective *id, saved set *id respectively
@@ -350,7 +350,7 @@ impl Syscall for LinuxSyscall {
}
prctl::set_keep_capabilities(false).map_err(|errno| {
tracing::error!(?errno, "failed to set keep capabilities to false");
- nix::errno::from_i32(errno)
+ nix::errno::Errno::from_raw(errno)
})?;
Ok(())
}
diff --git a/tests/contest/contest/src/tests/seccomp_notify/seccomp_agent.rs b/tests/contest/contest/src/tests/seccomp_notify/seccomp_agent.rs
index 10664b2e..60337c4d 100644
--- a/tests/contest/contest/src/tests/seccomp_notify/seccomp_agent.rs
+++ b/tests/contest/contest/src/tests/seccomp_notify/seccomp_agent.rs
@@ -1,7 +1,7 @@
use anyhow::{bail, Context, Result};
use libcontainer::container::ContainerProcessState;
use nix::{
- sys::socket::{self, UnixAddr},
+ sys::socket::{self, Backlog, UnixAddr},
unistd,
};
use std::{
@@ -35,7 +35,11 @@ pub fn recv_seccomp_listener(seccomp_listener: &Path) -> SeccompAgentResult {
socket::bind(socket.as_raw_fd(), &addr).context("failed to bind to seccomp listener socket")?;
// Force the backlog to be 1 so in the case of an error, only one connection
// from clients will be waiting.
- socket::listen(&socket.as_fd(), 1).context("failed to listen on seccomp listener")?;
+ socket::listen(
+ &socket.as_fd(),
+ Backlog::new(1).unwrap( /* safe to unwrap because Backlog(1) is always valid */ ),
+ )
+ .context("failed to listen on seccomp listener")?;
let conn = match socket::accept(socket.as_raw_fd()) {
Ok(conn) => conn,
Err(e) => {
diff --git a/tests/contest/runtimetest/src/tests.rs b/tests/contest/runtimetest/src/tests.rs
index dee3afc7..d1d2f2ce 100644
--- a/tests/contest/runtimetest/src/tests.rs
+++ b/tests/contest/runtimetest/src/tests.rs
@@ -34,7 +34,7 @@ pub fn validate_readonly_paths(spec: &Spec) {
// change manual matching of i32 to e.kind() and match statement
for path in ro_paths {
if let std::io::Result::Err(e) = test_read_access(path) {
- let errno = Errno::from_i32(e.raw_os_error().unwrap());
+ let errno = Errno::from_raw(e.raw_os_error().unwrap());
// In the integration tests we test for both existing and non-existing readonly paths
// to be specified in the spec, so we allow ENOENT here
if errno == Errno::ENOENT {
@@ -50,7 +50,7 @@ pub fn validate_readonly_paths(spec: &Spec) {
}
if let std::io::Result::Err(e) = test_write_access(path) {
- let errno = Errno::from_i32(e.raw_os_error().unwrap());
+ let errno = Errno::from_raw(e.raw_os_error().unwrap());
// In the integration tests we test for both existing and non-existing readonly paths
// being specified in the spec, so we allow ENOENT, and we expect EROFS as the paths
// should be read-only |
@YJDoc2 oh, didn't realize there was already a pr to update nix. Fine then i will wait till it gets merged and then continue my work on the "Fully support OwnedFd" part |
also, @utam0k can i also be assigned to this issue? |
@zahash, have you considered the possibility of a FD leak here? container_main_process.rs pub fn container_main_process(container_args: &ContainerArgs) -> Result<(Pid, bool)> {
...
let cb: CloneCb = {
let container_args = container_args.clone();
...
Box::new(move || {
...
})
};
...
let intermediate_pid = fork::container_clone(cb) fork.rs fn clone(cb: CloneCb, flags: u64, exit_signal: Option<u64>) -> Result<Pid, CloneError> {
...
extern "C" fn main(data: *mut libc::c_void) -> libc::c_int {
unsafe { Box::from_raw(data as *mut CloneCb)() }
}
let ret = unsafe {
libc::clone(
main,
child_stack_top,
combined_flags,
data as *mut libc::c_void,
)
};
... It seems the child process doesn't have a chance to drop the OwnedFd outside of CloneCb. 🤔 (In the past, this wasn't a problem because they were all the same RawFd.) |
Also, cloning a OwnedFd only fails on wasm. impl OwnedFd {
/// Creates a new `OwnedFd` instance that shares the same underlying file
/// description as the existing `OwnedFd` instance.
#[stable(feature = "io_safety", since = "1.63.0")]
pub fn try_clone(&self) -> crate::io::Result<Self> {
self.as_fd().try_clone_to_owned()
}
}
impl BorrowedFd<'_> {
/// Creates a new `OwnedFd` instance that shares the same underlying file
/// description as the existing `BorrowedFd` instance.
#[cfg(not(any(target_arch = "wasm32", target_os = "hermit")))]
#[stable(feature = "io_safety", since = "1.63.0")]
pub fn try_clone_to_owned(&self) -> crate::io::Result<OwnedFd> {
// We want to atomically duplicate this file descriptor and set the
// CLOEXEC flag, and currently that's done via F_DUPFD_CLOEXEC. This
// is a POSIX flag that was added to Linux in 2.6.24.
#[cfg(not(target_os = "espidf"))]
let cmd = libc::F_DUPFD_CLOEXEC;
// For ESP-IDF, F_DUPFD is used instead, because the CLOEXEC semantics
// will never be supported, as this is a bare metal framework with
// no capabilities for multi-process execution. While F_DUPFD is also
// not supported yet, it might be (currently it returns ENOSYS).
#[cfg(target_os = "espidf")]
let cmd = libc::F_DUPFD;
// Avoid using file descriptors below 3 as they are used for stdio
let fd = cvt(unsafe { libc::fcntl(self.as_raw_fd(), cmd, 3) })?;
Ok(unsafe { OwnedFd::from_raw_fd(fd) })
}
/// Creates a new `OwnedFd` instance that shares the same underlying file
/// description as the existing `BorrowedFd` instance.
#[cfg(any(target_arch = "wasm32", target_os = "hermit"))]
#[stable(feature = "io_safety", since = "1.63.0")]
pub fn try_clone_to_owned(&self) -> crate::io::Result<OwnedFd> {
Err(crate::io::const_io_error!(
crate::io::ErrorKind::Unsupported,
"operation not supported on WASI yet",
))
}
} |
I think the main issue is that clone causes a "stack switch", which makes the child process lose all the local variables originally on the stack. So, at the moment, I don't have a good idea to solve this problem... (Apart from using Rc and manually adjusting the reference count.) @zahash |
i thought clone creates a process identical to the parent with all the local variables. also, the CloneCb closure is boxed and we send the actual function pointer of the closure to libc. so shouldn't the closure retain all the references to the enclosed variables in the function body. |
@anti-entropy123 i did this and all the tests pass. I basically removed @YJDoc2 i can't create a PR right now because this hasn't merged yet #2728 and a lot of my changes depend on it. sorry for using a patch again. diff --git a/crates/libcontainer/src/channel.rs b/crates/libcontainer/src/channel.rs
index 23d255f9..427912d6 100644
--- a/crates/libcontainer/src/channel.rs
+++ b/crates/libcontainer/src/channel.rs
@@ -18,13 +18,11 @@ pub enum ChannelError {
#[error("channel connection broken")]
BrokenChannel,
}
-#[derive(Clone)]
pub struct Receiver<T> {
receiver: RawFd,
phantom: PhantomData<T>,
}
-#[derive(Clone)]
pub struct Sender<T> {
sender: RawFd,
phantom: PhantomData<T>,
diff --git a/crates/libcontainer/src/process/channel.rs b/crates/libcontainer/src/process/channel.rs
index 9683e5ed..b1bd2e88 100644
--- a/crates/libcontainer/src/process/channel.rs
+++ b/crates/libcontainer/src/process/channel.rs
@@ -40,7 +40,6 @@ pub fn main_channel() -> Result<(MainSender, MainReceiver), ChannelError> {
Ok((MainSender { sender }, MainReceiver { receiver }))
}
-#[derive(Clone)]
pub struct MainSender {
sender: Sender<Message>,
}
@@ -88,7 +87,6 @@ impl MainSender {
}
}
-#[derive(Clone)]
pub struct MainReceiver {
receiver: Receiver<Message>,
}
@@ -199,7 +197,6 @@ pub fn intermediate_channel() -> Result<(IntermediateSender, IntermediateReceive
))
}
-#[derive(Clone)]
pub struct IntermediateSender {
sender: Sender<Message>,
}
@@ -219,7 +216,6 @@ impl IntermediateSender {
}
}
-#[derive(Clone)]
pub struct IntermediateReceiver {
receiver: Receiver<Message>,
}
@@ -256,7 +252,6 @@ pub fn init_channel() -> Result<(InitSender, InitReceiver), ChannelError> {
Ok((InitSender { sender }, InitReceiver { receiver }))
}
-#[derive(Clone)]
pub struct InitSender {
sender: Sender<Message>,
}
@@ -275,7 +270,6 @@ impl InitSender {
}
}
-#[derive(Clone)]
pub struct InitReceiver {
receiver: Receiver<Message>,
}
diff --git a/crates/libcontainer/src/process/container_intermediate_process.rs b/crates/libcontainer/src/process/container_intermediate_process.rs
index 0085416b..8d9a015c 100644
--- a/crates/libcontainer/src/process/container_intermediate_process.rs
+++ b/crates/libcontainer/src/process/container_intermediate_process.rs
@@ -99,12 +99,7 @@ pub fn container_intermediate_process(
}
let cb: CloneCb = {
- let args = args.clone();
- let init_sender = init_sender.clone();
- let inter_sender = inter_sender.clone();
- let mut main_sender = main_sender.clone();
- let mut init_receiver = init_receiver.clone();
- Box::new(move || {
+ Box::new(|| {
if let Err(ret) = prctl::set_name("youki:[2:INIT]") {
tracing::error!(?ret, "failed to set name for child process");
return ret;
@@ -121,7 +116,7 @@ pub fn container_intermediate_process(
tracing::error!(?err, "failed to close sender in the intermediate process");
return -1;
}
- match container_init_process(&args, &mut main_sender, &mut init_receiver) {
+ match container_init_process(&args, main_sender, init_receiver) {
Ok(_) => 0,
Err(e) => {
tracing::error!("failed to initialize container process: {e}");
diff --git a/crates/libcontainer/src/process/container_main_process.rs b/crates/libcontainer/src/process/container_main_process.rs
index f7066053..41fa96ab 100644
--- a/crates/libcontainer/src/process/container_main_process.rs
+++ b/crates/libcontainer/src/process/container_main_process.rs
@@ -42,16 +42,12 @@ pub fn container_main_process(container_args: &ContainerArgs) -> Result<(Pid, bo
// cloned process, we have to be deligent about closing any unused channel.
// At minimum, we have to close down any unused senders. The corresponding
// receivers will be cleaned up once the senders are closed down.
- let (main_sender, mut main_receiver) = channel::main_channel()?;
- let inter_chan = channel::intermediate_channel()?;
- let init_chan = channel::init_channel()?;
+ let (mut main_sender, mut main_receiver) = channel::main_channel()?;
+ let mut inter_chan = channel::intermediate_channel()?;
+ let mut init_chan = channel::init_channel()?;
let cb: CloneCb = {
- let container_args = container_args.clone();
- let mut main_sender = main_sender.clone();
- let mut inter_chan = inter_chan.clone();
- let mut init_chan = init_chan.clone();
- Box::new(move || {
+ Box::new(|| {
if let Err(ret) = prctl::set_name("youki:[1:INTER]") {
tracing::error!(?ret, "failed to set name for child process");
return ret;
diff --git a/crates/libcontainer/src/process/fork.rs b/crates/libcontainer/src/process/fork.rs
index c0e05539..2a69d8d2 100644
--- a/crates/libcontainer/src/process/fork.rs
+++ b/crates/libcontainer/src/process/fork.rs
@@ -33,7 +33,7 @@ pub enum CloneError {
/// pass in a child stack, which is empty. By storing the closure in heap, we
/// can then in the new process to re-box the heap memory back to a closure
/// correctly.
-pub type CloneCb = Box<dyn FnMut() -> i32>;
+pub type CloneCb<'a> = Box<dyn FnMut() -> i32 + 'a>;
// Clone a sibling process that shares the same parent as the calling
// process. This is used to launch the container init process so the parent
|
the lifetime in let pid = fork::container_clone_sibling(cb).map_err(|err| {
tracing::error!("failed to fork init process: {}", err);
IntermediateProcessError::InitProcess(err)
})?;
// ...
main_sender.intermediate_ready(pid).map_err(|err| {
tracing::error!("failed to wait on intermediate process: {}", err);
err
})?; without this, the child might outlive the parent and become zombie. and rust lifetimes won't save us. maybe there is a better way to do this so that rust understands the "waiting" part is important for the lifetime. |
OK, I agree with what you're saying, and since your code doesn't clone the However, I'd still like to ask if you're aware of whether you're actually calling // An internal wrapper to manage the clone3 vs clone fallback logic.
fn clone_internal(
mut cb: CloneCb,
flags: u64,
exit_signal: Option<u64>,
) -> Result<Pid, CloneError> {
match clone3(&mut cb, flags, exit_signal) {
...
// For now, we decide to only fallback on ENOSYS
Err(CloneError::Clone(nix::Error::ENOSYS)) => {
...
let pid = clone(cb, flags, exit_signal)?; Since in clone(), a new stack is allocated for the child process, fn clone(cb: CloneCb, flags: u64, exit_signal: Option<u64>) -> Result<Pid, CloneError> {
...
let ret = unsafe {
libc::clone(
main,
child_stack_top,
combined_flags,
data as *mut libc::c_void,
)
};
... wouldn't the closure's references to local variables (e.g., &container_args) become invalid? @zahash |
I have the exact same question 😅😅 do you know how clone3 vs clone differences are? and since the closure has references to variables in the paren'ts memory, do they also get copied to the child memory? i can see that clone3 also takes but why do the tests pass? do they even check this part of the code? |
I think your test device should support clone3(), so it likely called clone3() in reality. And clone3() seems to completely copy the parent process's memory, which is why your test passed. |
@anti-entropy123 is there a way to force it to use clone instead? never mind. i'll just comment out the clone3 part |
But I'm not entirely sure why clone() necessitates switching to a new stack; it could be a requirement of the syscall itself (? 🤔 ). |
@anti-entropy123 all the tests still pass fn clone_internal(
mut cb: CloneCb,
flags: u64,
exit_signal: Option<u64>,
) -> Result<Pid, CloneError> {
clone(cb, flags, exit_signal)
} |
Amazing... I'm curious about the reason. @zahash |
I'm equally curious. 😂 maybe this is undefined behavior. |
My expectation was that a segmentation fault would occur. |
@anti-entropy123 if found this on the kernel man page
|
I think I understand now. Even though clone() assigns a new stack to the child process, the original memory region (the old stack) may still be valid. So, those references remain effective. |
Hey, will need some time to read all the discussion above and consider it. Probably would get back on this around the weekend.... |
maybe. but how can one process directly access the memory of another process? its not allowed right? or maybe its allowed in "parent - child" processes? or maybe its because, the child gets an exact copy of the parent's memory when clone/fork |
No, what I mean is that the old stack is also copied to the child process. The address spaces of the parent and child processes are also isolated. I'm planning to observe the maps of both processes. @zahash |
yes please. that would be very helpful 🥰 |
@zahash It seems our guess was correct.
Both the memory region containing the local variables of the child process (0x7f71176678c4) and the region referencing the local variables of the parent process (located at 7ffccdd65000-7ffccdd9b000) appear to be valid. So now it all makes sense. 🎉 |
No, you can see our test coverage here: https://app.codecov.io/gh/containers/youki |
@anti-entropy123 @zahash |
@anti-entropy123 , @zahash if you are still interested in this, the nix update to 0.28 is done, so this can be worked on. |
@YJDoc2 yes I will start working on it, thanks for notifying me |
i bought a windows laptop and using wsl to work on this project. failures:
---- systemd::dbus_native::dbus::tests::test_dbus_function_calls stdout ----
Error: DBus(ConnectionError("ENOENT: No such file or directory"))
---- systemd::manager::tests::test_task_addition stdout ----
thread 'systemd::manager::tests::test_task_addition' panicked at crates/libcgroups/src/systemd/manager.rs:553:10:
called `Result::unwrap()` on an `Err` value: SystemdClient(DBus(BusAddressError("no valid bus path found in list unix:path=/run/user/1000/bus")))
---- systemd::dbus_native::dbus::tests::test_dbus_function_calls_errors stdout ----
thread 'systemd::dbus_native::dbus::tests::test_dbus_function_calls_errors' panicked at crates/libcgroups/src/systemd/dbus_native/dbus.rs:539:69:
called `Result::unwrap()` on an `Err` value: DBus(ConnectionError("ENOENT: No such file or directory"))
---- systemd::dbus_native::dbus::tests::test_dbus_connection_auth stdout ----
thread 'systemd::dbus_native::dbus::tests::test_dbus_connection_auth' panicked at crates/libcgroups/src/systemd/dbus_native/dbus.rs:486:9:
assertion failed: conn.is_ok()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
systemd::dbus_native::dbus::tests::test_dbus_connection_auth
systemd::dbus_native::dbus::tests::test_dbus_function_calls
systemd::dbus_native::dbus::tests::test_dbus_function_calls_errors
systemd::manager::tests::test_task_addition
test result: FAILED. 139 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.11s
error: test failed, to rerun pass `-p libcgroups --lib` |
here is the initial PR #2809 |
Hey @zahash
I have never used wsl, so not 100% sure, but from error it seems like the dbus is not setup properly. You might check if wsl runs systemd+dbus in the first place, which is needed for these tests. In case it doesn't, it should be fine as long as you are not changing any code related to the dbus impl. |
ref: issue #2317, PR #2369
Introducation
Since nix version 0.27.1, some APIs (e.g.,
nix::pty::openpty()
,nix::sys::socket::socket()
) have started usingOwnedFd
to manage FDs. However, it is not straightforward to passOwnedFd
between forks. So in PR #2369, we still useRawFd
in the subsequent steps.This issue try to do the change from
RawFd
toOwnedFd
.The text was updated successfully, but these errors were encountered: