Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Console path resolution is done in host mount namespace #814

Closed
cyphar opened this issue May 11, 2016 · 16 comments · Fixed by #1018
Closed

Console path resolution is done in host mount namespace #814

cyphar opened this issue May 11, 2016 · 16 comments · Fixed by #1018
Labels
Milestone

Comments

@cyphar
Copy link
Member

cyphar commented May 11, 2016

[I realised this while trying to get the test suite to run for #774.]

The main issue is that we set up the console in the parent's mount namespace. This breaks quite a few things. In addition, --console resolution done in the parent causes things like su to not work in containers (because glibc is broken). If you have a config.json like this:

{
    "ociVersion": "0.6.0-dev",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": true,
        "user": {},
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": [
            "CAP_AUDIT_WRITE",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": true
    },
    "hostname": "runc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        }
    ],
    "hooks": {},
    "linux": {
        "uidMappings": [
            {
                "hostID": 1000,
                "containerID": 0,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "hostID": 100,
                "containerID": 0,
                "size": 1
            }
        ],
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            }
        ],
        "maskedPaths": [
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_stats",
            "/proc/sched_debug"
        ],
        "readonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}

/cc @crosbymichael

@cyphar cyphar mentioned this issue May 11, 2016
46 tasks
@cyphar
Copy link
Member Author

cyphar commented May 11, 2016

This is probably a good reason for us to start adding integration tests for user namespaces.

@cyphar
Copy link
Member Author

cyphar commented May 11, 2016

This happens inside (*linuxConsole).dupStdio(), because /dev/pts/ptmx is being opened there. And since it's being opened in the host, we get screwed by the mode of /dev/pts/ptmx in the host's devpts instance -- we should fix this. According to the docs, we should probably be using /dev/pts/ptmx inside the container. However, this means that we'll have to mess around with things.

@cyphar cyphar added this to the 0.2.0 milestone May 13, 2016
@crosbymichael
Copy link
Member

The tricky part is getting the pty master back outside of the container after opening it.

@cyphar
Copy link
Member Author

cyphar commented May 29, 2016

This also links to several old issues with libcontainer with regard to sudo not working in containers -- it all links back to the fact that we are creating the terminal in the host. I'm trying to figure out a way to make this unnecessary.

@cyphar
Copy link
Member Author

cyphar commented Jun 1, 2016

Maybe we could use unix sockets to send the fd? I'll take another look at this next week.

@crosbymichael
Copy link
Member

i don't like unix socket sending fds. i really want to avoid these super containers with live connections going in and out.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I have two ideas currently:

  1. We use a unix socket, which requires setting up a unix socket just during setup. This is going to be messy because ancillary data requires a lot of C finesse. But it will work.
  2. We could try forking a process that shares the set of files (~CLONE_FILES) which then uses setns(2) to join the mount (and user if available) namespace of the container to then open either /proc/1/fd/0 or /dev/pts/<number>. The issue with this is that it might not work properly (I'm trying to remember what the minimum set of clone flags are for namespaces -- I'm fairly sure you need CLONE_FLAGS). We also need to add another nsenter-like reexec thing.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 07:37:47PM -0700, Aleksa Sarai wrote:

  1. We use a unix socket, which requires setting up a unix socket
    just during setup. This is going to be messy because ancillary data
    requires a lot of C finesse. But it will work.

It's not that difficult; ~50 lines for sendfd and recvfd wrappers in
ccon, based on an example in cmsg(3).

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I said messy and finesse. It's not that it's hard, it's that it'll be a bit of C code we can't touch.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 07:47:35PM -0700, Aleksa Sarai wrote:

It's not that it's hard, it's that it'll be a bit of C code we can't
touch.

Why “can't touch”? I'm not even sure what that means :p.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 04:41:47PM -0700, Michael Crosby wrote:

i really want to avoid these super containers with live connections
going in and out.

Ccon uses an anonymous Unix socket (socketpair(2)) 1 the way runC
apparently uses a pipe 2. The host ↔ container socket gets closed
before the container process calls execve(2) (or similar) to launch
the user code. So there shouldn't be any more “live connections” than
you already have.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I'm going to be honest, I didn't know about socketpair(2). This would make the code much simpler. As for "can't touch" it just means that we'll have to mix some Go and C code that has odd semantics and we have to copy from the man page for cmsg and touching it would probably cause something to blow up.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 09:11:02PM -0700, Aleksa Sarai wrote:

… touching it would probably cause something to blow up.

That's true of most things, and what test suites are for ;).

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

If @crosbymichael is okay with using socketpair(2) in place of pipe(2) for the parent <-> child communication, I'd be fine with implementing the code this way. But there's a question of whether or not we'll get similar issues as with the whole ECONNRESET fiasco.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

Wait hang on. We already use socketpair -- look at newPipe() at libcontainer/container_linux.go! So this should be a fairly simple extension.

@crosbymichael crosbymichael removed this from the 1.0.0 milestone Jun 3, 2016
@cyphar cyphar changed the title --console doesn't work with user namespaces Console path resolution is done in host mount namespace Jun 4, 2016
@cyphar cyphar added this to the 1.1.0 milestone Jun 6, 2016
@cyphar cyphar mentioned this issue Aug 15, 2016
2 tasks
@cyphar
Copy link
Member Author

cyphar commented Aug 17, 2016

Currently working on the design plan. https://gist.github.com/cyphar/8c6b9db84fc1f2cc2d037ef07942ca83

Here's @crosbymichael's mockup of how you could implement things in a simple C program. https://gist.github.com/crosbymichael/d3045070f0e2615814aaa31e8991d7fd

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
Through f4d221c (Merge pull request opencontainers#880 from
dqminh/wking-linux-only-capabilities-again, 2017-07-05).  The rc6
release picked up an earlier version of these notes, and those entries
are mostly unchanged except for:

* The credentialSpec entry, which was opencontainers#814 for credentialspec and now
  also includes opencontainers#859 for credentialSpec.

* The root(.path) Hyper-V entry, which was opencontainers#820 for root.path and now
  also includes opencontainers#838 for root.  I also moved this into the "breaking
  changes" section, because rc5 Hyper-V configs required root to be
  set, and rc6 Hyper-V configs require it to not be set.  Although
  whether rc5 allowed Hyper-V configs at all is not clear to me.

* Fixed indenting for the typo-fixes entry, as well as a number of
  more recent typo-fix PRs.

Signed-off-by: W. Trevor King <wking@tremily.us>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants