rootfs: umount all procfs and sysfs with --no-pivot #1962
LGTM (IANAM)
libcontainer/rootfs_linux.go
Outdated
        return err
    }
    if err := unix.Unmount(p, unix.MNT_DETACH); err != nil {
        if err.(syscall.Errno) != unix.EINVAL {
Potentially we can also get EPERM here?
I guess so, but I never saw that error. When trying to umount /proc or /sys, I only got EINVAL. I can amend the patch if you'd like.
I went ahead and amended the change in the updated version.
Force-pushed from 5d8596f to c18aa18.
When creating a new user namespace, the kernel does not allow mounting a new procfs or sysfs file system unless an instance is already fully visible in the current mount namespace.

When using --no-pivot we were effectively disabling this kernel protection, as /proc and /sys from the host are still present in the container mount namespace.

A container without full access to /proc could then create a new user namespace and, from there, mount a fully visible /proc, bypassing the limitations in the container.

A simple reproducer for this issue is:

    unshare -mrfp sh -c "mount -t proc none /proc && echo c > /proc/sysrq-trigger"

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Force-pushed from c18aa18 to 28a697c.
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Including critical security fix for `runc run --no-pivot` (unlikely to affect BuildKit): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 3aec9e7)
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 1ee33f4)
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 3aec9e7) Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Changes: opencontainers/runc@69663f0...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
        return err
    }

    absRootfs, err := filepath.Abs(rootfs)
AFAICS this is not needed since rootfs is already validated by (*ConfigValidator).rootfs()
    }

    for _, info := range mountinfos {
        p, err := filepath.Abs(info.Mountpoint)
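The loop above normalizes each mount point, presumably so it can be compared against the container rootfs and mounts inside the rootfs can be left alone. That reading is an assumption, since the rest of the loop is not shown here. Under that assumption, a containment check could be sketched as follows. The helper name `underRoot` is hypothetical. It relies on `filepath.Rel` instead of a plain string-prefix comparison, because a prefix test would wrongly treat `/rootfs2` as being inside `/rootfs`.

```go
package main

import (
	"path/filepath"
	"strings"
)

// underRoot reports whether path is root itself or lies below it.
// Both arguments are expected to be absolute, cleaned paths.
func underRoot(root, path string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	// A relative path that is ".." or starts with "../" escapes root.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}
```

With such a helper, a loop over mount entries could skip any proc or sysfs mount for which `underRoot(absRootfs, p)` is true and detach the rest.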