
Add Support for Docker Desktop and Rootless Docker on Linux #1083

Closed
rmartin16 opened this issue Feb 5, 2023 · 16 comments · Fixed by #1331
Labels
enhancement New features, or improvements to existing features. linux The issue relates to Linux support.

Comments

@rmartin16
Member

rmartin16 commented Feb 5, 2023

What is the problem or limitation you are having?

Using Briefcase to create AppImages does not support Docker Desktop or rootless Docker; see #1082 and #1095.

Describe the solution you'd like

Support for Docker Desktop and rootless Docker.

On Linux, Docker Desktop complicates bind mounts because it runs containers inside a VM. Therefore, a bind mount is first translated through the VM and then into the container. This results in existing files and directories appearing to be owned by root; files and directories created or chowned inside the container end up owned by non-existent users/groups on the host.

Docker provides these recommendations for managing this.

I do not have a native macOS environment and cannot confirm if Docker Desktop on macOS suffers the same fate.

A potential solution is outlined below. tl;dr: don't create the brutus user in the image build, since root is mapped to the host user.

Describe alternatives you've considered

Implementing #1082 at least prevents unsuspecting users from hitting errors that give no indication their Docker platform is unsupported. Given that Docker Desktop support is far from trivial (or perhaps not really possible), that may be considered sufficient.

Additional context

No response

@rmartin16 rmartin16 added the enhancement and linux labels Feb 5, 2023
@rmartin16 rmartin16 changed the title from "Add for Support Docker Desktop" to "Add Support for Docker Desktop" Feb 5, 2023
@freakboy3742
Member

FWIW, Docker Desktop is the only option (AFAIK) for running Docker on macOS, and that's where I've done all my testing.

Agreed that #1082 is probably the best short term option; I'm not sure I have any great ideas on a permanent fix. The only thought I've had is that the /app mount is entirely arbitrary; if there's somewhere "safe" that we can put content that is owned by brutus (e.g., /home/brutus/app), then that might avoid the permissions problems? I'm guessing the /etc/setuid and /etc/setgid pieces may also be needed.

@rmartin16
Member Author

rmartin16 commented Feb 12, 2023

As detailed in #1095, both Docker Desktop and rootless Docker use Linux namespaces to isolate the container. So, while the default user inside the container is root, its default privileges for anything mounted into the container remain at the level of the invoking user.

For our use-case, we're most interested in filesystem interaction (although network access could be important as well). For bind mounts, the root user in the container is mapped to the user that started the Docker container. Therefore, the work done in the Dockerfile to create the less privileged user brutus is duplicative of what these other Docker modes are doing.
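The user-namespace mapping at play here can be observed from inside a container by reading /proc/self/uid_map. A minimal sketch of parsing such a mapping (the helper and sample values are illustrative, not Briefcase code; the mappings shown are typical defaults):

```python
def parse_uid_map(text):
    """Parse /proc/<pid>/uid_map content into (inside_uid, outside_uid, count) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.strip().splitlines()]

# Rootful Docker typically shows a single identity mapping (container root is host root):
print(parse_uid_map("0 0 4294967295"))            # [(0, 0, 4294967295)]

# Rootless Docker / Docker Desktop maps container root to the invoking user,
# with the rest of the UID range drawn from subordinate UIDs:
print(parse_uid_map("0 1000 1\n1 100000 65536"))  # [(0, 1000, 1), (1, 100000, 65536)]
```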

Removing the brutus-specific tasks in Dockerfile allowed the AppImage build to run normally for rootless Docker and Docker Desktop on Linux:

diff --git a/{{ cookiecutter.formal_name }}/Dockerfile b/{{ cookiecutter.formal_name }}/Dockerfile
index 512027c..8000d8d 100644
--- a/{{ cookiecutter.formal_name }}/Dockerfile 
+++ b/{{ cookiecutter.formal_name }}/Dockerfile 
@@ -27,28 +27,14 @@ RUN apt-get update -y && \
         python${PY_VERSION}-dev \
         python${PY_VERSION}-venv
 
-# Ensure Docker user UID:GID matches host user UID:GID (beeware/briefcase#403)
-# Use --non-unique to avoid problems when the UID:GID of the host user
-# collides with entries provided by the Docker container.
-ARG HOST_UID
-ARG HOST_GID
-RUN groupadd --non-unique --gid $HOST_GID briefcase && \
-    useradd --non-unique --uid $HOST_UID --gid $HOST_GID brutus --home /home/brutus && \
-    mkdir -p /home/brutus && chown brutus:briefcase /home/brutus
-
 # Ensure pip is available; do this as the brutus user
 # to ensure that the pip cache is created user-readable.
-USER brutus
 RUN python${PY_VERSION} -m ensurepip
 
 # As root, Install system packages required by app
-USER root
 ARG SYSTEM_REQUIRES
 RUN apt-get update -y && \
     apt-get install --no-install-recommends -y ${SYSTEM_REQUIRES}
 
-# Use the brutus user for operations in the container
-USER brutus
-
 # ========== START USER PROVIDED CONTENT ==========
 {{ cookiecutter.dockerfile_extra_content }}

@rmartin16 rmartin16 changed the title from "Add Support for Docker Desktop" to "Add Support for Docker Desktop and Rootless Docker on Linux" Feb 12, 2023
@freakboy3742
Member

If removing those sections is a potential fix, we could put those parts as "template optional" - i.e., add a rootless=True argument to the template, and only include the brutus user parts if rootless==False. All that would be left then is identifying when we are in a "rootless" situation.

In the meantime - do we need to add to the documentation to clarify exactly what type of Docker install is valid (especially in the BeeWare tutorial, which is the first point of contact people have with this problem)?

@rmartin16
Member Author

rmartin16 commented Jun 12, 2023

After figuring out that Python True/False doesn't really flow that well into the Jinja template through cookiecutter (or at least not as far as I could figure out), I got the Linux System template to reliably skip the creation of the brutus user.
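Since cookiecutter passes values into the template as strings rather than Python booleans, the brutus section of the Dockerfile template can be guarded with a string comparison. A sketch (the use_non_root_user variable name is hypothetical, not the name the template actually uses; the guarded body is the block removed in the diff above):

```jinja
{% if cookiecutter.use_non_root_user == "true" %}
# Ensure Docker user UID:GID matches host user UID:GID (beeware/briefcase#403)
ARG HOST_UID
ARG HOST_GID
RUN groupadd --non-unique --gid $HOST_GID briefcase && \
    useradd --non-unique --uid $HOST_UID --gid $HOST_GID brutus --home /home/brutus && \
    mkdir -p /home/brutus && chown brutus:briefcase /home/brutus
USER brutus
{% endif %}
```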

The next problem is detecting when to build the image with and without brutus. This is especially true for rootless Docker on Linux....because it seems to be using the same Docker Engine....just without root. So, things are moved around a bit and user namespace mapping is doing a lot of heavy lifting...but everything else looks mostly the same as rootful Docker. I do see something called rootlesskit when running in rootless mode....but things like that feel like implementation details subject to change and potentially really finicky to drive functionality.

Given this, I'm also considering a different approach altogether. Basically, launch a container (like alpine or something small) with a bind-mount in to the project and touch a file inside the bind mount. If the file on the host system is owned by root, then we'd know we need to create the brutus user....however, if the file is owned by the current user, then we'd skip brutus.

I suppose another approach altogether is actually understanding and taking advantage of the user namespace mapping functionality. My current understanding is a whole range of user IDs used inside the container are mapped to the host user ID running the container.....so, maybe this is just as easy as reading the /etc/subuid file and using one of those IDs.
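A hedged sketch of reading subordinate UID ranges for a user (the /etc/subuid format is one `name:start:count` entry per line); this only illustrates the idea, and as discussed later in the thread, driving behavior off these implementation details proved too fragile:

```python
def subordinate_uid_ranges(subuid_text, user):
    """Return (start, count) subordinate-UID ranges for `user`, given the
    contents of /etc/subuid (one `name:start:count` entry per line)."""
    ranges = []
    for line in subuid_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges

# Illustrative /etc/subuid content (the values are common defaults, not guaranteed):
sample = "user:100000:65536\nother:165536:65536"
print(subordinate_uid_ranges(sample, "user"))  # [(100000, 65536)]

# In practice, you would read the real file, e.g.:
#   subordinate_uid_ranges(open("/etc/subuid").read(), getpass.getuser())
```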

@freakboy3742
Member

I suppose another approach altogether is actually understanding and taking advantage of the user namespace mapping functionality. My current understanding is a whole range of user IDs used inside the container are mapped to the host user ID running the container.....so, maybe this is just as easy as reading the /etc/subuid file and using one of those IDs.

The context for that part:

On macOS, you can mount any external volume you want into your Docker container. However, when you create a new file inside the container, the GID and UID of the file is set to the GID and UID of the user inside the docker container. By default, the Docker user is UID/GID 0 which means you end up writing files on your user's filesystem that are owned by the macOS root user outside the container. The Brutus user is created with the same GID and UID as the user outside the container, so the files that are created are readable by the user.

I don't recall if the same behavior happens on Linux.

FWIW - this is 100% a hack, and it also seems like a delightful opportunity for a security hole on the part of Docker - but I'm not aware of any options we can pass to Docker that make it not do this weird UID/GID stuff.

@rmartin16
Member Author

the Docker user is UID/GID 0 which means you end up writing files on your user's filesystem that are owned by the macOS root user outside the container

We tested this on the latest Docker Desktop on macOS, and anything the container root user writes to the bind mount ends up owned by the user running Docker Desktop.

Maybe this changed with osxfs....or VirtioFS....I'm not sure:

Choose file sharing implementation for your containers. Choose whether you want to share files using VirtioFS, gRPC FUSE, or osxfs. The VirtioFS option is only available for macOS versions 12.5 and above. [source]

Nonetheless, Docker Desktop does not behave like this on Linux. If root writes to a bind mount on Linux, the file is owned by root on the host.

At any rate, I also posted about this on the Docker forums. I was told I could copy the files into a volume mount each time if I wanted....but this doesn't seem ideal. Alternatively, the current conditional non-root user approach was confirmed as an option. Another option was to chown the files inside the container each time....I'll have to think about this one more.

@rmartin16
Member Author

rmartin16 commented Jun 22, 2023

Interesting situation....when a deb is installed and run in a Docker Desktop container, libgtk-3 segfaults when the app starts. The same deb installs and runs fine in a VM....so I'm not gonna go down this rabbit hole right now....but FYI in case this comes up in the future.

[edit]
Thinking more about it, I think this more likely stems from how I'm passing X11 access to the container. Since Docker Desktop uses a QEMU VM to run the container, access to the host X11 server probably doesn't work, leading to the crash.

@rmartin16
Member Author

I've looked through so much documentation and articles about this and there doesn't appear to be a reliable way to share a directory between a host and a container while supporting write permissions for either irrespective of how Docker is configured.

Most of the recommended configurations require changes to the host environment; for example, Docker's own suggestion if you try to leverage the user ID mapping:

In this scenario if a shared file is chowned inside a Docker Desktop container owned by a user with a UID of 1000, it shows up on the host as owned by a user with a UID of 100999. This has the unfortunate side effect of preventing easy access to such a file on the host. The problem is resolved by creating a group with the new GID and adding our user to it, or by setting a recursive ACL (see setfacl(1)) for folders shared with the Docker Desktop VM. [source]

I'm mostly down to two options:

  1. Limit the file system writing to only the container
  • I'm not even 100% sure this would always be possible without files owned by a UID other than the host user....but it would simplify the story about who needs write permissions....
  • However, this would require re-organizing how the template is rolled out....and creates a chicken-and-egg problem since we depend on the rolled out template to create the container to begin with
  2. Conditionally use a step-down user in the container
  • This works; when I manually adjust the Dockerfile to use the step-down user when necessary, everything works.
  • However, there is not a straightforward way to detect when to step down and when not to.
  • There are simply too many variables at play, and a lot of them seem to be implementation details.
  • While normal Docker installs will typically look similar:
    • There's no guarantee that will remain true into the future....or that the install patterns won't change
    • A user could simply set up Docker however they want in a way that looks different but is functionally the same
  • Therefore, I'm left in the position to simply:
    • create a container using whatever Docker is invoked with the docker command
    • bind mount the Briefcase project directory into the container
    • write a file to that directory inside the container
    • If, on the host system, that file is owned by root, use a step-down user; if not, use root in the Dockerfile
    • Delete the file from inside the container
  • Note: since this has to happen before we actually use the Dockerfile...it'll have to use a random image like alpine to run this test; alternatively, since Briefcase will eventually need to pull the base image for the Dockerfile, we could just go ahead and pull it for this task. That would avoid pulling a random image just for this....even if it is the 8MB alpine image.
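The probe in the final bullet list above could be sketched roughly as follows (the function names are hypothetical, not Briefcase's actual implementation, and the docker invocations are an untested assumption here):

```python
import os
import subprocess
import tempfile
from pathlib import Path

def file_owned_by_current_user(path):
    """True if the file at `path` is owned by the invoking user's UID."""
    return Path(path).stat().st_uid == os.getuid()

def docker_needs_stepdown_user(image="alpine"):
    """Bind mount a scratch directory, write a file inside the container as
    root, and check who owns it on the host. Root ownership on the host means
    Docker is running "rootful", so the Dockerfile should create a step-down
    user; user ownership means namespace mapping is handling it for us."""
    with tempfile.TemporaryDirectory() as scratch:
        subprocess.run(
            ["docker", "run", "--rm",
             "--volume", f"{scratch}:/probe",
             image, "touch", "/probe/marker"],
            check=True,
        )
        needs_stepdown = not file_owned_by_current_user(Path(scratch) / "marker")
        # Delete the marker from inside the container, since on rootful
        # Docker it shows up root-owned on the host:
        subprocess.run(
            ["docker", "run", "--rm",
             "--volume", f"{scratch}:/probe",
             image, "rm", "/probe/marker"],
            check=True,
        )
        return needs_stepdown
```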

@freakboy3742
Member

I'm mostly down to two options:

  1. Limit the file system writing to only the container

I'm not sure I see how this would even work in practice - how do you exfiltrate the build artefacts?

  • create a container using whatever Docker is invoked with the docker command

This approach seems completely acceptable to me.

  • Note: since this has to happen before we actually use the Dockerfile...it'll have to use a random image like alpine to run this test; alternatively, since Briefcase will eventually need to pull the base image for the Dockerfile, we could just go ahead and pull it for this task. That would avoid pulling a random image just for this....even if it is the 8MB alpine image.

I'd agree that using the "actual" image is preferable. The prepare() call on the base Docker tool already does a no-op run to ensure the image is available; changing this to a not-so-no-op "write file and check permissions", and setting a flag that is used later on bound containers seems entirely reasonable.

@rmartin16
Member Author

I'd agree that using the "actual" image is preferable. The prepare() call on the base Docker tool already does a no-op run to ensure the image is available; changing this to a not-so-no-op "write file and check permissions", and setting a flag that is used later on bound containers seems entirely reasonable.

Yeah...that prepare() call may be tricky, though; it's only used right now for Linux System, where the effective base image of the Dockerfile is explicitly provided at runtime. In the case of AppImage, the base image can be inferred from the manylinux settings....but in the most general case of an arbitrary template with a Dockerfile, deriving the base image may not actually be possible. I'm going to try to work out the details and dependencies today.

@mhsmith
Member

mhsmith commented Jun 24, 2023

  1. Limit the file system writing to only the container

I'm not sure I see how this would even work in practice - how do you exfiltrate the build artefacts?

I haven't been following this discussion, so this might not be helpful, but there is a docker cp command.

@rmartin16
Member Author

  1. Limit the file system writing to only the container

I'm not sure I see how this would even work in practice - how do you exfiltrate the build artefacts?

I haven't been following this discussion, so this might not be helpful, but there is a docker cp command.

Right. This also brings up more generalized thoughts I've had about this problem....fwiw

While we're delegating to Docker to provide the environment to install prerequisite packages, build stub binaries, and create redistributable artifacts, we are not letting Docker manage the intermediate (yet persistent) data that we're storing in the host file system. This creates the crux of our problem, as Docker must cross a boundary from what it controls into what it does not.

volume mount

Instead, we could allow Docker to manage this persistent (yet ultimately ephemeral in the larger sense) data via a volume mount. So, instead of bind mounting part of the build directory into the container, a dedicated volume would be mounted, and this volume would house the build file system.

In this way, Briefcase would then need to manually manage the boundary between Docker and the host; namely, this would require managing the updating of the user's source into the volume mount as well as extracting out the redistributable artifacts.

I think that would be fairly straightforward....however, we would also need to make all the assumptions Briefcase makes about the build file system (i.e. the build directory) abstract...such that all the file system checks it performs (e.g. the build command checking if the bundle_path exists so it can run the create command first if necessary) would need to support looking inside a Docker volume mount. This also obviously has the consequence of obscuring all this intermediate state from the user inside of Docker.

(this also may help with support for Docker on windows)

dockerfile only

Another thought I had was to exclusively use the Dockerfile to run Docker commands. In practice, I think this would be a multi-target Dockerfile where each target would roughly represent a Briefcase command. So, a create target would roll out the template, an update target could install pip requirements and source code, a build target builds it, etc.

This would, however, have the effect of basically recreating Briefcase inside a Dockerfile. As such, this doesn't seem feasible at this point. But maybe if Briefcase was originally oriented around using Docker for everything, this type of approach might make more sense.

conclusion

While it's interesting to imagine a more "docker-native" implementation, these approaches (or even some intermediate state of them) don't seem particularly feasible at this point.

For instance, consider using a volume mount in conjunction with the host file system build directory. This would effectively require keeping these two storage locations in sync with each other; the sync could use docker cp and thus avoid all these permission issues, since Docker would explicitly manage crossing this boundary. OTOH, this seems incredibly inefficient, and depending on keeping anything this complex in sync is bound to be a source of issues.
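For reference, a sketch of what crossing the boundary with docker cp looks like (the container name and paths are illustrative; the helpers just build the CLI commands). Per Docker's documented behavior, files copied out to the host are created with the UID:GID of the user invoking the command, which is what sidesteps the ownership problem:

```python
import subprocess

def docker_cp_in(container, host_path, container_path):
    """Build the `docker cp` command that pushes host files into a container;
    Docker manages ownership on the far side of the boundary."""
    return ["docker", "cp", str(host_path), f"{container}:{container_path}"]

def docker_cp_out(container, container_path, host_path):
    """Build the `docker cp` command that pulls artifacts back out; the files
    land on the host owned by the invoking user."""
    return ["docker", "cp", f"{container}:{container_path}", str(host_path)]

# e.g. subprocess.run(docker_cp_out("briefcase-build", "/app/dist", "./dist"), check=True)
```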

Another intermediate state is perhaps always using root in the container but executing some kind of chown at the end of any command to ensure the files are owned by the user on the host file system. My primary concern with this, though, is the inefficiency of having to recursively chown the files every time, or an early exception leaving files owned by root.

So, continuing to use the bind mount like this is probably the best option....while just trying to accommodate Docker's varying behaviors...

@freakboy3742
Member

volume mount

I guess I can see how this would work; but if I'm understanding you correctly, there are 2 notable costs:

  1. We lose the build directory as a directly inspectable source of artefacts.
  2. There's a lot more copying required - essentially the entire pre-build contents of the build directory needs to be copied into the volume, and then the individual files of interest exfiltrated afterwards.

(this also may help with support for Docker on windows)

That's an interesting (and not insignificant) point; that said, I think the existing issues with Windows support are resolvable, just fiddly because of the path interpolation that needs to happen (and, ironically, are mostly a problem because of testing, not because of the implementation).

dockerfile only

To me, this is "rewrite Briefcase, but in dockerfiles". Yes, it's probably possible, but... ugh :-)

conclusion

...

For instance, using a volume mount in conjunction with the host file system build directory.

Agreed this is possible, but not especially attractive. It resolves issue (1) from above, but at the cost of making issue (2) worse.

So, continuing to use the bind mount like this is probably the best option....while just trying to accommodate Docker's varying behaviors...

I think I agree. While these bugs are annoying, we're not that far from having a working solution; and while I'm always down for a massive rework/refactoring if it turns out a core implementation is "wrong" in some way, in this case I think it's more a case of there being some unaccounted for edge cases, rather than anything fundamentally wrong.

@rmartin16
Member Author

rmartin16 commented Jul 1, 2023

WSL 2 with Docker Desktop

Out of curiosity, I tested running Briefcase in WSL 2 with Docker Desktop running with WSL 2 integration enabled for the WSL distro I was using. To my surprise, briefcase run --target ubuntu worked...

It turns out that Docker Desktop with WSL 2 integration does not support user namespace mapping....despite Docker using a Linux VM in WSL to run the containers. (This is apparently because Docker Desktop and the WSL distro are already isolated from each other....by running in their own user namespaces.) Therefore, the files created inside the container in the bind mount directory are owned by root in the WSL distro.

So....this further reinforces that the strategy of "perform a bind mount write and see who owns the file" is the one to continue moving forward with....there are just too many configurations without reliable ways to probe the environment to determine what to do.

At any rate....this does mean the Linux commands should already all be possible on Windows with WSL 2.

[edit] This even includes briefcase run linux.....I guess Windows is running an x11 server now 😮

@freakboy3742
Member

At any rate....this does mean the Linux commands should already all be possible on Windows with WSL 2.

Interesting... what does sys.platform return under WSL? Is there any potential for platform confusion there?

@rmartin16
Member Author

At any rate....this does mean the Linux commands should already all be possible on Windows with WSL 2.

Interesting... what does sys.platform return under WSL? Is there any potential for platform confusion there?

>>> import platform
>>> platform.system()
'Linux'
>>> platform.uname()
uname_result(system='Linux', node='vm-win11', release='5.15.90.1-microsoft-standard-WSL2', version='#1 SMP Fri Jan 27 02:56:13 UTC 2023', machine='x86_64')

log file header:

Date/Time:       2023-07-01 19:36:08
Command line:    /home/user/tmp/venv-3.10/bin/briefcase create linux --log

OS Release:      Linux 5.15.90.1-microsoft-standard-WSL2
OS Version:      #1 SMP Fri Jan 27 02:56:13 UTC 2023
Architecture:    x86_64
Platform:        Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python exe:      /home/user/tmp/venv-3.10/bin/python
Python version:  3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Virtual env:     True
Conda env:       False

Briefcase:       0.3.15.dev382+gbeda029c
Target platform: linux
Target format:   system

I wouldn't expect anything about Windows to leak through. My understanding is that WSL is effectively running as a VM in Windows' hypervisor; so, the Windows environment should be isolated.

It even built and ran an AppImage.
