-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Full Docker support #195
base: master
Are you sure you want to change the base?
Full Docker support #195
Conversation
…ectory, pin alpine version, usage examples, use grab-site user and group for files and execution
@acrois THANK YOU for your work on this 👏👏👏 |
Since I am somewhat occupied at the moment: would a grab-site user kindly test this and all of the new README instructions? Feel free to post feedback or code review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Could you please clarify the build step so that I can continue testing?
…n on host port forwarding
In order to simplify the data storage and improve portability, I have made several improvements to the design of the Docker environment and taking advantage of some default behaviors, can completely remove the need to specify the "--dir" flag at runtime by default. I was working on a side project, related to grab-site and I discovered a pain when I was trying to automate running it in a cluster. I ended up isolating all grab-site related processes to their own network, for increased security. grab-site stores data in whatever current working directory you are in, so I simply added grab-site's install dir to the PATH so that references to scripts need "./" to refer to them. This allows us to simplify specifying one argument for persistence Quick start usage as well as changes regarding this change can be found in the updated README. Let me know if this is reproducible for you. I've run it on both Windows and Debian Docker hosts. I still plan on updating more information on what each parameter in the |
Sorry for the delay, I had a lot of things going on. I followed the updated instructions (though I substituted
Did I do something wrong? Do I need to make some directory exist first or create some volume? In general, if we are to support Docker, I would really like to have instructions that would work for a first-time Docker user, with everything made explicit. Otherwise people will generally just give up. |
Ah, I see
and then |
When I start a
We probably need instructions for "Ensure that [...] container has write privileges on volume" as well if we are to support this.
|
regarding the permissions, I am told "Fusl's Dockerfile simply ran grab-site as root inside the container, by the way. Not ideal, but since it's isolated to that container, probably fine? I'm not a Docker expert though. The Dockerfile in the PR creates a user and group inside the container, which is cleaner, but then that uid/gid needs to have write access in the mountpoint as well." |
to my knowledge there's basically 2 approaches for this.
you can also combine these as for example https://www.linuxserver.io/ does, the init script checks if it's running as root and it has env vars for dropping privilege, then it chowns the working directory and drops to the final user. effectively there's also a third option that can be used regardless of the previous configuration, however, depending on which of the previous configurations are chosen this can be tricky to implement. if you however specify VOLUMEs in the Dockerfile they are chowned (except for bind mounts) on creation to the user that owned the base folder at that time. with bind mounts you also won't get the files and folders initialized inside the volume if there are any, this only works with volumes. also to note is that some applications may run into issues if they run as a uid that does not map to a user (i.e. no matching entry in |
Thank you for the review, I have been a little busy so I just wanted to let you know I'm working on addressing the comments soon. I'll tag again when I've got some answers. |
One more comment: the paths in |
Save space on build by specifying `--no-cache-dir` to `pip install` in `Dockerfile` Co-authored-by: JustAnotherArchivist <JustAnotherArchivist@users.noreply.github.com>
…k, update documentation for single container usage vs Docker GRAB_SITE_HOST usage, set default GRAB_SITE_HOST build arg to gs-server instead of application default 127.0.0.1 with documented reasoning and how to change it
…ntainer usage vs Docker GRAB_SITE_HOST usage, set default GRAB_SITE_HOST build arg to gs-server instead of application default 127.0.0.1 with documented reasoning and how to change it
…Alpine version, remove need to chmod entrypoint, remove cargo dependency, move user creation above dependency installation, set uid and gid to 10000 for consistency
…quick start for universal docker run volume mount syntax
…t active user during `pip install`
…d how to pass configuration for grab-site crawls
Please provide information about https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker |
Is this still being worked on? |
@ImportTaste It should be complete, I just pulled the latest changes from here into my fork/branch so your build will be against the latest version of the code. Please let me know if you follow the usage instructions and it does not work so I can improve it. TL;DR: I'd call this the best starting point for anyone trying to run grab-site and gs-server on Docker. It might not be 100% complete but I've been running it ever since I submitted the PR on several VMs (AWS, Azure, GCP, VirtualBox, Proxmox), so it works on my machines! 😄 It would be great if this could be merged and then a separate issue can be created with an accurate report of how to reproduce these different environments. That way the foundational support isn't just hanging out forever on my fork. That being said, I have recently (last 9mo) documented setting up user/permissions between the host and container on a bone standard Debian install with an additional user. I believe that could address the SELinux user host<->container mismatch and allow it to run on those systems with whatever flag set ( I'd love to do it, I just need some help. So, instead of making assumptions, I'd appreciate a script and target OS that I can use to help me set up everyone's environment(s) locally in order to reproduce this last remaining issue to prepare a proper solution that is compatible with all configurations. |
this seems to require |
https://github.com/acrois/grab-site/blob/feature/dockerfile/Dockerfile#L47-L48 I know that pip is not ideal but this is supposed to cover the app dependency scenario already. If it is necessary it should be included as part of the list of dependencies, perhaps here in https://github.com/acrois/grab-site/blob/feature/dockerfile/setup.py#L12-L20 I added a build script that will take care of release needs. Please send me a reproduction using a command that leverages the new images.
Perhaps we need a test case added that complements the Yes or yes? |
Based on the initial work done by Fusl (credits preserved), the Dockerfile has been updated and improved as well as all its usage documented in the README. It strives to address all of the usability concerns mentioned amongst several other issues on this topic.
Please review the usage to begin running Docker grab-site on your machine
This new image is based on Alpine 3.13 and Python 3.7. The advantage with this new addition is that no previous support exists, and this breaks no existing installations only broadens them. Maintaining this will be easy and can be integrated into Travis CI. If you wish to run multiple grab-site instances, you may want to mount a different volume to your host machine for each container.
One limitation, which is unavoidable unless we make some slight changes to the existing installation process, is that external dependencies (Python libs defined in setup.py) are not able to be cached so any changes to the code will cause the dependencies to be downloaded fresh on a subsequent build.
Ideally, we would download these dependencies as a separate build step and then copy the code over and run the setup. It may be as simple as building the image and then running pip freeze > requirements.txt and then pip install but further testing would be required to determine what paths setup.py standalone vs pip install would use and didn't seem important enough to warrant the complexity of figuring it out.
There is no Docker Registry usage. If this is desired by the community in the future, it can be added simply by additional steps using Travis CI if we can integrate it and have it automatically deploy to GitHub Packages and/or Docker Hub from successful Travis CI build.
Finally, I will provide support to maintain and improve this image and its usage/deployment in the future.