Push Docker images to Dockerhub using Github actions for running a llama-cpp-python REST server #236

Closed
gjmulder opened this issue May 18, 2023 · 9 comments
Labels
enhancement (New feature or request) · hardware (Hardware specific issue) · model (Model specific issue) · server

Comments

@gjmulder
Contributor

A lot of people would like to run their own server, but don't have the necessary DevOps skills to configure and build a llama-cpp-python + python + llama.cpp environment.

I'm working on developing some Dockerfiles that are run via a GitHub Action to publish to Docker Hub, similar to llama.cpp's workflows/docker.yml, with both OpenBLAS (i.e. no NVIDIA GPU) and cuBLAS (NVIDIA GPU via Docker) support.
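
As a rough sketch, the workflow could follow the same shape as llama.cpp's docker.yml. The file path, image name, and secret names below are assumptions, not the final setup:

# .github/workflows/docker.yml (hypothetical)
name: Publish Docker images
on:
  push:
    branches: [main]
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v4
        with:
          context: .
          file: docker/openblas.Dockerfile   # a cublas.Dockerfile variant would be a second step
          push: true
          tags: example/llama-cpp-python:openblas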

Which CC-licensed models are now available that are compatible with llama.cpp's new quantized format? Ideally we want to start with small models to keep the Docker image sizes manageable.

@gjmulder gjmulder changed the title Push Docker images to Dockerhub using Github actions Push Docker images to Dockerhub using Github actions for running a llamma-cpp-python REST server May 18, 2023
@gjmulder added the enhancement, good first issue, server, hardware, and model labels and removed the good first issue label May 18, 2023
@gjmulder
Contributor Author

Looks like upstream support for Replit + MPT is maturing, with less restrictive licensing.

@jmtatsch

While in principle a great idea, this will be a support nightmare.

The problem is that llama.cpp binaries are so aggressively optimized, using every acceleration feature of the machine they were built on, that they will not run on many other hardware architectures (especially the older ones that non-power-users are likely to have). The binary will then segfault with a non-descriptive error message and people will be crying in the issues 😉
When built without those acceleration features, it becomes unusably slow.

@gjmulder
Contributor Author

The binary will then segfault with a non-descriptive error message and people will be crying in the issues

@jmtatsch better forewarned than snowed under with issues, thanks. How about if the images require a local docker build against whatever Docker exposes of the local hardware, but come pre-loaded with models?

@jmtatsch

Yes, that would solve the issue, and the model can be downloaded from Hugging Face during the build.
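
A minimal sketch of what that could look like — the base image, package set, and model URL are placeholders, not what an eventual PR would necessarily use:

# Dockerfile — built locally, so llama.cpp compiles for the host's CPU
FROM python:3.10-slim
RUN apt-get update && apt-get install -y build-essential cmake wget \
    && rm -rf /var/lib/apt/lists/*
# pip builds llama.cpp from source here, picking up the local CPU features
RUN pip install llama-cpp-python[server]
# placeholder URL — any GGML-format model hosted on Hugging Face would do
RUN mkdir -p /models && wget -O /models/model.bin \
    https://huggingface.co/<org>/<repo>/resolve/main/<model>.bin
ENV MODEL=/models/model.bin
CMD ["python3", "-m", "llama_cpp.server"]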

@gjmulder
Contributor Author

#270 does most of the heavy lifting. It seems people are already pushing images to Docker Hub and running into "illegal instruction" errors due to AVX and other CPU-specific optimizations.
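
In the meantime, an affected user can at least check which of the relevant SIMD features their CPU supports before pulling an image (Linux, standard GNU grep):

# list the relevant SIMD features this CPU supports
grep -woE 'avx512[a-z0-9_]+|avx2|avx|fma|f16c|sse3' /proc/cpuinfo | sort -u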

@gjmulder
Contributor Author

gjmulder commented May 27, 2023

FORCE_CMAKE=1 \
CMAKE_ARGS="\
  -DLLAMA_AVX=ON -DLLAMA_FMA=ON -DLLAMA_F16C=ON -DLLAMA_SSE3=ON \
  -DLLAMA_AVX2=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX512_VBMI=OFF -DLLAMA_AVX512_VNNI=OFF \
  -DLLAMA_OPENBLAS=ON" \
pip install llama-cpp-python

That might be crippled enough to work for most people on x86_64.

@gjmulder
Contributor Author

And we now have OpenLLaMA with the Apache License, Version 2.0.

@gjmulder
Contributor Author

gjmulder commented Jun 2, 2023

#310 automates the process of downloading an Apache-2.0-licensed OpenLLaMA 3B model and installing it into a minimal Debian image with an OpenBLAS-enabled server.
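
Once such an image is on Docker Hub, running the server should reduce to something like this (image name and tag are assumptions):

docker run -p 8000:8000 example/llama-cpp-python:openblas
# the server exposes an OpenAI-compatible REST API
curl http://localhost:8000/v1/models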

@gjmulder gjmulder changed the title Push Docker images to Dockerhub using Github actions for running a llamma-cpp-python REST server Push Docker images to Dockerhub using Github actions for running a llama-cpp-python REST server Jun 2, 2023
@gjmulder
Contributor Author

gjmulder commented Jun 2, 2023

So, after a lot of trial and error, it seems that forcing CMake Intel acceleration flags into a Dockerfile is a PITA.
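
For anyone attempting the same, one way to thread the flags through is a build argument — a sketch, assuming the image installs llama-cpp-python with pip as above:

# Dockerfile excerpt: flag set from the earlier comment, overridable at build time
ARG CMAKE_ARGS="-DLLAMA_AVX=ON -DLLAMA_AVX2=OFF -DLLAMA_OPENBLAS=ON"
ENV CMAKE_ARGS=${CMAKE_ARGS} FORCE_CMAKE=1
RUN pip install llama-cpp-python[server]

Each machine can then override the flags with docker build --build-arg CMAKE_ARGS="-DLLAMA_AVX2=ON" -t llama-server . rather than editing the Dockerfile.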

@abetlen abetlen closed this as completed Apr 6, 2024