Push Docker images to Docker Hub using GitHub Actions for running a llama-cpp-python REST server #236
Looks like upstream support of Replit + MPT is maturing with less restrictive licensing:
While in principle a great idea, this will be a support nightmare. The problem is that llama.cpp binaries are so heavily optimized, using every possible acceleration feature, that they will not run on many different hardware architectures (especially the older ones that non-power users are likely to have).
@jmtatsch better forewarned than snowed under with issues, thx. How about if the images require a
Yes, that would solve the issue, and the model can be downloaded from Hugging Face within the build.
#270 does most of the heavy lifting. It seems people are already pushing images to Docker Hub and running into "illegal instruction" errors due to AVX, etc.
might be crippled enough to work for most people on
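For reference, the "illegal instruction" crashes can usually be avoided by turning the CPU-specific optimizations off when the wheel is built inside the image. The fragment below is only a sketch: CMAKE_ARGS and FORCE_CMAKE are the documented way to pass CMake options through llama-cpp-python's build, but the exact LLAMA_* flag names should be checked against the llama.cpp version that ends up pinned.

```dockerfile
# Sketch: build the wheel with CPU-feature flags disabled so the resulting image
# runs on older CPUs (at the cost of speed). Flag names assume llama.cpp's
# current CMake options (LLAMA_NATIVE, LLAMA_AVX, LLAMA_AVX2, LLAMA_FMA, LLAMA_F16C).
ENV FORCE_CMAKE=1 \
    CMAKE_ARGS="-DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off"
RUN pip install --no-cache-dir llama-cpp-python
```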
And we now have OpenLLaMA under the Apache License, Version 2.0.
#310 automates downloading an Apache 2.0-licensed OpenLLaMA 3B model and installing it into a minimal Debian image with an OpenBLAS-enabled server.
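Roughly, an image along those lines could look like the sketch below. This is not the actual Dockerfile from #310: the base image, the model URL, and the MODEL/HOST/PORT environment variables are assumptions, and the download step is just a placeholder for whichever small, permissively licensed quantized model gets chosen.

```dockerfile
# Sketch only, not the Dockerfile from #310.
FROM python:3.11-slim-bullseye

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake libopenblas-dev pkg-config curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Build llama-cpp-python against OpenBLAS.
ENV FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_OPENBLAS=on"
RUN pip install --no-cache-dir "llama-cpp-python[server]"

# Placeholder: fetch a small, permissively licensed quantized model at build time.
RUN mkdir -p /models && \
    curl -L -o /models/model.bin \
        "https://huggingface.co/<org>/<repo>/resolve/main/<quantized-model>.bin"

# Assumed: the server picks up its settings from environment variables.
ENV MODEL=/models/model.bin HOST=0.0.0.0 PORT=8000
EXPOSE 8000
CMD ["python", "-m", "llama_cpp.server"]
```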
So, after a lot of trial and error, it seems that forcing CMake Intel acceleration flags into a Dockerfile is a PITA.
A lot of people would like to run their own server but don't have the necessary DevOps skills to configure and build a llama-cpp-python + python + llama.cpp environment. I'm working on developing some Dockerfiles that are run via a GitHub Action to publish to Docker Hub, similar to llama.cpp's workflows/docker.yml, for both OpenBLAS (i.e. no NVIDIA GPU) and cuBLAS (NVIDIA GPU via Docker) support.
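As a starting point, the publish step could be wired up roughly as below, modelled loosely on llama.cpp's workflows/docker.yml. This is a sketch, not the final workflow: the Dockerfile paths, image name, and the DOCKERHUB_USERNAME/DOCKERHUB_TOKEN secret names are placeholders.

```yaml
# Sketch of a .github/workflows/docker.yml for publishing OpenBLAS and cuBLAS images.
name: Publish Docker images

on:
  push:
    branches: [main]

jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        config:
          # Dockerfile paths and tags are placeholders.
          - { tag: openblas, dockerfile: docker/openblas_simple/Dockerfile }
          - { tag: cublas, dockerfile: docker/cuda_simple/Dockerfile }
    steps:
      - uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ matrix.config.dockerfile }}
          push: true
          tags: example/llama-cpp-python:${{ matrix.config.tag }}
```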
Which CC-licensed models are now available that are compatible with llama.cpp's new quantized format? Ideally we want to start with small models to keep the Docker image sizes manageable.