Push Docker images to Docker Hub using GitHub Actions for running a llama-cpp-python REST server #236
Looks like upstream support of Replit + MPT is maturing with less restrictive licensing:
While in principle a great idea, this will be a support nightmare. The problem is that llama.cpp binaries are so heavily optimized, using every possible acceleration feature, that they will not run on many different hardware architectures (especially the older ones that non-power users are likely to have).
@jmtatsch better forewarned than snowed under with issues, thx. How about if the images require a
Yes, that would solve the issue, and the model can be downloaded from Hugging Face within the build.
#270 does most of the heavy lifting. It seems people are already pushing images to Docker Hub and running into "illegal instruction" errors due to AVX, etc.
might be crippled enough to work for most people on
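For reference, the "illegal instruction" crashes can usually be avoided by turning the CPU-specific optimizations off when the wheel is built inside the image. The fragment below is only a sketch: CMAKE_ARGS and FORCE_CMAKE are the documented way to pass CMake options through llama-cpp-python's build, but the exact LLAMA_* flag names should be checked against the llama.cpp version that ends up pinned.

```dockerfile
# Sketch: build the wheel with CPU-feature flags disabled so the resulting image
# runs on older CPUs (at the cost of speed). Flag names assume llama.cpp's
# current CMake options (LLAMA_NATIVE, LLAMA_AVX, LLAMA_AVX2, LLAMA_FMA, LLAMA_F16C).
ENV FORCE_CMAKE=1 \
    CMAKE_ARGS="-DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off"
RUN pip install --no-cache-dir llama-cpp-python
```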
And we now have OpenLLaMA under the Apache License, Version 2.0.
#310 automates downloading an Apache 2.0-licensed OpenLLaMA 3B model and installing it into a minimal Debian image with an OpenBLAS-enabled server.
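Roughly, an image along those lines could look like the sketch below. This is not the actual Dockerfile from #310: the base image, the model URL, and the MODEL/HOST/PORT environment variables are assumptions, and the download step is just a placeholder for whichever small, permissively licensed quantized model gets chosen.

```dockerfile
# Sketch only, not the Dockerfile from #310.
FROM python:3.11-slim-bullseye

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake libopenblas-dev pkg-config curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Build llama-cpp-python against OpenBLAS.
ENV FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_OPENBLAS=on"
RUN pip install --no-cache-dir "llama-cpp-python[server]"

# Placeholder: fetch a small, permissively licensed quantized model at build time.
RUN mkdir -p /models && \
    curl -L -o /models/model.bin \
        "https://huggingface.co/<org>/<repo>/resolve/main/<quantized-model>.bin"

# Assumed: the server picks up its settings from environment variables.
ENV MODEL=/models/model.bin HOST=0.0.0.0 PORT=8000
EXPOSE 8000
CMD ["python", "-m", "llama_cpp.server"]
```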
So, after a lot of trial and error, it seems that forcing CMake Intel acceleration flags into a Dockerfile is a PITA.
A lot of people would like to run their own server but don't have the necessary DevOps skills to configure and build a llama-cpp-python + python + llama.cpp environment. I'm working on developing some Dockerfiles that are run via a GitHub Action to publish to Docker Hub, similar to llama.cpp's workflows/docker.yml, for both OpenBLAS (i.e. no NVIDIA GPU) and cuBLAS (NVIDIA GPU via Docker) support.
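As a starting point, the publish step could be wired up roughly as below, modelled loosely on llama.cpp's workflows/docker.yml. This is a sketch, not the final workflow: the Dockerfile paths, image name, and the DOCKERHUB_USERNAME/DOCKERHUB_TOKEN secret names are placeholders.

```yaml
# Sketch of a .github/workflows/docker.yml for publishing OpenBLAS and cuBLAS images.
name: Publish Docker images

on:
  push:
    branches: [main]

jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        config:
          # Dockerfile paths and tags are placeholders.
          - { tag: openblas, dockerfile: docker/openblas_simple/Dockerfile }
          - { tag: cublas, dockerfile: docker/cuda_simple/Dockerfile }
    steps:
      - uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ matrix.config.dockerfile }}
          push: true
          tags: example/llama-cpp-python:${{ matrix.config.tag }}
```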
Which CC-licensed models are now available that are compatible with llama.cpp's new quantized format? Ideally we want to start with small models to keep the Docker image sizes manageable.