llama.cpp server with LLava stuck after image is uploaded on the first question #3798
Closed
Labels: bug
Comments
Same issue here: using the server API directly, this is the log.
Using the server API directly but omitting the image_data parameter works. This is the log.
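For reference, "using the server API directly" means posting to the server's completion endpoint. Below is a minimal sketch of the text-only request that still works, assuming the standard /completion endpoint on the port used in this report (8007); the prompt text is only an example:

```python
# Text-only completion request, i.e. the same call with the image_data
# field simply omitted (assumed endpoint: /completion, port 8007 as in
# the report).
import requests

payload = {
    "prompt": "USER: Describe llama.cpp in one sentence.\nASSISTANT:",
    "n_predict": 64,
}

resp = requests.post("http://localhost:8007/completion", json=payload, timeout=120)
print(resp.json().get("content", ""))
```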
ggerganov added a commit that referenced this issue on Oct 26, 2023
Should be fixed now
Yesterday I was unable to get Llava to work with the following commands and the latest build. Just wanted to say that fixed the issue for me. Thanks! Commands run:
Working here too. Thanks!
Working fine. |
mattgauf added a commit to mattgauf/llama.cpp that referenced this issue on Oct 27, 2023
* master: (350 commits)
speculative : ensure draft and target model vocab matches (ggml-org#3812)
llama : correctly report GGUFv3 format (ggml-org#3818)
simple : fix batch handling (ggml-org#3803)
cuda : improve text-generation and batched decoding performance (ggml-org#3776)
server : do not release slot on image input (ggml-org#3798)
batched-bench : print params at start
log : disable pid in log filenames
server : add parameter -tb N, --threads-batch N (ggml-org#3584) (ggml-org#3768)
server : do not block system prompt update (ggml-org#3767)
sync : ggml (conv ops + cuda MSVC fixes) (ggml-org#3765)
cmake : add missed dependencies (ggml-org#3763)
cuda : add batched cuBLAS GEMM for faster attention (ggml-org#3749)
Add more tokenizer tests (ggml-org#3742)
metal : handle ggml_scale for n%4 != 0 (close ggml-org#3754)
Revert "make : add optional CUDA_NATIVE_ARCH (ggml-org#2482)"
issues : separate bug and enhancement template + no default title (ggml-org#3748)
Update special token handling in conversion scripts for gpt2 derived tokenizers (ggml-org#3746)
llama : remove token functions with `context` args in favor of `model` (ggml-org#3720)
Fix baichuan convert script not detecing model (ggml-org#3739)
make : add optional CUDA_NATIVE_ARCH (ggml-org#2482)
...
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this issue on Nov 23, 2023
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do.

Current Behavior
Please provide a detailed written description of what llama.cpp did, instead.

Environment and Context
Running
./server -t 4 -c 4096 -ngl 50 -m /Users/slava/Documents/Development/private/AI/Models/llava1.5/ggml-model-q5_k.gguf --host 0.0.0.0 --port 8007 --mmproj /Users/slava/Documents/Development/private/AI/Models/llava1.5/mmproj-model-f16.gguf
Environment info:
Failure Information (for bugs)
The inference gets stuck after the image is uploaded with the first question; there is no output.
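For reference, the kind of multimodal request that triggers the hang looks roughly like the sketch below. It assumes the server's /completion endpoint on port 8007 from the command above, an image_data entry whose id is referenced from the prompt as [img-ID], and a placeholder image file name:

```python
# Sketch of a multimodal request against the server started above
# (assumptions: /completion endpoint, port 8007, [img-ID] prompt syntax
# for image_data entries; "photo.jpg" is a placeholder file).
import base64
import requests

with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "USER: [img-10] Describe the image.\nASSISTANT:",
    "n_predict": 128,
    "image_data": [{"data": img_b64, "id": 10}],
}

# With the bug present, this request never produces a completion.
resp = requests.post("http://localhost:8007/completion", json=payload, timeout=600)
print(resp.json().get("content", ""))
```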
Steps to Reproduce
rec.mp4
Failure Logs
Run log