Update llama_cpp_server.py
Fixes bugs in the non-streaming response handling.
#85
This commit fixes the following bugs:
- `max_tokens` set to `-1` makes the Python server return an error, so it is given a default value of 8192 tokens as the maximum capacity; tests can still be run with this value.
- `data["content"]` doesn't exist in the llama-cpp-python server response, so it is better to return the entire response JSON, since it has the same structure the extractor expects.
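
A minimal sketch of what these two fixes might look like, assuming a `requests`-based call to a llama-cpp-python server's chat-completions endpoint; the URL, function name, and parameters below are illustrative, not the exact code in this PR:

```python
import requests

# Assumed endpoint for a locally running llama-cpp-python server.
LLAMA_CPP_SERVER_URL = "http://localhost:8000/v1/chat/completions"

def chat_completion(messages, max_tokens=None):
    # Bug 1: passing max_tokens=-1 causes the Python server to return an error,
    # so fall back to a default maximum capacity of 8192 tokens instead.
    if max_tokens is None or max_tokens <= 0:
        max_tokens = 8192

    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": False,  # non-streaming response
    }
    response = requests.post(LLAMA_CPP_SERVER_URL, json=payload, timeout=600)
    response.raise_for_status()

    # Bug 2: the llama-cpp-python server response has no data["content"] field.
    # Return the whole JSON body, which already matches the structure the
    # downstream extractor works with.
    return response.json()
```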