
How to infer multiple prompts (bs > 1) at the same time? #1623


Closed · vicwer opened this issue May 28, 2023 · 2 comments

Comments

vicwer commented May 28, 2023

No description provided.

KerfuffleV2 (Collaborator) commented

Possibly with the server example: https://github.com/ggerganov/llama.cpp/tree/master/examples/server

You would need a script or something to manage the queries and collect the results.
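
For illustration, here is a minimal sketch of such a script in Python, assuming the server example is running locally on port 8080 and exposes the /completion endpoint documented alongside it (the URL, port, prompts, and n_predict value below are illustrative):

```python
# Minimal sketch: fan several prompts out to a running llama.cpp server
# and collect the completions. Assumes the server example is running
# locally, e.g. `./server -m <model-path> --port 8080`, and exposes the
# /completion endpoint described in examples/server.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SERVER_URL = "http://127.0.0.1:8080/completion"  # adjust host/port to your setup

def complete(prompt: str, n_predict: int = 128) -> str:
    """Send one prompt to the server and return the generated text."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

prompts = [
    "Explain quantization in one sentence.",
    "What is a context window?",
    "Name three common uses of embeddings.",
]

# Issue the requests in parallel and print each result next to its prompt.
# pool.map preserves input order, so results line up with the prompts.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, answer in zip(prompts, pool.map(complete, prompts)):
        print(f"### {prompt}\n{answer}\n")
```

Note that this only parallelizes the HTTP requests on the client side; whether the server decodes the prompts in a single batch or serves them one after another depends on the server build and version, so treat this as queuing the work rather than guaranteed bs > 1 decoding.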

vicwer closed this as completed Jun 5, 2023
liuxiaohao-xn commented

I have the same question. I have multiple prompts and want to feed them all to the model at once to generate the outputs. Can you tell me how to achieve this?
