[Enhancement]: Global probability maximization with backtracking for better results. (aka Beam search) #1392
Comments
Sounds like beam search. The linked article has a nice explanation of it and describes some apparent downsides.
Yes, that seems more or less it. I'm not entirely sure that the best sentence is the one where you just multiply the probabilities of the individual tokens; there is probably a better confidence score that uses the entire output of the LLM. What's needed is a good heuristic. My guess is that we would want the end of the text to be more predictable than the beginning: you can start a story with random characters, but in the end you want the story to head to an inevitable conclusion. Beam search should be good for scientific questions but less good for creative story generation. I found another blog post about it. I read somewhere that ChatGPT uses beam search. I don't know if this is true or not, but I'm fairly sure I've seen it backtrack a couple of times.
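For reference, the usual scoring function in the beam-search literature is exactly this sum of per-token log-probabilities (the log of the product of the raw probabilities), often divided by a length penalty so that longer continuations are not unfairly punished. One standard (not necessarily best) choice:

$$\mathrm{score}(y) = \frac{1}{|y|^{\alpha}} \sum_{t=1}^{|y|} \log p(y_t \mid y_{<t}), \qquad \alpha \in [0, 1]$$

where $\alpha = 0$ gives the raw log-probability and $\alpha = 1$ the per-token average.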
Were there any attempts at implementing beam search for LLaMA? whisper.cpp already has beam search, so I wonder if some code can be copied from there. Beam search seems very much a necessity for any complex reasoning. I think someone needs to take a look at this.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Ugh, a stalebot. Unless this is actually resolved, this issue is still quite relevant. Given that the behaviour is not exposed by the server and relies on older code paths, I'm not sure that #2267 actually resolved it.
Please reopen if you need to.
This is an idea I had which could theoretically produce better results. I thought about this years ago in relation to how humans think.
The way it works is this.
First, generate tokens in the usual manner (let's assume the temperature is set quite low, so we select the token with the highest probability each time). Let our prompt be "The biggest animal in the world".
Then the next tokens are:
"is" 99%
"the" 99%
"giraffe" 50%
So we keep going until it stops. We can then give this sentence a confidence score; perhaps this is just the product (or sum?) of the probabilities of the selected tokens, or something more elaborate.
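As a side note, a product of many probabilities underflows quickly in floating point, so implementations usually sum log-probabilities instead (which is the log of the product). A minimal sketch, using the example numbers above; `sequence_confidence` is just an illustrative name:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Confidence score for a candidate sentence: the sum of the
// log-probabilities of the selected tokens. This equals the log of the
// product of the raw probabilities, but does not underflow for long
// sequences.
double sequence_confidence(const std::vector<double> & token_probs) {
    double log_score = 0.0;
    for (double p : token_probs) {
        log_score += std::log(p);
    }
    return log_score;
}

int main() {
    // "is" 99%, "the" 99%, "giraffe" 50% -- the example above
    const std::vector<double> probs = {0.99, 0.99, 0.50};
    printf("confidence (log scale): %f\n", sequence_confidence(probs));
    return 0;
}
```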
This is our first option. But we don't just say the first thing that comes into our heads! So the next thing we do is go back to the word with the smallest probability, in this case "giraffe", look at the next-highest token, "blue-whale", and continue on from there, creating a second sentence which we can also give a confidence score. Note: the further back in the sentence we go, the more time it takes to generate a new sentence, so we have to estimate the payoff each time.
Our aim is to generate a sentence with the highest overall confidence score we can within a given time limit. The algorithm that will do this can be worked out with probability theory.
Note that even though we choose a less likely token at some point, the overall confidence score of the whole sentence could be higher.
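Here is a minimal sketch of that backtracking loop, against a toy stand-in for the model. All the names (`next_candidates`, `Hypothesis`, `backtracking_search`) are hypothetical; in llama.cpp, `next_candidates` would wrap an actual model evaluation plus a softmax over the logits, and the toy numbers just mirror the example above:

```cpp
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

struct Candidate {
    std::string token;
    double      prob;
};

// Toy stand-in for a real model call: given the tokens so far, return
// next-token candidates sorted by descending probability.
std::vector<Candidate> next_candidates(const std::vector<std::string> & ctx) {
    const std::string last = ctx.empty() ? "" : ctx.back();
    if (last == "world")      return {{"is", 0.99}, {"was", 0.01}};
    if (last == "is")         return {{"the", 0.99}, {"a", 0.01}};
    if (last == "the")        return {{"giraffe", 0.50}, {"blue-whale", 0.45}, {"elephant", 0.05}};
    if (last == "giraffe")    return {{"<eos>", 0.55}, {"maybe", 0.45}};
    if (last == "blue-whale") return {{"<eos>", 0.99}, {"maybe", 0.01}};
    return {{"<eos>", 1.00}};
}

struct Hypothesis {
    std::vector<std::string> tokens;    // prompt + chosen tokens
    std::vector<double>      probs;     // probability of each chosen token
    double                   log_score; // sum of log(probs)
};

// Extend a hypothesis greedily until <eos> or a length cap.
void complete_greedily(Hypothesis & h, size_t max_len) {
    while (h.tokens.size() < max_len && h.tokens.back() != "<eos>") {
        const Candidate best = next_candidates(h.tokens)[0];
        h.tokens.push_back(best.token);
        h.probs.push_back(best.prob);
        h.log_score += std::log(best.prob);
    }
}

// Finish a sentence greedily, then repeatedly go back to the least
// confident decision, take the next-best token there, and regenerate
// the tail. Keep the best-scoring complete sentence found.
// (Comparing raw log-scores favours shorter sentences; a real version
// would length-normalize as discussed earlier in the thread.)
Hypothesis backtracking_search(const std::vector<std::string> & prompt,
                               size_t max_len, int n_retries) {
    Hypothesis best = {prompt, {}, 0.0};
    complete_greedily(best, max_len);

    Hypothesis cur = best;
    for (int retry = 0; retry < n_retries; ++retry) {
        if (cur.probs.empty()) break;
        // Locate the weakest decision made after the prompt.
        size_t weakest = 0;
        for (size_t i = 1; i < cur.probs.size(); ++i) {
            if (cur.probs[i] < cur.probs[weakest]) weakest = i;
        }
        // Rebuild the hypothesis up to (not including) that decision.
        Hypothesis alt = {{}, {}, 0.0};
        alt.tokens.assign(cur.tokens.begin(), cur.tokens.begin() + prompt.size() + weakest);
        alt.probs.assign(cur.probs.begin(), cur.probs.begin() + weakest);
        for (double p : alt.probs) alt.log_score += std::log(p);

        // Replace it with the next-best candidate at that position.
        const auto cands = next_candidates(alt.tokens);
        size_t rank = 0;
        while (rank < cands.size() && cands[rank].token != cur.tokens[prompt.size() + weakest]) {
            rank++;
        }
        if (rank + 1 >= cands.size()) break; // no alternative left here
        alt.tokens.push_back(cands[rank + 1].token);
        alt.probs.push_back(cands[rank + 1].prob);
        alt.log_score += std::log(cands[rank + 1].prob);

        complete_greedily(alt, max_len);
        if (alt.log_score > best.log_score) best = alt;
        cur = alt; // next round backtracks from the new sentence
    }
    return best;
}

int main() {
    const std::vector<std::string> prompt = {"The", "biggest", "animal", "in", "the", "world"};
    Hypothesis best = backtracking_search(prompt, /*max_len =*/ 16, /*n_retries =*/ 4);
    for (const auto & t : best.tokens) printf("%s ", t.c_str());
    printf("\n(log confidence: %.3f)\n", best.log_score);
    return 0;
}
```

With these toy numbers the backtracked sentence ("... is the blue-whale") ends up scoring higher than the greedy one ("... is the giraffe"), because its tail is much more certain, which is exactly the point above.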
The assumption is that a sentence that "makes sense" and is "self-consistent" will have a higher overall confidence score than one that is more random, since more parts of the sentence will follow naturally from each other. On the other hand, this may just tend to produce sentences which overfit. I don't know, I haven't tried it!
I'm sure this has been thought of before and no doubt someone has written a paper about it!
A more elaborate scheme would be for the LLM to split into two when it gets to a particularly low-probability token, and then have some overall controller which dictates which LLM to run one step, perhaps depending on its current confidence score. This may have the disadvantage that none of the LLMs makes it to the end of the sentence before the time limit (equivalent to a human who can't think of what to say).
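That controller is essentially best-first search over partial sentences. A sketch of what the controller loop might look like, reusing the hypothetical `Candidate`, `Hypothesis`, and `next_candidates` pieces from the sketch above; the `budget` cap is exactly where "none of them finishes in time" can happen:

```cpp
#include <cmath>
#include <queue>
#include <vector>

// (Reuses Candidate, Hypothesis, and next_candidates from the sketch above.)

struct ByScore {
    bool operator()(const Hypothesis & a, const Hypothesis & b) const {
        return a.log_score < b.log_score; // max-heap: highest confidence first
    }
};

// Controller loop: always advance the most confident partial sentence by
// one token, splitting it into its top-k continuations. Stops after
// `budget` model calls and returns the best finished sentence, if any.
Hypothesis best_first(const Hypothesis & start, int budget, size_t k, size_t max_len) {
    std::priority_queue<Hypothesis, std::vector<Hypothesis>, ByScore> frontier;
    frontier.push(start);

    Hypothesis best_done;
    best_done.log_score = -1e30; // sentinel: nothing finished yet

    while (budget-- > 0 && !frontier.empty()) {
        Hypothesis h = frontier.top();
        frontier.pop();

        const auto cands = next_candidates(h.tokens);
        for (size_t i = 0; i < k && i < cands.size(); ++i) {
            Hypothesis child = h;
            child.tokens.push_back(cands[i].token);
            child.probs.push_back(cands[i].prob);
            child.log_score += std::log(cands[i].prob);

            if (cands[i].token == "<eos>" || child.tokens.size() >= max_len) {
                if (child.log_score > best_done.log_score) {
                    best_done = child;
                }
            } else {
                frontier.push(child);
            }
        }
    }
    return best_done;
}
```

Note that raw log-scores can only go down as a hypothesis grows, so this naive priority strongly prefers extending short hypotheses; the length normalization mentioned earlier in the thread would be needed to make the controller behave sensibly.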
The bad thing about this scheme is that although a human can rethink a sentence, they remember the old sentence, whereas this method loses all memory of the old sentence past the token it is changing.
For the confidence scoring, one might also use the entire output of the LLM, not just the last token.