[Enhancement]: Global probability maximization with backtracking for better results. (aka Beam search) #1392
Comments
Sounds like beam search. The linked article has a nice explanation of it and describes some apparent downsides.
Yes, that seems more or less it. I'm not entirely sure that the best sentence is the one where you just multiply the probabilities of the individual tokens; there is probably a better confidence score that uses the entire output of the LLM. What's needed is a good heuristic. My guess is that we would want the end of the text to be more predictable than the beginning: you can start a story with random characters, but in the end you want the story to head to an inevitable conclusion. Beam search should be good for scientific questions but less good for creative story generation. I found another blog post about it. I read somewhere that ChatGPT uses beam search. I don't know if this is true or not, but I'm fairly sure I've seen it backtrack a couple of times.
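For reference, the usual scoring function in the beam-search literature is exactly this sum of per-token log-probabilities (the log of the product of the raw probabilities), often divided by a length penalty so that longer continuations are not unfairly punished. One standard (not necessarily best) choice:

$$\mathrm{score}(y) = \frac{1}{|y|^{\alpha}} \sum_{t=1}^{|y|} \log p(y_t \mid y_{<t}), \qquad \alpha \in [0, 1]$$

where $\alpha = 0$ gives the raw log-probability and $\alpha = 1$ the per-token average.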
Were there any attempts at implementing beam search for LLaMA? whisper.cpp already has beam search, so I wonder if some code can be copied from there. Beam search seems very much a necessity for any complex reasoning. I think someone needs to take a look at this.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Ugh, a stalebot. Unless this is actually resolved, this issue is still quite relevant. Given that the behaviour is not exposed by the server and relies on older code paths, I'm not sure that #2267 actually resolved it.
Please reopen if you need to.
This is an idea I had which could theoretically produce better results. I thought about this years ago in relation to how humans think.
The way it works is this.
First, generate tokens in the usual manner (let's assume the temperature is set quite low, so we select the token with the highest probability each time). Let our prompt be "The biggest animal in the world".
Then the next tokens are:
"is" 99%
"the" 99%
"giraffe" 50%
So we keep going until it stops. We can then give this sentence a confidence score; perhaps this is just the product (or sum?) of the probabilities of the selected tokens, or something more elaborate.
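As a side note, a product of many probabilities underflows quickly in floating point, so implementations usually sum log-probabilities instead (which is the log of the product). A minimal sketch, using the example numbers above; `sequence_confidence` is just an illustrative name:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Confidence score for a candidate sentence: the sum of the
// log-probabilities of the selected tokens. This equals the log of the
// product of the raw probabilities, but does not underflow for long
// sequences.
double sequence_confidence(const std::vector<double> & token_probs) {
    double log_score = 0.0;
    for (double p : token_probs) {
        log_score += std::log(p);
    }
    return log_score;
}

int main() {
    // "is" 99%, "the" 99%, "giraffe" 50% -- the example above
    const std::vector<double> probs = {0.99, 0.99, 0.50};
    printf("confidence (log scale): %f\n", sequence_confidence(probs));
    return 0;
}
```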
This is our first option. But we don't just say the first thing that comes into our heads! So the next thing we do is go back to the word with the smallest probability, in this case "giraffe", look at the next-highest token, "blue-whale", and continue on from there, creating a second sentence which we can also give a confidence score. Note: the further back in the sentence we go, the more time it takes to generate a new sentence, so we have to estimate the payoff each time.
Our aim is to generate a sentence with the highest overall confidence score we can within a given time limit. The algorithm that will do this can be worked out with probability theory.
Note that even though we choose a less likely token at some point, the overall confidence score of the whole sentence could be higher.
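Here is a minimal sketch of that backtracking loop, against a toy stand-in for the model. All the names (`next_candidates`, `Hypothesis`, `backtracking_search`) are hypothetical; in llama.cpp, `next_candidates` would wrap an actual model evaluation plus a softmax over the logits, and the toy numbers just mirror the example above:

```cpp
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

struct Candidate {
    std::string token;
    double      prob;
};

// Toy stand-in for a real model call: given the tokens so far, return
// next-token candidates sorted by descending probability.
std::vector<Candidate> next_candidates(const std::vector<std::string> & ctx) {
    const std::string last = ctx.empty() ? "" : ctx.back();
    if (last == "world")      return {{"is", 0.99}, {"was", 0.01}};
    if (last == "is")         return {{"the", 0.99}, {"a", 0.01}};
    if (last == "the")        return {{"giraffe", 0.50}, {"blue-whale", 0.45}, {"elephant", 0.05}};
    if (last == "giraffe")    return {{"<eos>", 0.55}, {"maybe", 0.45}};
    if (last == "blue-whale") return {{"<eos>", 0.99}, {"maybe", 0.01}};
    return {{"<eos>", 1.00}};
}

struct Hypothesis {
    std::vector<std::string> tokens;    // prompt + chosen tokens
    std::vector<double>      probs;     // probability of each chosen token
    double                   log_score; // sum of log(probs)
};

// Extend a hypothesis greedily until <eos> or a length cap.
void complete_greedily(Hypothesis & h, size_t max_len) {
    while (h.tokens.size() < max_len && h.tokens.back() != "<eos>") {
        const Candidate best = next_candidates(h.tokens)[0];
        h.tokens.push_back(best.token);
        h.probs.push_back(best.prob);
        h.log_score += std::log(best.prob);
    }
}

// Finish a sentence greedily, then repeatedly go back to the least
// confident decision, take the next-best token there, and regenerate
// the tail. Keep the best-scoring complete sentence found.
// (Comparing raw log-scores favours shorter sentences; a real version
// would length-normalize as discussed earlier in the thread.)
Hypothesis backtracking_search(const std::vector<std::string> & prompt,
                               size_t max_len, int n_retries) {
    Hypothesis best = {prompt, {}, 0.0};
    complete_greedily(best, max_len);

    Hypothesis cur = best;
    for (int retry = 0; retry < n_retries; ++retry) {
        if (cur.probs.empty()) break;
        // Locate the weakest decision made after the prompt.
        size_t weakest = 0;
        for (size_t i = 1; i < cur.probs.size(); ++i) {
            if (cur.probs[i] < cur.probs[weakest]) weakest = i;
        }
        // Rebuild the hypothesis up to (not including) that decision.
        Hypothesis alt = {{}, {}, 0.0};
        alt.tokens.assign(cur.tokens.begin(), cur.tokens.begin() + prompt.size() + weakest);
        alt.probs.assign(cur.probs.begin(), cur.probs.begin() + weakest);
        for (double p : alt.probs) alt.log_score += std::log(p);

        // Replace it with the next-best candidate at that position.
        const auto cands = next_candidates(alt.tokens);
        size_t rank = 0;
        while (rank < cands.size() && cands[rank].token != cur.tokens[prompt.size() + weakest]) {
            rank++;
        }
        if (rank + 1 >= cands.size()) break; // no alternative left here
        alt.tokens.push_back(cands[rank + 1].token);
        alt.probs.push_back(cands[rank + 1].prob);
        alt.log_score += std::log(cands[rank + 1].prob);

        complete_greedily(alt, max_len);
        if (alt.log_score > best.log_score) best = alt;
        cur = alt; // next round backtracks from the new sentence
    }
    return best;
}

int main() {
    const std::vector<std::string> prompt = {"The", "biggest", "animal", "in", "the", "world"};
    Hypothesis best = backtracking_search(prompt, /*max_len =*/ 16, /*n_retries =*/ 4);
    for (const auto & t : best.tokens) printf("%s ", t.c_str());
    printf("\n(log confidence: %.3f)\n", best.log_score);
    return 0;
}
```

With these toy numbers the backtracked sentence ("... is the blue-whale") ends up scoring higher than the greedy one ("... is the giraffe"), because its tail is much more certain, which is exactly the point above.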
The assumption is that a sentence that "makes sense" and is "self-consistent" will have a higher overall confidence score than one that is more random, since more parts of the sentence will follow naturally from each other. On the other hand, this may just tend to produce sentences which overfit. I don't know, I haven't tried it!
I'm sure this has been thought of before and no doubt someone has written a paper about it!
A more elaborate scheme would be for the LLM to split into two when it gets to a particularly low-probability token, and then have some overall controller which dictates which LLM to run one step, perhaps depending on its current confidence score. This may have the disadvantage that none of the LLMs makes it to the end of the sentence before the time limit (equivalent to a human who can't think of what to say).
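That controller is essentially best-first search over partial sentences. A sketch of what the controller loop might look like, reusing the hypothetical `Candidate`, `Hypothesis`, and `next_candidates` pieces from the sketch above; the `budget` cap is exactly where "none of them finishes in time" can happen:

```cpp
#include <cmath>
#include <queue>
#include <vector>

// (Reuses Candidate, Hypothesis, and next_candidates from the sketch above.)

struct ByScore {
    bool operator()(const Hypothesis & a, const Hypothesis & b) const {
        return a.log_score < b.log_score; // max-heap: highest confidence first
    }
};

// Controller loop: always advance the most confident partial sentence by
// one token, splitting it into its top-k continuations. Stops after
// `budget` model calls and returns the best finished sentence, if any.
Hypothesis best_first(const Hypothesis & start, int budget, size_t k, size_t max_len) {
    std::priority_queue<Hypothesis, std::vector<Hypothesis>, ByScore> frontier;
    frontier.push(start);

    Hypothesis best_done;
    best_done.log_score = -1e30; // sentinel: nothing finished yet

    while (budget-- > 0 && !frontier.empty()) {
        Hypothesis h = frontier.top();
        frontier.pop();

        const auto cands = next_candidates(h.tokens);
        for (size_t i = 0; i < k && i < cands.size(); ++i) {
            Hypothesis child = h;
            child.tokens.push_back(cands[i].token);
            child.probs.push_back(cands[i].prob);
            child.log_score += std::log(cands[i].prob);

            if (cands[i].token == "<eos>" || child.tokens.size() >= max_len) {
                if (child.log_score > best_done.log_score) {
                    best_done = child;
                }
            } else {
                frontier.push(child);
            }
        }
    }
    return best_done;
}
```

Note that raw log-scores can only go down as a hypothesis grows, so this naive priority strongly prefers extending short hypotheses; the length normalization mentioned earlier in the thread would be needed to make the controller behave sensibly.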
The bad thing about this scheme is that although a human can rethink a sentence, they remember the old sentence, whereas this method loses all memory of the old sentence past the token it is changing.
For the confidence scoring, one might also use the entire output of the LLM, not just the last token.