Undo generating the X most recent tokens: technically feasible? #2946
-
I know this feature doesn't exist currently. There's also a naive way to do it: save the whole state after every token, then restore to whatever point you want (or use a ring buffer that keeps a fixed number of snapshots). That would use a lot of memory, and copying the state around would be slow too. Is there a better way? Let's assume nothing unusual is going on, like context wrapping, RoPE scaling, non-LLaMA models, etc. Can I look at the state (I think it's just the KV cache?) like:
at the beginning and then
after 3 tokens have been generated. Then, if I want to erase the previous token and try to regenerate it, I could zero out that "slot" in the state, move the position back one token, and try again (possibly after setting the logit for the previously generated token to |
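The naive snapshot approach described above can be sketched like this. It is a toy model, not llama.cpp code: `ToyState`, `eval_token` and `SnapshotHistory` are hypothetical names, and the "state" is just a token list plus one fake key/value pair per token standing in for the real KV cache. A `deque` with `maxlen` plays the role of the ring buffer.

```python
import copy
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for the whole model state."""
    tokens: list = field(default_factory=list)  # tokens evaluated so far
    kv: list = field(default_factory=list)      # one fake (key, value) per token


def eval_token(state, tok):
    """Hypothetical eval: append the token and a fake KV entry for it."""
    state.tokens.append(tok)
    state.kv.append((("k", tok), ("v", tok)))


class SnapshotHistory:
    """Naive undo: deep-copy the whole state before every token.

    deque(maxlen=capacity) acts as the ring buffer: once it is full the
    oldest snapshot is dropped, so at most `capacity` tokens can be undone.
    """

    def __init__(self, capacity):
        self.snapshots = deque(maxlen=capacity)

    def save(self, state):
        self.snapshots.append(copy.deepcopy(state))

    def undo(self, n):
        """Return the state from before the n most recent tokens were added."""
        if not 1 <= n <= len(self.snapshots):
            raise ValueError(f"can undo between 1 and {len(self.snapshots)} tokens")
        for _ in range(n):
            state = self.snapshots.pop()
        return state


state = ToyState()
history = SnapshotHistory(capacity=4)
for tok in [10, 11, 12, 13, 14]:
    history.save(state)      # full copy before each token
    eval_token(state, tok)

state = history.undo(2)      # drop tokens 14 and 13
print(state.tokens)          # [10, 11, 12]
```

Every snapshot is a full copy, so memory grows roughly as capacity × context size and each `save` pays for a copy, which is exactly the cost the question is trying to avoid.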
-
The |
-
I may be misunderstanding & I know |
The `n_past` variable controls how much of the KV cache `llama_eval` uses, i.e. it is the index of the next position to be written. You can decrease it to "forget" the last token (decrease it by X to forget the last X tokens).
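A toy model of that idea, assuming the `n_past` semantics described above: slots below `n_past` are attended to, and everything at or above it is ignored and overwritten by the next evaluation. `ToyKVCache` and its methods are hypothetical illustrations, not the llama.cpp API. Note that nothing has to be zeroed; the stale entry stays in memory but is never read.

```python
class ToyKVCache:
    """Hypothetical fixed-size KV cache indexed the way `n_past` is described.

    Slots [0, n_past) hold entries for tokens already evaluated; slots at
    index >= n_past are ignored and get overwritten by the next eval.
    """

    def __init__(self, n_ctx):
        self.tokens = [None] * n_ctx
        self.keys = [None] * n_ctx
        self.values = [None] * n_ctx

    def eval(self, new_tokens, n_past):
        """Write new_tokens starting at position n_past and return the new n_past."""
        if n_past + len(new_tokens) > len(self.tokens):
            raise ValueError("context is full")
        for i, tok in enumerate(new_tokens):
            pos = n_past + i
            self.tokens[pos] = tok
            self.keys[pos] = ("k", tok, pos)    # stand-in for a real key vector
            self.values[pos] = ("v", tok, pos)  # stand-in for a real value vector
        return n_past + len(new_tokens)

    def visible(self, n_past):
        """Tokens whose keys/values attention would read with this n_past."""
        return self.tokens[:n_past]


cache = ToyKVCache(n_ctx=8)
n_past = cache.eval([1, 2, 3], 0)   # prompt
n_past = cache.eval([4], n_past)    # first generated token
n_past = cache.eval([5], n_past)    # second generated token
n_past -= 2                         # "forget" the 2 most recent tokens
n_past = cache.eval([9], n_past)    # generate a different token instead
print(cache.visible(n_past))        # [1, 2, 3, 9]
print(cache.tokens[4])              # 5 (stale slot, never read)
```

In this sketch the replacement token is supplied directly. In a real generation loop you would presumably also need fresh logits for the rolled-back position, since the logits left over from the last evaluation belong to the position after the undone token; one assumed approach is to roll back one extra token and re-evaluate it.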