Undo generate X most recent tokens - technically feasible? #2946

Answered by ggerganov
KerfuffleV2 asked this question in Q&A
The n_past variable controls how much of the KV cache llama_eval uses, i.e. it is the index of the next position to be written. You can decrease it to "forget" the last token, or decrease it by X to forget the X most recent tokens.
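
To illustrate the idea, here is a minimal toy sketch in Python. It does not call llama.cpp: ToyKVCache, toy_eval, visible_context and undo are hypothetical names made up for this example, and the "cache" is just a list with one slot per token position. The assumption is only what the answer states: evaluation writes at index n_past, and everything before n_past counts as context.

```python
class ToyKVCache:
    """Toy stand-in for a KV cache: one slot per token position (hypothetical)."""

    def __init__(self, n_ctx):
        self.n_ctx = n_ctx
        self.slots = [None] * n_ctx


def toy_eval(cache, tokens, n_past):
    """Write tokens into the cache starting at index n_past.

    Returns the new n_past. Only slots [0, n_past) count as context.
    """
    if n_past + len(tokens) > cache.n_ctx:
        raise ValueError("context window exceeded")
    for i, tok in enumerate(tokens):
        cache.slots[n_past + i] = tok  # overwrites any stale entry
    return n_past + len(tokens)


def visible_context(cache, n_past):
    """Return the part of the cache that the next evaluation would see."""
    return cache.slots[:n_past]


def undo(n_past, x):
    """Forget the x most recent tokens by moving the index back."""
    return max(0, n_past - x)


cache = ToyKVCache(n_ctx=8)
n_past = toy_eval(cache, [10, 11, 12, 13], 0)  # four tokens evaluated
n_past = undo(n_past, 2)                       # forget tokens 12 and 13
n_past = toy_eval(cache, [99], n_past)         # continue from the rewound point
print(visible_context(cache, n_past))          # prints [10, 11, 99]
```

Note that nothing is erased when n_past is decreased: the stale entry for token 13 is still sitting in its slot, but it lies beyond n_past, so it is ignored and later overwritten. In this toy model that is why the undo is cheap.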

KerfuffleV2 Sep 1, 2023
Collaborator Author

Answer selected by KerfuffleV2