server : remove self-extend features #9860
if (!params.ctx_shift) {
    // this check is redundant (for good)
    // we should never get here, because generation should already stopped in process_token()
    slot.release();
    send_error(slot, "context shift is disabled", ERROR_TYPE_SERVER);
    continue;
}
@ngxson I think the comment is not entirely correct, because in process_token() we check against the training context length (n_ctx_train), while the slot's context slot.n_ctx could be smaller. What do you think?
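To make the concern concrete, here is a minimal sketch, not the server's actual code, of how a limit checked only against the training context can miss a smaller per-slot context. The struct and both helper functions are hypothetical; only the names n_ctx_train and slot.n_ctx (and n_past, mentioned later in the thread) come from the discussion, and the numbers in the note below are purely illustrative.

```cpp
// Hypothetical stand-in for a server slot; it is NOT the real slot type.
struct slot_sketch {
    int n_ctx;   // context available to this slot
    int n_past;  // tokens currently held for this slot
};

// Limit check against the training context only (the case being questioned).
static bool hits_train_limit(const slot_sketch & slot, int n_ctx_train) {
    return slot.n_past >= n_ctx_train;
}

// Limit check against the slot's own context.
static bool hits_slot_limit(const slot_sketch & slot) {
    return slot.n_past >= slot.n_ctx;
}
```

With an assumed training context of 4096 and a slot whose n_ctx and n_past are both 1024, hits_train_limit() returns false while hits_slot_limit() returns true, which is the gap described above.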
Hmm no, I did add a check against slot.n_ctx. Is this what you're looking for?
Ah, I missed that, thanks.
Shouldn't we check this actually:
if (slot.n_prompt_tokens + slot.n_decoded >= n_ctx) {
Hmm, or maybe:
if (slot.n_past + slot.n_decoded >= n_ctx) {
Anyway, I will figure it out as I'm looking into this logic currently.
Ah yeah, I misunderstood n_decoded. Yeah, maybe we even need (int) system_tokens.size() + slot.n_prompt_tokens because system_tokens is already in the KV cache before the first decode.
Thanks for looking into this.
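The counting question raised in this exchange can be sketched as follows. This is only an illustration of the idea that system tokens, prompt tokens, and decoded tokens all occupy context positions; it is not the condition the PR finally adopted (the commit message refers to slot.n_past). The struct and helper names are hypothetical, and n_system_tokens stands in for (int) system_tokens.size().

```cpp
// Hypothetical counters; field names follow the discussion, the struct is illustrative.
struct slot_counters {
    int n_ctx;            // context size available to the slot
    int n_prompt_tokens;  // tokens in the user prompt
    int n_decoded;        // tokens generated so far
};

// Positions occupied, assuming the system prompt is cached before the first decode.
static int tokens_in_cache(const slot_counters & s, int n_system_tokens) {
    return n_system_tokens + s.n_prompt_tokens + s.n_decoded;
}

// True once the occupied positions reach the slot's context size.
static bool context_full(const slot_counters & s, int n_system_tokens) {
    return tokens_in_cache(s, n_system_tokens) >= s.n_ctx;
}
```

Under these assumptions, a slot with n_ctx = 512, a 400-token prompt, and 100 decoded tokens is not yet full on its own, but becomes full once 12 system tokens are counted as well.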
No, sorry, I haven't seen #9811.
ggml-ci
e5f74fe to 8a1f439
* server : remove self-extend
  ggml-ci
* server : fix context limit check to use slot.n_past
  ggml-ci
target #9857, fix #9859
Drop support for the self-extend related arguments: