
server : remove self-extend features #9860


Merged: 2 commits merged into master from gg/server-remove-self-extend on Oct 12, 2024

Conversation

ggerganov (Member)

target #9857, fix #9859

Drop support for the self-extend related arguments:

| Argument | Explanation |
| -------- | ----------- |
| `-gan, --grp-attn-n N` | group-attention factor (default: 1)<br/>(env: LLAMA_ARG_GRP_ATTN_N) |
| `-gaw, --grp-attn-w N` | group-attention width (default: 512.0)<br/>(env: LLAMA_ARG_GRP_ATTN_W) |
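
For readers unfamiliar with these flags, the sketch below illustrates very roughly what group attention ("self-extend") did with token positions. It is a conceptual illustration only, not the removed server.cpp code; the function and variable names (`grouped_pos`, `ga_n`, `ga_w`, `n_past`) are chosen just for the example.

```cpp
// Conceptual sketch only (not the removed implementation): with group
// attention, positions older than the last ga_w slots are divided by the
// factor ga_n, so ga_n old tokens share one position slot while recent
// tokens keep full positional resolution.
#include <algorithm>
#include <cstdio>

// map an absolute position to its group-attention position
int grouped_pos(int pos, int n_past, int ga_n, int ga_w) {
    const int boundary = std::max(n_past - ga_w, 0); // start of the full-resolution window
    if (pos >= boundary) {
        return boundary / ga_n + (pos - boundary);   // recent tokens: keep spacing
    }
    return pos / ga_n;                               // older tokens: compressed by ga_n
}

int main() {
    // e.g. with ga_n = 4, ga_w = 512 and 4096 tokens of history, positions
    // span roughly 4096/4 + 512 slots instead of 4096
    const int test_pos[] = {0, 1024, 3583, 3584, 4095};
    for (int pos : test_pos) {
        printf("pos %4d -> %4d\n", pos, grouped_pos(pos, /*n_past =*/ 4096, /*ga_n =*/ 4, /*ga_w =*/ 512));
    }
    return 0;
}
```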

Comment on lines +1801 to +1807
if (!params.ctx_shift) {
// this check is redundant (for good)
// we should never get here, because generation should have already stopped in process_token()
slot.release();
send_error(slot, "context shift is disabled", ERROR_TYPE_SERVER);
continue;
}
ggerganov (Member, Author)

@ngxson I think the comment is not entirely correct, because in process_token() we check against the training context length (n_ctx_train), while the slot's context slot.n_ctx could be smaller. What do you think?

ngxson (Collaborator)

ggerganov (Member, Author) — Oct 12, 2024

Ah, I missed that, thanks.

Shouldn't we actually check this:

    if (slot.n_prompt_tokens + slot.n_decoded >= n_ctx) {

Hmm, or maybe:

    if (slot.n_past + slot.n_decoded >= n_ctx) {

Anyway, I will figure it out, since I'm currently looking into this logic.

ngxson (Collaborator) — Oct 12, 2024

Ah yeah, I misunderstood n_decoded. Maybe we even need `(int) system_tokens.size() + slot.n_prompt_tokens`, because system_tokens is already in the KV cache before the first decode.

Thanks for looking into this.
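
For context, the second commit in this PR is titled "server : fix context limit check to use slot.n_past" (see the commit messages further down), so the check presumably ends up keyed on slot.n_past against the slot's own n_ctx. The snippet below is only a standalone sketch of that condition under those assumptions; the struct and helper names are invented for the example and are not the merged diff.

```cpp
// Standalone sketch (not the merged server.cpp code): the per-slot
// context-limit condition discussed above, keyed on slot.n_past as the
// follow-up commit message suggests. Field names here are assumptions.
#include <cstdio>

struct slot_state {
    int n_past; // tokens already placed in the KV cache for this slot
    int n_ctx;  // context size available to this slot (may be smaller than n_ctx_train)
};

// true when the next decoded token would no longer fit without a context shift
bool context_limit_reached(const slot_state & slot) {
    return slot.n_past >= slot.n_ctx;
}

int main() {
    slot_state slot = { /*n_past =*/ 4096, /*n_ctx =*/ 4096 };
    if (context_limit_reached(slot)) {
        // with context shifting disabled, the server would stop the slot and
        // report an error instead of shifting the KV cache
        printf("context limit reached, slot must be released\n");
    }
    return 0;
}
```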

ngxson (Collaborator)

No, sorry, I haven't seen #9811.

Base automatically changed from gg/server-remove-system-prompt to master on October 12, 2024 11:51
@ggerganov ggerganov force-pushed the gg/server-remove-self-extend branch from e5f74fe to 8a1f439 on October 12, 2024 11:54
@ggerganov ggerganov merged commit 1bde94d into master on Oct 12, 2024
57 checks passed
@ggerganov ggerganov deleted the gg/server-remove-self-extend branch on October 12, 2024 13:06
drollings pushed a commit to drollings/llama.cpp that referenced this pull request Oct 18, 2024
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci