gerrit: clean body for related-content documents #35

Open
zpavlinovic opened this issue Oct 3, 2024 · 1 comment

Comments

@zpavlinovic
Contributor

The token limit for the Gemini embedding model is 2048 tokens (roughly 8200 characters), and the model silently truncates any input longer than that. Consider removing irrelevant comments, such as TryBot messages, from the document body before embedding.
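
As a rough illustration of the cleanup suggested above, the following Go sketch drops machine-generated comments before a body is assembled. The `Comment` type, `isBot`, and `cleanBody` are hypothetical names invented for this example and are not the project's actual API. The bot check assumes that TryBot results are posted from an account in the `luci-project-accounts.iam.gserviceaccount.com` domain; a real filter would likely need more rules.

```go
package main

import "strings"

// Comment is a hypothetical, simplified stand-in for a Gerrit change comment.
type Comment struct {
	AuthorEmail string
	Message     string
}

// botDomain is an assumed marker for automated accounts, taken from the
// LUCI service account address that appears in commit trailers.
const botDomain = "@luci-project-accounts.iam.gserviceaccount.com"

// isBot reports whether a comment appears to come from an automated account.
// This is a heuristic for illustration only.
func isBot(c Comment) bool {
	return strings.HasSuffix(c.AuthorEmail, botDomain)
}

// cleanBody joins the change description with the human-written comments,
// skipping any comment that isBot flags.
func cleanBody(description string, comments []Comment) string {
	var b strings.Builder
	b.WriteString(description)
	for _, c := range comments {
		if isBot(c) {
			continue
		}
		b.WriteString("\n\n")
		b.WriteString(c.Message)
	}
	return b.String()
}
```

Calling `cleanBody` with the change description and its comment list returns a body that keeps reviewer discussion and leaves out bot output, so more of the character budget goes to text that is relevant for embedding.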

zpavlinovic self-assigned this Oct 3, 2024
gopherbot pushed a commit that referenced this issue Oct 4, 2024
This will give us an idea of how often truncation actually happens
and how much of the input is cut off.

Updates #35

Change-Id: I90669124c9447645081aed43ed2c4f638c2c80c7
Reviewed-on: https://go-review.googlesource.com/c/oscar/+/617757
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
@zpavlinovic
Contributor Author

Around 7% of all CLs on the Go repo are above the Gemini token limit.
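
A figure like the one above could be gathered with a small counting pass over the document bodies. The Go sketch below is one way to do it under stated assumptions: `charLimit` uses the approximate 8200-character budget quoted earlier rather than an exact token count, and `TruncationStats` and `measure` are hypothetical names, not the code from the referenced commit.

```go
package main

import "unicode/utf8"

// charLimit is the approximate character budget from the issue
// (2048 tokens, roughly 8200 characters). Treat it as an estimate.
const charLimit = 8200

// TruncationStats summarizes how often, and by how much, bodies exceed charLimit.
type TruncationStats struct {
	Total       int // number of bodies examined
	Truncated   int // number of bodies longer than charLimit
	ExcessChars int // total characters beyond charLimit across all bodies
}

// measure counts the bodies that would be truncated and the characters lost.
func measure(bodies []string) TruncationStats {
	var s TruncationStats
	for _, b := range bodies {
		s.Total++
		if n := utf8.RuneCountInString(b) - charLimit; n > 0 {
			s.Truncated++
			s.ExcessChars += n
		}
	}
	return s
}

// Fraction returns the share of bodies that exceed the limit (0 when empty).
func (s TruncationStats) Fraction() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Truncated) / float64(s.Total)
}
```

Counting characters is only a proxy for tokens, so the result is an estimate. An exact figure would need the model's own tokenizer.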
