maxTokenSize, is there also minTokenSize? #10

Open
ErwinAI opened this issue Nov 6, 2024 · 1 comment
Labels
question Further information is requested

Comments


ErwinAI commented Nov 6, 2024

I noticed that sometimes, when combineChunks is set to true, some chunks contain just a few characters/tokens.
Would it be worth adding a minTokenSize setting, so that any chunk below that size is merged into the next chunk?

Not a high priority, but it might be a nice setting to ensure every chunk contains a sufficient amount of text.
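The proposed behavior could be sketched roughly as follows. This is a minimal illustration only. It assumes the chunks are already available as a list of strings and approximates token counts by whitespace-separated words. `merge_small_chunks`, `min_token_size`, and `count_tokens` are hypothetical names, not part of semantic-chunking's API. Any chunk under the minimum is held back and prepended to the next chunk, and a trailing undersized chunk is folded into the previous one.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def merge_small_chunks(chunks: list[str], min_token_size: int) -> list[str]:
    """Fold any chunk with fewer than min_token_size tokens into the next chunk."""
    merged: list[str] = []
    carry = ""  # undersized text waiting to be prepended to the next chunk
    for chunk in chunks:
        text = f"{carry} {chunk}".strip() if carry else chunk
        if count_tokens(text) < min_token_size:
            carry = text  # still too small: keep accumulating
        else:
            merged.append(text)
            carry = ""
    if carry:
        # No next chunk exists, so attach the leftover to the previous one.
        if merged:
            merged[-1] = f"{merged[-1]} {carry}"
        else:
            merged.append(carry)
    return merged


chunks = ["Intro.", "The quick brown fox jumps over the lazy dog.", "Short bit.",
          "Another full sentence with enough words in it.", "End."]
print(merge_small_chunks(chunks, min_token_size=4))
# ['Intro. The quick brown fox jumps over the lazy dog.', 'Short bit. Another full sentence with enough words in it. End.']
```

This sketch ignores maxTokenSize, so a merged chunk could exceed it; an actual setting would presumably need to reconcile the two limits.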

jparkerweb (Owner) commented Nov 6, 2024

@ErwinAI, would you be able to put together an example I could work from? Ideally it would include:

  • text file (or blob of text to chunk)
  • all the chunking settings

You can spin up the new Web UI to help with this :)
https://github.com/jparkerweb/semantic-chunking/tree/main/webui

I have a demo server set up for the Web UI that you could try as well:
https://chunking.dyndns.org/

Let me know if anything is unclear. An example would really help me nail down the requirements and give me a sanity check to test my changes against.

In the meantime, you might get the effect you are looking for by lowering the combineChunksSimilarityThreshold value: a lower threshold lets more neighboring chunks be combined (up to the defined maximum chunk token size).

Thanks :)
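To illustrate why lowering the threshold can help, here is a simplified greedy sketch. It assumes each chunk already has an embedding vector, compares only adjacent chunks (not the growing combined chunk), and counts tokens as whitespace-separated words. `combine_neighbors` and its parameters are hypothetical and not taken from the library's actual implementation. A neighbor is merged in when its similarity meets the threshold and the result stays within the maximum token size, so a lower threshold permits more merges and leaves fewer tiny chunks.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combine_neighbors(chunks: list[str], embeddings: list[list[float]],
                      threshold: float, max_token_size: int) -> list[str]:
    """Greedily merge adjacent chunks whose similarity meets the threshold,
    as long as the merged chunk stays within max_token_size (word) tokens."""
    if not chunks:
        return []
    combined = [chunks[0]]
    # Each step compares chunk i-1 with chunk i using their own embeddings.
    for prev_emb, emb, chunk in zip(embeddings, embeddings[1:], chunks[1:]):
        fits = len(combined[-1].split()) + len(chunk.split()) <= max_token_size
        if cosine_similarity(prev_emb, emb) >= threshold and fits:
            combined[-1] = f"{combined[-1]} {chunk}"
        else:
            combined.append(chunk)
    return combined


chunks = ["Cats purr.", "Cats nap.", "Stocks fell."]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy vectors for illustration
print(combine_neighbors(chunks, embeddings, threshold=0.9, max_token_size=10))
# ['Cats purr.', 'Cats nap.', 'Stocks fell.']
print(combine_neighbors(chunks, embeddings, threshold=0.5, max_token_size=10))
# ['Cats purr. Cats nap. Stocks fell.']
```

With the toy vectors, the adjacent similarities are about 0.8 and 0.6, so a threshold of 0.9 merges nothing while 0.5 merges everything that fits under the token limit.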

jparkerweb added the question label (Further information is requested) on Nov 6, 2024
Projects
Status: Backlog