Customized text chunking #3251
Comments
You can create your own chunking logic by extending the text splitter. The text splitter is then passed into the node parser, which is in turn passed into the service context. Text Splitter: Node Parser: |
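To make the suggestion above concrete, here is a minimal sketch of custom chunking logic. It is written as a standalone class so that it runs without the library installed. The class name `ParagraphTextSplitter`, the `max_chars` parameter, and the paragraph-packing strategy are all hypothetical. It also assumes that the library's text splitter interface only needs a `split_text(text)` method returning a list of strings. In real use the class would extend the library's text splitter base class and then be handed to the node parser as described above. That wiring is not shown because the constructor signatures depend on the installed version.

```python
import re
from typing import List


class ParagraphTextSplitter:
    """Hypothetical splitter: packs paragraphs into chunks of bounded size.

    Assumption: the text splitter interface only requires a
    split_text(text) -> List[str] method.
    """

    def __init__(self, max_chars: int = 200) -> None:
        if max_chars <= 0:
            raise ValueError("max_chars must be positive")
        self.max_chars = max_chars

    def split_text(self, text: str) -> List[str]:
        # Paragraphs are separated by one or more blank lines.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
        chunks: List[str] = []
        current = ""
        for para in paragraphs:
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= self.max_chars:
                # The paragraph still fits into the chunk being built.
                current = candidate
                continue
            if current:
                chunks.append(current)
            # A paragraph longer than max_chars is hard-split into fixed-size pieces.
            while len(para) > self.max_chars:
                chunks.append(para[: self.max_chars])
                para = para[self.max_chars :]
            current = para
        if current:
            chunks.append(current)
        return chunks


splitter = ParagraphTextSplitter(max_chars=40)
chunks = splitter.split_text("First paragraph.\n\nSecond one.\n\n" + "x" * 90)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The final chunking output is simply the list returned by `split_text`. Each string in that list would become one chunk for the node parser to wrap.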
Can you give an example of how to do it? |
The best example is definitely the source code. I'm not sure what you have in mind, though; it might be possible to achieve what you want with other methods. |
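I don't know if it's what you want, but this is how I achieved it:

from pathlib import Path
from typing import Dict
# BaseParser, CSVParser, and SimpleDirectoryReader come from llama_index;
# their import paths depend on the installed version.

class TxtParser(BaseParser):
    def _init_parser(self) -> Dict:
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> str:
        pass  # custom parsing logic for .txt files goes here

METIS_FILE_EXTRACTOR: Dict[str, BaseParser] = {
    ".csv": CSVParser(concat_rows=False),
    ".txt": TxtParser(),
}
documents = SimpleDirectoryReader(file_extractor=METIS_FILE_EXTRACTOR).load_data()
|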
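The `parse_file` method in the snippet above is left as a stub. As a rough illustration of what its body might do, here is a sketch written as a standalone function so that it runs without the library. The function name `parse_txt_file` and the whitespace normalisation are assumptions, not part of the original comment. Only the `file` and `errors` parameters and the `str` return type mirror the signature shown.

```python
import tempfile
from pathlib import Path


def parse_txt_file(file: Path, errors: str = "ignore") -> str:
    """Hypothetical parse_file body: read a text file and drop blank lines."""
    # The errors argument is forwarded to the decoder, as the signature suggests.
    with open(file, "r", encoding="utf-8", errors=errors) as fp:
        raw = fp.read()
    # Keep non-empty lines only, stripped of surrounding whitespace.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return "\n".join(lines)


# Usage: write a small sample file and parse it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("  first line  \n\n\nsecond line\n", encoding="utf-8")
    parsed = parse_txt_file(sample)

print(repr(parsed))
```

Any custom splitting of the text could also happen at this point, before the string is returned to the reader.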
How do I write my own logic for text chunking?
Which classes do I need to extend, and how do I return the final chunking output?
Any examples/documentation would be appreciated.