Heads up: the point of this repository is not a beautiful UI or clean code; it's to demonstrate that the conceptual content of massive bodies of text can be compressed. The practical use is that the compressed representation can be fed into a GPT model, letting it handle bodies of text that far exceed its usual context window.
There is a lot of nuance to how I think this would best integrate with a GPT model, but at a high level: you train one GPT model to predict which topics should come next, and a second GPT model specialized in filling in the actual words and sentences for each predicted topic.
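Here's a minimal sketch of how that two-model loop might look. This is purely hypothetical and not part of this repo: `TopicModel` and `ExpanderModel` are placeholder interfaces standing in for the two separately trained GPT-style models described above.

```python
# Hypothetical sketch of the two-model idea; neither model exists in this repo.
from typing import List, Protocol


class TopicModel(Protocol):
    """Placeholder for a model trained to predict the next topic."""
    def next_topic(self, topic_history: List[str]) -> str: ...


class ExpanderModel(Protocol):
    """Placeholder for a model trained to expand a topic into prose."""
    def expand(self, topic: str, prior_text: str) -> str: ...


def generate(topic_model: TopicModel, expander: ExpanderModel,
             seed_topics: List[str], n_steps: int) -> str:
    topics = list(seed_topics)
    text = ""
    for _ in range(n_steps):
        # The first model plans at the compressed/topic level...
        topic = topic_model.next_topic(topics)
        topics.append(topic)
        # ...and the second model fills in the actual words for that topic.
        text += expander.expand(topic, text)
    return text
```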
Demo video: Screenshare.-.2024-01-13.10_24_30.PM.mp4
Fair warning, the code itself is not super clean. I just wanted to get a proof of concept out the door showing that you can compress sentences into smaller units based on clusters in vector space.
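For the curious, here is a minimal sketch of the general idea (not this repo's actual code): embed each sentence, cluster the embeddings, and keep one representative sentence per cluster. The embedding model name and cluster count below are illustrative choices, not anything this repo prescribes.

```python
# Minimal sketch: compress a list of sentences by clustering their embeddings
# and keeping one representative sentence per cluster.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans


def compress(sentences, n_clusters=10):
    # Illustrative embedding model; any sentence-level embedder would do.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)  # one vector per sentence

    kmeans = KMeans(n_clusters=n_clusters, n_init="auto").fit(embeddings)

    # For each cluster, keep the sentence closest to the centroid as the
    # "compressed" stand-in for everything else in that cluster.
    representatives = []
    for c in range(n_clusters):
        members = np.where(kmeans.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - kmeans.cluster_centers_[c], axis=1)
        representatives.append(sentences[members[np.argmin(dists)]])
    return representatives
```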