Proposal: CompressedIndex, a new Index Structure #185 #187
This pull request is linked to the proposal #185
Hi guys,
I had an idea, so I implemented it to compare against the current behavior.
Here: https://github.com/agourdel/outlines-core/tree/opt/new-index-struct you can find a version of outlines-core with a new index structure called CompressedIndex.
The details of the structure can be found in _./src/index/compressed_index.README_ (thanks, LLM), but the main idea is this: a hashmap is expensive in memory and slow to access when it has to store a lot of transitions across a lot of states. So, what if we stored the allowed tokens for every state in a vector of bitmasks (token_masks)?
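To make the idea concrete, here is a minimal sketch of that layout (illustrative names, not the actual code in the branch):

```rust
// One fixed-width bitmask per DFA state instead of a hashmap of transitions.
struct CompressedIndexSketch {
    /// Number of tokens in the vocabulary.
    vocab_size: usize,
    /// One mask per state, each ceil(vocab_size / 64) u64 words long;
    /// bit i is 1 iff token i is allowed from that state.
    token_masks: Vec<Vec<u64>>,
}

impl CompressedIndexSketch {
    fn is_allowed(&self, state: usize, token_id: usize) -> bool {
        (self.token_masks[state][token_id / 64] >> (token_id % 64)) & 1 == 1
    }
}
```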
I started by adding an abstraction layer called IndexVariant between the Guide object and the Index objects, to allow Guides backed by different kinds of Index structures.
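Roughly, the layer looks like this (a hedged sketch against the repo's Index and CompressedIndex types; the real method names may differ):

```rust
// The Guide holds an IndexVariant and dispatches to whichever index it wraps.
enum IndexVariant {
    Standard(Index),
    Compressed(CompressedIndex),
}

impl IndexVariant {
    fn initial_state(&self) -> u32 {
        match self {
            IndexVariant::Standard(ix) => ix.initial_state(),
            IndexVariant::Compressed(ix) => ix.initial_state(),
        }
    }
}
```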
Then I built the CompressedIndex structure. I wrote its constructor to take a standard Index as a parameter because I'm lazy, but it could/should be instantiated from a regex and a vocabulary, as the standard Index is.
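The build step amounts to something like this, assuming hypothetical accessors `num_states()` and `transitions()` on the standard Index:

```rust
impl CompressedIndex {
    /// Fold the standard Index's (state, token) transition map into one
    /// bitmask of allowed tokens per state. Accessor names are illustrative.
    fn from_index(index: &Index, vocab_size: usize) -> Self {
        let words = (vocab_size + 63) / 64;
        let mut token_masks = vec![vec![0u64; words]; index.num_states()];
        for (state, token_id) in index.transitions() {
            token_masks[state][token_id / 64] |= 1u64 << (token_id % 64);
        }
        CompressedIndex { vocab_size, token_masks }
    }
}
```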
Then I benchmarked a few regex use cases, listed below, against the GPT2 vocabulary:
So, 4 regexes and 3 JSON structures.
First of all, I wanted to know each index's size in memory for each regex.

(We are in Rustland)
As you can see, the results are a bit meandering, but after investigation it turns out the determining factor is the transitions/states ratio: the higher it is, the more memory the CompressedIndex saves. So the bigger the vocabulary, the bigger the saving, with a break-even point around 1,200 transitions per state.
(The benchmark used is ./src/index/test_bench_memory.rs.)
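A back-of-the-envelope model shows why such a break-even ratio exists (the constants here are illustrative guesses, not measurements):

```rust
// The compressed index pays a fixed vocab_size / 8 bytes per state (one bit
// per token), while a transition table pays roughly `bytes_per_transition`
// per stored edge. The break-even is where the two costs per state meet.
fn break_even_transitions_per_state(vocab_size: usize, bytes_per_transition: usize) -> usize {
    let mask_bytes_per_state = vocab_size / 8;
    mask_bytes_per_state / bytes_per_transition
}

fn main() {
    // Assuming ~5 bytes per stored transition (a guess), GPT2's 50,257-token
    // vocabulary gives a break-even near 1,256 transitions per state, in the
    // same ballpark as the observed ~1,200.
    println!("{}", break_even_transitions_per_state(50_257, 5));
}
```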
After that, I decided to make them compete on compute performance.
In Pythonland, for each regex I created a Guide with the standard index and a Guide with the compressed index, then made them take exactly the same random path through the DFA.
The times displayed correspond only to the time needed to perform the advance calls.
Each advance is one iteration. The technical difference between the two indexes is that with the standard one, an iteration ends with a list of allowed token ids, while with the compressed one it ends with a bitmask over the entire vocabulary, where bit == 1 when the token is allowed. (With GPT2.)
(The benchmark used is ./benchmarks/bench_index_variant.py.)
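In sketch form, the per-step asymmetry looks like this (hypothetical signatures, just to show the shapes of the two outputs):

```rust
// Standard: O(number of allowed tokens) per step, materializing a Vec of ids.
fn advance_standard(index: &Index, state: u32) -> Vec<u32> {
    index.allowed_tokens(state)
}

// Compressed: O(1) per step, the mask over the whole vocabulary is already
// precomputed; bit i is 1 iff token i is allowed. This is why the per-mask
// time stays flat regardless of the regex.
fn advance_compressed<'a>(index: &'a CompressedIndex, state: usize) -> &'a [u64] {
    &index.token_masks[state]
}
```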
As you can see, the per-mask performance of the CompressedIndex is constant, whatever regex is used. And the deeper the path you take through the DFA (i.e., the more tokens generated), the more the speedup ratio favors the CompressedIndex.
I ran the same benchmark with unsloth/Llama-3.1-8B-Instruct (128,257 tokens):
It all comes down to the size of the inference output (the number of generated tokens).
The regex plus the vocabulary size give us a transitions/states ratio.
The transitions/states ratio plus the vocabulary size give us an equilibrium: a number of generated tokens at which the StandardIndex and the CompressedIndex have the same performance.
Beyond that equilibrium, the further you go, the better the CompressedIndex performs.
(This is why there are two "schema_complexe" lines, one with 283 generated tokens and one with 2,601 generated tokens, both following the same DFA.)
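A toy cost model of that equilibrium, with purely illustrative constants:

```rust
// Assume the compressed guide pays a one-time overhead but a lower per-token
// cost. The equilibrium is where the totals cross:
//   setup + c_compressed * n == c_standard * n
fn equilibrium_tokens(setup: f64, c_standard: f64, c_compressed: f64) -> f64 {
    setup / (c_standard - c_compressed)
}

fn main() {
    // Made-up numbers: 0.9 ms of overhead, 0.010 vs 0.007 ms/token puts the
    // crossover near 300 generated tokens; past that, compressed wins.
    println!("{}", equilibrium_tokens(0.9, 0.010, 0.007));
}
```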
So, in conclusion, I think the CompressedIndex, or something like it, should be considered as a possible improvement for the future of outlines-core (for the public repo, at least).
If I had to venture further, I would say we could get even better performance by not transforming the input structure into a single regex/single DFA, but instead building an acyclic graph where only some nodes are DFAs for "local" regexes. There is no path dependency between a sub-regex matching a phone number and a sub-regex matching an email inside the same JSON structure; yet, treated as a single regex, they create a combinatorial explosion at instantiation: C(R1 x R2) instead of just C(R1) + C(R2).
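As a hedged sketch of the direction, not an API (the types are entirely hypothetical):

```rust
// A JSON schema compiled into a graph of nodes instead of one big DFA.
enum SchemaNode {
    /// Fixed structural text, e.g. `{"phone": "`.
    Literal(String),
    /// A small DFA compiled from one "local" regex (phone, email, ...).
    LocalDfa { regex: String },
    /// Children generated in order; no cross-node path dependency, so the
    /// compile cost is C(R1) + C(R2) instead of C(R1 x R2).
    Sequence(Vec<SchemaNode>),
}
```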
What do you think?