FST#Compiler allocates too much memory #12598
Comments
I did an experiment: index 5M random BytesRefs and count the byte usage when the FST is built.

Since the BytesRefs are random, they would share little prefix and suffix, so I tried to mock some common prefixes/suffixes for them, like:
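For illustration only, here is a minimal sketch of one way such prefix/suffix mocking could look in Java; the prefix/suffix pools, term lengths, and class name are assumptions, not the exact code used in the experiment:

```java
import java.util.Random;
import org.apache.lucene.util.BytesRef;

// Hypothetical sketch: generate "random" terms that still share some common
// prefixes and suffixes, roughly mimicking real-world term distributions.
public class MockTerms {
  private static final String[] PREFIXES = {"user_", "doc_", "2023_", ""};
  private static final String[] SUFFIXES = {"_id", ".txt", "_v2", ""};

  public static BytesRef randomTerm(Random random) {
    StringBuilder sb = new StringBuilder();
    // Pick a shared prefix, a random middle part, and a shared suffix.
    sb.append(PREFIXES[random.nextInt(PREFIXES.length)]);
    int middleLength = 3 + random.nextInt(10);
    for (int i = 0; i < middleLength; i++) {
      sb.append((char) ('a' + random.nextInt(26)));
    }
    sb.append(SUFFIXES[random.nextInt(SUFFIXES.length)]);
    return new BytesRef(sb.toString());
  }
}
```

To feed such terms into an FST they would still need to be deduplicated and sorted first, since FST construction requires inputs in sorted order.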
And here is the result:
We allocate a 32 KB block while 99% of cases only need 5 KB. These results roughly match the allocation profile, where we rarely need a second block.
I get similar statistics for wikimediumall, and here are the results:

It seems 1 KB per block is enough here: 99% of cases can be covered by a single block, and we need at most 600+ blocks for a single FST.
Thanks @gf2121 -- this is a great discovery (and thank you https://blunders.io for the awesome integrated profiling in Lucene's nightly benchmarks!).
It's always dangerous to test on random data (even random data that attempts to simulate realistic patterns): you'll draw random conclusions yet act on them as if they were trustworthy! Better to test by indexing true terms that show up in a real-world corpus. Still, I think this issue is indeed compelling -- most FSTs in practice are likely smaller than 32 KB.

But one issue to be aware of: shrinking the block size will worsen the issue linked above, where building & immediately using an FST (in heap, not via save/load) performs poorly when the FST needs more than one block.

Really we need to get away from FST attempting to re-implement an in-memory filesystem, badly! We've seen many problems from this over the years: poor performance reading bytes in reverse, this issue (over-allocating blocks), the above linked issue (poor performance using a just-built FST). The FST compiler really ought to stream its bytes directly to ... But until we fix this "correctly" (stop trying to be a filesystem) ...
FWIW, I was looking into this a bit when I saw this issue come in. Specifically on Solr 8.11, but as far as I can tell the changes in #12604 apply to 8.x as well. In a 30s async-profiler memory allocation profile, I can see that ~10% of the total (4% on the left and ~6% near the middle) is this same FST byte array. I put up a backport PR for lucene-solr branch_8_11: apache/lucene-solr#2677
Description
https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down
The allocation profile for the nightly indexing benchmark shows that `FST#ByteStore` occupies a rather high ratio, because we construct a `FST#Compiler` each time `Lucene90BlockTreeTermsWriter$TermsWriter#writeBlocks` is called, which allocates a new 32 KB `byte[]`.

As the profile shows, most of the memory was allocated from `FST#init`, which means `BytesStore` rarely needs another page to store bytes. So I suspect we are using too large a page here. Maybe we should decrease the page size to 8 KB or even smaller?
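To make the page-size discussion concrete, here is a minimal sketch of building an FST with a smaller bytes page than the 32 KB default. It assumes the Lucene 9.x `FSTCompiler.Builder` API and its `bytesPageBits` option; the term list and output values are made up for illustration:

```java
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.FSTCompiler;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class SmallPageFst {
  public static void main(String[] args) throws Exception {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    // bytesPageBits(13) requests 8 KB pages instead of the default 32 KB (2^15).
    FSTCompiler<Long> compiler =
        new FSTCompiler.Builder<>(FST.INPUT_TYPE.BYTE1, outputs)
            .bytesPageBits(13)
            .build();

    IntsRefBuilder scratch = new IntsRefBuilder();
    // Inputs must be added in sorted order.
    String[] terms = {"apple", "banana", "cherry"};
    long ord = 1;
    for (String term : terms) {
      compiler.add(Util.toIntsRef(new BytesRef(term), scratch), ord++);
    }

    FST<Long> fst = compiler.compile();
    System.out.println("FST RAM bytes used: " + fst.ramBytesUsed());
  }
}
```

With 8 KB (or smaller) pages, the common single-block case would allocate far less per `writeBlocks` call, at the cost of more block boundaries for the rare FSTs that outgrow one page.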