
FST#Compiler allocates too much memory #12598

Closed
gf2121 opened this issue Sep 27, 2023 · 4 comments · Fixed by #12604

Comments


gf2121 commented Sep 27, 2023

Description

https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down

The allocation profile for the nightly indexing benchmark shows that FST#ByteStore accounts for a rather high share of allocations, because we construct a new FST#Compiler on every call to Lucene90BlockTreeTermsWriter$TermsWriter#writeBlocks, and each one allocates a new 32 KB byte[].

As the profile shows, most of the memory was allocated from FST#init, which means BytesStore rarely needs a second page to store its bytes. So I suspect the page size we use is too large. Maybe we should decrease it to 8 KB or even smaller?
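To illustrate the allocation pattern being described, here is a minimal, hypothetical sketch of a paged byte store (the class and field names are illustrative, not Lucene's actual BytesStore code): the first fixed-size block is allocated eagerly in the constructor, so every new instance costs a full 32 KB even if only a few bytes are ever written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the paged allocation pattern described above.
class PagedByteStoreSketch {
    static final int BLOCK_BITS = 15;          // 32 KB blocks, as in the issue
    static final int BLOCK_SIZE = 1 << BLOCK_BITS;

    private final List<byte[]> blocks = new ArrayList<>();
    private byte[] current;
    private int upto;

    PagedByteStoreSketch() {
        current = new byte[BLOCK_SIZE];        // eager 32 KB allocation up front
        blocks.add(current);
    }

    void writeByte(byte b) {
        if (upto == current.length) {          // only now is block #2 allocated
            current = new byte[BLOCK_SIZE];
            blocks.add(current);
            upto = 0;
        }
        current[upto++] = b;
    }

    long allocatedBytes() {
        return (long) blocks.size() * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        PagedByteStoreSketch store = new PagedByteStoreSketch();
        for (int i = 0; i < 40; i++) {         // a typical tiny FST: ~40 bytes
            store.writeByte((byte) i);
        }
        System.out.println("allocated=" + store.allocatedBytes());  // 32768
    }
}
```

Under this model, a 40-byte FST still pays for a full 32 KB block, which is why the profile shows FST#init dominating.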


gf2121 commented Sep 28, 2023

I ran an experiment: index 5M random BytesRefs and record the number of bytes used each time BytesStore#finish is called. Here are the statistics:

total sample: 23130

avg: 78.12
min: 7
mid: 36
pct75: 40
pct90: 40
pct99: 44
max: 10790

Since the BytesRefs are random, they share little common prefix or suffix, so I also mocked some shared prefixes/suffixes for them like:

if (R.nextBoolean()) {
  // with 50% probability, overwrite the start of b with a shared prefix
  int prefixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
}

if (R.nextBoolean()) {
  // with 50% probability, overwrite the end of b with a shared suffix
  int suffixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonSuffix, commonSuffix.length - suffixLen, b, b.length - suffixLen, suffixLen);
}
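For context, the fragment above can be wrapped into a self-contained generator. The surrounding scaffolding here (array sizes, term lengths, the `nextTerm` helper) is my assumption; the issue only shows the two `arraycopy` calls.

```java
import java.util.Random;

// Self-contained sketch of the mock term generation described above.
// Sizes and helper names are illustrative assumptions, not from the issue.
class MockTermsSketch {
    static final byte[] commonPrefix = new byte[64];
    static final byte[] commonSuffix = new byte[64];

    static byte[] nextTerm(Random R) {
        byte[] b = new byte[8 + R.nextInt(56)];   // random length in [8, 64)
        R.nextBytes(b);
        if (R.nextBoolean()) {
            // sometimes share a prefix with other terms
            int prefixLen = R.nextInt(b.length / 2);
            System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
        }
        if (R.nextBoolean()) {
            // sometimes share a suffix with other terms
            int suffixLen = R.nextInt(b.length / 2);
            System.arraycopy(commonSuffix, commonSuffix.length - suffixLen,
                             b, b.length - suffixLen, suffixLen);
        }
        return b;
    }

    public static void main(String[] args) {
        Random R = new Random(42);
        R.nextBytes(commonPrefix);
        R.nextBytes(commonSuffix);
        for (int i = 0; i < 5; i++) {
            System.out.println("term length: " + nextTerm(R).length);
        }
    }
}
```

Shared prefixes compress well in the FST, but shared suffixes grow the automaton, which is consistent with the much larger percentiles reported below.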

And here is the result:

total sample: 27235

avg: 820.540738020929
min: 8
mid: 24
pct75: 629
pct90: 3347
pct99: 5374
max: 29049

We allocate 32 KB while 99% of cases need only ~5 KB. These results roughly match the allocation profile: we rarely need a second block in BytesStore.


gf2121 commented Sep 28, 2023

I gathered similar statistics for wikimediumall; here are the results over 1,000,000 calls to BytesStore#finish.

BytesStore#finish called: 1000000 times

min: 1
mid: 16
avg: 64.555987
pct75: 28
pct90: 57
pct99: 525
pct999: 4957
pct9999: 29124
max: 631700

It seems 1 KB per block is enough here: 99% of cases are covered by a single block, and we need at most 600+ blocks for a single FST.
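The "600+ blocks" figure follows from simple ceiling division over the measured sizes above:

```java
// Arithmetic behind the block-count estimate for 1 KB blocks.
class BlockCountSketch {
    static int blocksNeeded(long fstBytes, int blockSize) {
        // ceiling division: smallest block count covering fstBytes
        return (int) ((fstBytes + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        System.out.println(blocksNeeded(631700, 1024)); // max observed FST: 617 blocks
        System.out.println(blocksNeeded(525, 1024));    // pct99 fits in a single block
    }
}
```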

@mikemccand
Member

Thanks @gf2121 -- this is a great discovery (and thank you https://blunders.io for the awesome integrated profiling in Lucene's nightly benchmarks!).

While the bytesrefs are random, it may share little prefix and suffix, I tried to mock some common prefix/suffix for them like

It's always dangerous to test on random data (even random data that attempts to simulate realistic patterns): you'll draw random conclusions yet act on them as if they were trustworthy! Better to test by indexing true terms that show up in a real-world corpus.

Still, I think this issue is indeed compelling -- most FSTs in practice are likely smaller than 32 KB. But one issue to be aware of: shrinking the block size will worsen the case where building and then immediately using an FST (in heap, not via save/load) performs poorly whenever the FST needs more than one block.

Really we need to get away from FST attempting to re-implement an in-memory filesystem, badly! We've seen many problems from this over the years... like poor performance reading bytes in reverse, this issue (over-allocating blocks), the above linked issue (poor performance using a just-built FST). The FST compiler really ought to stream its bytes directly to DataOutput (inspired by Tantivy's awesome FST implementation) which will in general be backed by a "true" filesystem since FST compilation is fundamentally append-only writes.
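The streaming idea could look roughly like the sketch below. The interface and class names here are illustrative, not Lucene's actual DataOutput API: the key property is that compilation only ever appends, so node addresses can be taken from a running write offset instead of from positions inside in-heap blocks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch of streaming FST bytes straight to an append-only output.
interface AppendOnlyOutput {
    void writeByte(byte b) throws IOException;
    long position();
}

class StreamingFstWriterSketch implements AppendOnlyOutput {
    private final OutputStream out;
    private long written;

    StreamingFstWriterSketch(OutputStream out) { this.out = out; }

    @Override public void writeByte(byte b) throws IOException {
        out.write(b);          // bytes go straight to the backing output
        written++;
    }

    @Override public long position() {
        return written;        // node addresses come from the running offset
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamingFstWriterSketch w = new StreamingFstWriterSketch(sink);
        long nodeStart = w.position();   // record where a "node" begins
        for (int i = 0; i < 3; i++) w.writeByte((byte) i);
        System.out.println("node at " + nodeStart + ", size now " + w.position());
    }
}
```

With a real filesystem-backed output there is no block sizing to tune at all, which is the appeal of this direction.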

But until we fix this "correctly" (stop trying to be a ramfs, badly), I agree we should tweak the block size for terms dict FST writing.


risdenk commented Oct 4, 2023

FWIW, I was looking into this a bit when I saw this issue come in -- specifically on Solr 8.11 -- and as far as I can tell the changes in #12604 apply to 8.x as well.

In a 30-second async-profiler memory allocation profile, I can see that ~10% of the total (4% on the left and ~6% near the middle) is this same FST byte array.
[Screenshot 2023-10-04 at 09 15 52: allocation flame graph]

I put up a backport PR for lucene-solr branch_8_11 - apache/lucene-solr#2677
