
Idea: save/restore active objects on DB close/create #163

Open
jimfulton opened this issue May 14, 2017 · 2 comments

@jimfulton (Member)

Restarting a ZODB application can hurt performance because the caches have to be warmed all over again.

Persistent storage caches can help, but because they are secondary caches, they often don't have the most important objects. (Even the recent enhancements to the RelStorage local cache don't seem to help that much based on my experience with a RelStorage app running 2.1a2.)

Idea: add a DB option to save, on close, a list of object ids (oids) for all active (non-ghost) objects; on startup, these oids could be prefetched to at least warm the storage caches. (If RelStorage grew a prefetch method that could prefetch multiple oids at once, this would be a win even though it would be synchronous.)
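Roughly the shape of the thing, as a sketch only (this is not an existing DB option; the pickle-cache iteration API `lru_items()`, the availability of `Connection.prefetch()` on recent ZODBs, and the file format are all assumptions):

```python
# Sketch of the idea, not an existing option.  Assumptions: conn._cache.lru_items()
# yields (oid, object) pairs for non-ghost objects, Connection.prefetch() exists
# (ZODB >= 5.1) and is a no-op for storages without prefetch support, and the
# on-disk format here is made up.
import ZODB, ZODB.FileStorage

OID_FILE = 'active-oids.bin'  # hypothetical save location

def save_active_oids(conn, path=OID_FILE):
    """On shutdown, record the oid of every non-ghost object in the cache."""
    oids = sorted(oid for oid, _ in conn._cache.lru_items())  # assumed API
    with open(path, 'wb') as f:
        f.write(b''.join(oids))  # each oid is 8 bytes

def warm_from_saved_oids(conn, path=OID_FILE):
    """On startup, ask the storage to prefetch the previously active oids."""
    try:
        with open(path, 'rb') as f:
            data = f.read()
    except FileNotFoundError:
        return
    conn.prefetch(data[i:i + 8] for i in range(0, len(data), 8))

db = ZODB.DB(ZODB.FileStorage.FileStorage('data.fs'))
with db.transaction() as conn:
    warm_from_saved_oids(conn)
    # ... application work ...
    save_active_oids(conn)
db.close()
```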

@jamadden (Member) commented Jun 28, 2019

> Even the recent enhancements to the RelStorage local cache don't seem to help that much based on my experience with a RelStorage app running 2.1a2.

It's gotten much better in the 3.0 series (details below).

> If RelStorage grew a prefetch method that could prefetch multiple oids at once, this would be a win even though it would be synchronous.

It did, and it is. In a big way.
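For reference, a batched prefetch from application code looks roughly like this; the container and its contents are hypothetical, and `Connection.prefetch()` quietly does nothing if the underlying storage can't prefetch:

```python
# Sketch only: 'documents' is a hypothetical BTree of persistent objects.
import ZODB, ZODB.FileStorage

db = ZODB.DB(ZODB.FileStorage.FileStorage('data.fs'))
with db.transaction() as conn:
    documents = conn.root()['documents']
    conn.prefetch(documents.values())   # one bulk request instead of N round trips
    total = sum(len(doc.text) for doc in documents.values())
```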

Below are benchmarks from zodbshootout 0.8. The first table is for MySQL with and without a persistent cache, and the second is for PostgreSQL with and without a persistent cache. 'add' is adding 1,000 objects in a transaction; 'cold' is reading 1,000 objects after having emptied all caches (in that case the storage's persistent cache automatically reloads itself from disk). (I ran these quickly, without many iterations, so there's some variability in the numbers; judging from the spread in the 'add' times, it's up to 15% or so.)

| Benchmark | mysqlclient_hf_1000 | mysqlclient-hf-pcache_1000 |
| --- | --- | --- |
| add | 110 ms | 93.9 ms: 1.17x faster (-14%) |
| cold | 194 ms | 26.5 ms: 7.32x faster (-86%) |
| prefetch_cold | 47.1 ms | 31.9 ms: 1.48x faster (-32%) |

| Benchmark | psycopg2_hf_1000 | psycopg2_hf_pcache_1000 |
| --- | --- | --- |
| add | 73.9 ms | 75.7 ms: 1.02x slower (+2%) |
| cold | 113 ms | 26.6 ms: 4.24x faster (-76%) |
| prefetch_cold | 45.7 ms | 32.1 ms: 1.42x faster (-30%) |

In both cases, adding a persistent cache made 'cold' substantially faster: the predictable read pattern here has a 100% hit rate from the persistent cache. Even doing a bunch of other writes to the database and then coming back to this cold read still gets great hit rates.

The interesting thing is that simply prefetching the data gets you almost as much of a benefit as a persistent cache. (And note how the persistent cache actually slowed down in prefetch just a bit: that's the overhead of iterating the ghosts and determining that there's no need to talk to the database at all because they're already cached.) Of course, this is a best-case scenario for prefetch: we know exactly what we need to read, and we have a fast local database connection.
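For completeness, the 'pcache' runs just point RelStorage's cache-local-dir option at a directory; a rough Python equivalent (the DSN and cache directory are placeholders) would be:

```python
# Sketch: enabling RelStorage's persistent local cache from Python.
from relstorage.adapters.postgresql import PostgreSQLAdapter
from relstorage.options import Options
from relstorage.storage import RelStorage
import ZODB

options = Options(cache_local_dir='/var/cache/myapp')   # persistent cache files live here
adapter = PostgreSQLAdapter(dsn="dbname='appdb'", options=options)
storage = RelStorage(adapter, options=options)
db = ZODB.DB(storage)
```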

As noted, persistent caches, unless quite large, don't necessarily have the most important objects. Very important but rarely changed objects may only live in Connection caches. (More about that in a minute.) RelStorage already has the infrastructure to store a set of OIDs and the ability to prefetch them at storage opening time (or even ensure they're persisted into the cache). What it can't do is get that list of OIDs. They're frustratingly close: I think they're (approximately) the union of the OIDs stored in each Connection's pickle cache in the DB's connection pool at the time of close(). There's that registerDB method that would let the storage know about the DB and get access to those Connections, except that's not what that method does; also, the connections are all closed and their caches cleared before the storage knows it's being closed, so that's too late.

Maybe the DB.close() method could let its storage know that it's about to close (whether by method or by event)? Then actually getting that list of OIDs, persisting it, and prefetching it on open would be up to the storage implementation. One could imagine a storage wrapper that does that for any storage if it were a method, or for any storage in any database if it were an event.
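A very rough sketch of the wrapper flavor of that idea; nothing here exists today, and the before-close hook (`save_working_set` below) and who calls it are exactly the open question:

```python
# Hypothetical wrapper; save_working_set is the proposed before-close hook,
# and warm() would be called after the database is opened.
class WorkingSetStorage:
    """Wrap any storage and delegate everything except the new hooks."""

    def __init__(self, base, path='working-set.oids'):
        self._base = base
        self._path = path

    def __getattr__(self, name):
        # Delegate the normal storage API to the wrapped storage.
        return getattr(self._base, name)

    def save_working_set(self, oids):
        # Hypothetically called by DB.close() with the pooled caches' oids.
        with open(self._path, 'wb') as f:
            f.write(b''.join(sorted(oids)))   # each oid is 8 bytes

    def warm(self):
        # After open: hand the saved oids to the base storage's prefetch, if any.
        try:
            with open(self._path, 'rb') as f:
                data = f.read()
        except FileNotFoundError:
            return
        prefetch = getattr(self._base, 'prefetch', None)
        if prefetch is not None:
            prefetch([data[i:i + 8] for i in range(0, len(data), 8)])
```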

OK, that "more" about knowing important objects: it's quite possible that the set of important objects changes over time. The objects needed at startup might be very different from those in the steady state of the application. This can be partly mitigated by using a larger persistent cache, especially if writes are relatively rare. But one might like to capture the working set at particular points in the application lifecycle and persist it for pre-fetching later (e.g., grab it just before the first request and again just after the last request; prefetch the first at startup, prefetch the second just after startup). That seems to suggest that one necessary, generic primitive ZODB could provide is simply "get the working set"[1]. Other policies could take it from there (including the storage wrapper or event or just plain storage method call on close() mentioned above) without having to know the details about connection pools and pickle caches.

[1] Right now we can only provide LRU information from pickle caches, but other policies (like zopefoundation/persistent#45 if I ever get around to it) could let us provide a better picture of the true working set.
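In sketch form, that primitive is just a union over the pooled connections' caches; `DB._connectionMap` and `lru_items()` are assumptions about the current internals, which is part of why a supported primitive would be nicer:

```python
# Sketch of the proposed "get the working set" primitive using today's internals.
def get_working_set(db):
    """Return the oids of all non-ghost objects in the DB's pooled caches."""
    oids = set()
    def collect(conn):
        oids.update(oid for oid, _ in conn._cache.lru_items())  # assumed API
    db._connectionMap(collect)  # assumed: applies collect to every pooled connection
    return oids
```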

@jamadden (Member)

> It's gotten much better in the 3.0 series (details below).

Right, those details. So 2.1's persistent cache turned out to do OK if you only had one process writing cache files, or if all processes writing cache files had essentially the exact same workload. The more the workloads differed, though, the worse the cache performance got. With just one process, it again gets an almost 100% hit ratio. But look what happens when I add a second process with a different workload (-c 2):

| Benchmark | 2.1.1 persistent cache | 3.0 master persistent cache |
| --- | --- | --- |
| psycopg2_hf_pcache: add 1000 objects | 104 ms | 81.0 ms: 1.29x faster (-22%) |
| psycopg2_hf_pcache: read 1000 cold objects | 144 ms | 28.9 ms: 4.97x faster (-80%) |
| psycopg2_hf_pcache: read 1000 cold prefetched objects | 128 ms | 33.6 ms: 3.80x faster (-74%) |
| psycopg2_hf: add 1000 objects | 99.7 ms | 82.3 ms: 1.21x faster (-17%) |
| psycopg2_hf: read 1000 cold objects | 127 ms | 118 ms: 1.08x faster (-7%) |
| psycopg2_hf: read 1000 cold prefetched objects | 130 ms | 47.0 ms: 2.77x faster (-64%) |

You might as well not have a persistent cache at all in 2.1, but 3.0 handles it just fine.
