Idea: save/restore active objects on DB close/create #163
It's gotten much better in the 3.0 series (details below).
It did, and it is. In a big way. Below are benchmarks from zodbshootout 0.8. The first one is for MySQL with and without a persistent cache, and the second is for PostgreSQL with and without a persistent cache. 'add' is adding 1,000 objects in a transaction; 'cold' is reading 1,000 objects after having emptied all caches (the storage's persistent cache automatically reloads itself from disk in that case). (I ran these quickly, without many iterations, so there's some variability in the numbers; the variation in the 'add' times shows that it's up to 15% or so.)
In both cases, adding a persistent cache made 'cold' substantially faster: the predictable read pattern here has a 100% hit rate from the persistent cache. Even doing a bunch of other writes to the database and then coming back to this cold read still gets great hit rates.
The interesting thing is that simply prefetching the data gets you almost as much of a benefit as a persistent cache. (And note how the persistent cache actually slowed down in prefetch just a bit: that's the overhead of iterating the ghosts and determining that there's actually no need to talk to the database at all, because they're already cached.) Of course, this is a best-case scenario for prefetch: we know exactly what we need to read, and we have a fast local database connection.
As noted, persistent caches, unless quite large, don't necessarily have the most important objects. Very important but rarely changed objects may only live in Connection caches. (More about that in a minute.)
RelStorage already has the infrastructure to store a set of OIDs, and the ability to prefetch them at storage opening time (or even ensure they're persisted into the cache). What it can't do is get that list of OIDs. They're frustratingly close: I think they're (approximately) the union of the OIDs stored in each Connection's pickle cache in the DB's connection pool at the time of Maybe the
OK, that "more" for knowing important objects: It's quite possible that the set of important objects changes over time. The objects needed at startup might be very different from those in the steady state of the application. This can be partly mitigated by using a larger persistent cache, especially if writes are relatively rare. But one might like to capture the working set at particular points in the application lifecycle and persist it for prefetching later (e.g., grab just before the first request, and also just after the last request; prefetch the first at startup, prefetch the second just after startup).
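To make the lifecycle idea concrete, here is a minimal sketch of keeping labelled working-set snapshots and turning them into an ordered prefetch plan. Everything in it is hypothetical: `WorkingSetSnapshots`, `capture`, and `prefetch_plan` are illustrative names, not ZODB or RelStorage API, and skipping OIDs that an earlier snapshot already covers is just one possible policy.

```python
class WorkingSetSnapshots:
    """Named OID snapshots taken at points in the application lifecycle.

    Hypothetical helper for illustration; not part of ZODB or RelStorage.
    """

    def __init__(self):
        self._snapshots = {}  # label -> frozenset of OIDs

    def capture(self, label, oids):
        # Record the working set observed at this lifecycle point.
        self._snapshots[label] = frozenset(oids)

    def prefetch_plan(self, labels):
        """Yield (label, oids) in the given order.

        OIDs already scheduled by an earlier label are skipped, so the
        second prefetch only asks for what the first one did not cover.
        """
        seen = set()
        for label in labels:
            fresh = self._snapshots[label] - seen
            seen |= fresh
            yield label, sorted(fresh)


snaps = WorkingSetSnapshots()
snaps.capture("before_first_request", [b"a", b"b"])
snaps.capture("after_last_request", [b"b", b"c", b"d"])
for label, oids in snaps.prefetch_plan(
        ["before_first_request", "after_last_request"]):
    print(label, oids)
```

The startup snapshot would be prefetched while the storage opens, and the remainder just afterwards; how the snapshots are persisted between runs is left open here.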
That seems to suggest that perhaps one necessary, generic, primitive that could be provided by ZODB is simply "get the working set"[1]. Other policies could take it from there (including the storage wrapper or event or just plain storage method call on
[1] Right now we can only provide LRU information from pickle caches, but other policies (like zopefoundation/persistent#45 if I ever get around to it) could let us provide a better picture of the true working set.
Right, those details. So 2.1's persistent cache turned out to do OK if you only had one process writing cache files, or if all processes writing cache files had essentially the exact same workload. The more the workloads differed, though, the worse the cache performance got. With just one process, it gets almost 100% hit ratio again. But look what happens when I add a second process with a different workload (
Might as well not have a persistent cache in 2.1, but 3.0 handles it just fine.
Restarting a ZODB application can hurt due to the need for cache warming.
Persistent storage caches can help, but because they are secondary caches, they often don't have the most important objects. (Even the recent enhancements to the RelStorage local cache don't seem to help that much based on my experience with a RelStorage app running 2.1a2.)
Idea: add a DB option to save a list of object ids for all active (non-ghost) objects and, on startup, these oids could be prefetched to at least warm the storage caches. (If RelStorage grew a prefetch method that could prefetch multiple oids at once, this would be a win even though it would be synchronous.)
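A rough sketch of what that option might do, written against duck-typed stand-ins rather than real ZODB objects. It assumes each connection cache exposes an `lru_items()` method returning `(oid, obj)` pairs for non-ghost objects (as I understand the `persistent` pickle cache to do), and that some `prefetch` callable accepts a list of OIDs. The function names, the JSON file format, and the batch size are all made up for illustration.

```python
import json
from binascii import hexlify, unhexlify


def collect_active_oids(caches):
    """Return the union of OIDs held by several pickle-cache-like objects.

    Each cache only needs an ``lru_items()`` method yielding ``(oid, obj)``
    pairs for its non-ghost objects (an assumption about the cache API).
    """
    oids = set()
    for cache in caches:
        for oid, _obj in cache.lru_items():
            oids.add(oid)
    return oids


def save_oids(oids, path):
    # OIDs are byte strings; hex-encode them so they fit in a JSON list.
    with open(path, "w") as f:
        json.dump(sorted(hexlify(oid).decode("ascii") for oid in oids), f)


def load_oids(path):
    with open(path) as f:
        return [unhexlify(h) for h in json.load(f)]


def warm(prefetch, oids, batch_size=1000):
    """Hand the saved OIDs to ``prefetch`` in batches of ``batch_size``."""
    oids = list(oids)
    for i in range(0, len(oids), batch_size):
        prefetch(oids[i:i + batch_size])
```

On close, the DB would call something like `save_oids(collect_active_oids(...), path)`; on the next open, `warm(storage_prefetch, load_oids(path))`, where `storage_prefetch` stands for whatever multi-OID prefetch the storage ends up offering.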