-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: implement a dirty flag #74
Conversation
This patch introduces a dirty flag and uses it to avoid unnecessary writes. 1. It avoids writes when creating new nodes. Instead, it just caches the new nodes and marks them dirty. 2. It only flushes modified nodes instead of all cached nodes. 3. Flush no longer clears the cache, just the dirty flags.
So, I was planning on rewriting this the same way I did the AMT, but that wasn't really necessary in this case. Simply fixing the unnecessary writes issue turned out to be much simpler. This appears to reduce writes by at least 40% in the "fill" benchmarks. Note: like with AMT v2, the root node will always be written on flush. Otherwise, we only write modified nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
One question, what is the motivation for no longer clearing caches from memory during flush?
--edit--
I now see the request for this in #72. It looks like it's a memory / time trade off and we care more about time than memory? In the filecoin state machine we don't do much with hamts after flushing so the choice doesn't seem strongly motivated for this use. Is there a motivation I'm missing?
#76 is now merged so this can be rebased and merged. |
Well, the memory would be dropped right after the structure is dropped (or whenever it's GCed) so I don't really see the difference in this case. It's not like the flushed hamt is kept in memory for any period of time really, and keeping a pointer to the node until it is doesn't seem like it incurs any cost, at the benefit of more generally usable and reusable implementation. Also, the state tree in between messages could reuse cached nodes, instead of reloading them a bunch of times, this seems to be to be the biggest benefit of this. Also it would be a bit weird if read caches are kept, where dirty caches are dropped. That seems like unexpected behaviour to me |
This patch introduces a dirty flag and uses it to avoid unnecessary writes.