Too much memory used when doing aofshrink in master/slave mode #258
Comments
Hi Curry, this does sound like a problem. I'll need to investigate further, but just so that I understand: this issue occurs only on the slave, and only after the slave reconnects? |
@tidwall You are right. It is a slave problem, and it happens after a reconnect. |
Frankly speaking,
is bad design. I think a priority queue would be better for background expiration, because we want to find the item whose expiration time is closest to now. |
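For readers following along, here is a minimal sketch of the priority-queue idea suggested above, using Go's standard `container/heap`. It is not tile38 code: the `expiry` type, `newExpiryHeap`, and `popExpired` are hypothetical names made up for illustration, and timestamps are plain Unix seconds to keep it short.

```go
package main

import (
	"container/heap"
	"fmt"
)

// expiry is a hypothetical record: a key and its expiration time (Unix seconds).
type expiry struct {
	key string
	at  int64
}

// expiryHeap implements heap.Interface as a min-heap ordered by expiration time.
type expiryHeap []expiry

func (h expiryHeap) Len() int            { return len(h) }
func (h expiryHeap) Less(i, j int) bool  { return h[i].at < h[j].at }
func (h expiryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x interface{}) { *h = append(*h, x.(expiry)) }
func (h *expiryHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// newExpiryHeap builds a heap from the given items.
func newExpiryHeap(items ...expiry) *expiryHeap {
	h := expiryHeap(items)
	heap.Init(&h)
	return &h
}

// popExpired removes and returns the keys due at or before now, soonest first.
func popExpired(h *expiryHeap, now int64) []string {
	var keys []string
	for h.Len() > 0 && (*h)[0].at <= now {
		keys = append(keys, heap.Pop(h).(expiry).key)
	}
	return keys
}

func main() {
	h := newExpiryHeap(
		expiry{key: "c", at: 30},
		expiry{key: "a", at: 10},
		expiry{key: "b", at: 20},
	)
	fmt.Println(popExpired(h, 20)) // prints [a b]
}
```

The item closest to expiring is always at the root, so the background loop only inspects entries that are actually due. The cost is O(log n) per insert.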
Ouch. Well, I can say that some thought has been put into the current implementation. I found that an array outperforms a data structure such as a heap queue for systems with a high frequency of SETs with TTLs. It also avoids bunches of pointers, which alleviates extra stress on the heap and GC. Please keep in mind that the exlist array only contains hints at what may need to be purged at some point in the future. #156 has a bit of discussion around this topic and was the catalyst for the current system. As for the original issue: after the AOFSHRINK on a leader has completed, the followers must resync their AOF file and reset their working data. This reset operation did not include clearing the exlist, so upon the next sync the exlist grew needlessly. This may be why you are seeing out of memory panics on your server. I just pushed an update that now clears the exlist on a reset. Please let me know if this update suffices. Thanks! |
That's exactly the point. I think the update works. |
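To make the failure mode described above concrete, here is a small simulation of an append-only hint list that survives a reset. Everything in it is an assumption for illustration (the `follower` type, `set`, `reset`, and `hintsAfterResyncs` are invented names, not tile38's Controller); it only shows why the list keeps growing when a reset leaves it untouched.

```go
package main

import "fmt"

// exitem is a hypothetical hint: the key may need purging at time `at`.
type exitem struct {
	key string
	at  int64
}

// follower is a stand-in for follower state, not tile38's Controller.
type follower struct {
	data   map[string]int64 // key -> expiration time
	exlist []exitem         // append-only purge hints
}

// set stores the key and appends a hint, as an append-only hint list would.
func (f *follower) set(key string, at int64) {
	f.data[key] = at
	f.exlist = append(f.exlist, exitem{key: key, at: at})
}

// reset drops the working data; clearHints controls whether the hints go too.
func (f *follower) reset(clearHints bool) {
	f.data = make(map[string]int64)
	if clearHints {
		f.exlist = nil
	}
}

// hintsAfterResyncs loads the same keys once, then again after each reset,
// and returns how many hints are held at the end.
func hintsAfterResyncs(keys, resyncs int, clearHints bool) int {
	f := &follower{data: make(map[string]int64)}
	for r := 0; r <= resyncs; r++ {
		if r > 0 {
			f.reset(clearHints)
		}
		for k := 0; k < keys; k++ {
			f.set(fmt.Sprintf("point:%d", k), 1000)
		}
	}
	return len(f.exlist)
}

func main() {
	fmt.Println(hintsAfterResyncs(1000, 3, false)) // prints 4000: hints pile up
	fmt.Println(hintsAfterResyncs(1000, 3, true))  // prints 1000: bounded by key count
}
```

With millions of points and repeated resyncs, the uncleared variant grows by one full copy of the hint list per resync, which would be consistent with the memory growth reported in this issue.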
Great news! |
Sorry, I think I missed something. After aofshrink (in master)
So, here is how tile38 finds the "some point": Lines 163 to 198 in 2088b5d
So after aofshrink, most of the time c.matchChecksums(conn, min, checksumsz) will return match=false when min=0, because the master's AOF has been re-ordered. As a result, the reset function will not be called (on the slave) after aofshrink (on the master), and Controller.followChecksum returns in this branch with pos=0 and fullpos=0: Lines 206 to 211 in 2088b5d
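A rough sketch of why a checksum comparison at offset 0 fails once the leader's log has been rewritten. This is not the tile38 implementation: `prefixChecksum` and `prefixesMatch` are hypothetical helpers, the use of MD5 is an assumption, and the log contents are invented.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// prefixChecksum hashes size bytes of log starting at pos.
// It reports false if the range falls outside the log.
func prefixChecksum(log []byte, pos, size int) ([md5.Size]byte, bool) {
	if pos < 0 || size < 0 || pos+size > len(log) {
		return [md5.Size]byte{}, false
	}
	return md5.Sum(log[pos : pos+size]), true
}

// prefixesMatch reports whether both logs hold identical bytes in [pos, pos+size).
func prefixesMatch(leader, follower []byte, pos, size int) bool {
	a, okA := prefixChecksum(leader, pos, size)
	b, okB := prefixChecksum(follower, pos, size)
	return okA && okB && a == b
}

func main() {
	before := []byte("SET a 1\nSET b 2\nSET a 3\n")
	// After a shrink the log encodes the same final state with different bytes.
	shrunk := []byte("SET b 2\nSET a 3\n")

	fmt.Println(prefixesMatch(before, before, 0, 8)) // prints true
	fmt.Println(prefixesMatch(shrunk, before, 0, 8)) // prints false
}
```

If the very first block already mismatches, there is no shared prefix to resume from, which matches the pos=0 case described above.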
|
I can confirm this issue on my side too. I'll do some digging. |
This was fixed in v1.16.2. |
Hi, I inserted over 5 million points into the master, each set to expire in 72 hours.
When I run aofshrink on the master, the slave uses a lot of memory, and the memory is not reclaimed by the GC (even after calling debug.FreeOSMemory()).
I profiled the memory usage with pprof. I think there might be a problem in how the slave sets a point that will expire in the future.
tile38/controller/expire.go
Line 78 in e1fe83c
After the aofshrink process finishes on the master, the master forces the slave to reconnect to it. The slave then follows again from some point in the AOF file, which includes setting the points again.
In expire.go, the Controller.expireAt function always appends a new exitem to the exlist, ignoring that the key might already exist.
I think this can be avoided.
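One possible way to avoid the duplicate appends, sketched under assumptions: keep a small index of the last expiration hinted per key and skip the append when a replayed SET carries the same value. The `hintList` type and its `add` method are hypothetical and only illustrate the idea; they are not the actual expireAt code.

```go
package main

import "fmt"

// exitem is a hypothetical hint: the key may need purging at time `at`.
type exitem struct {
	key string
	at  int64
}

// hintList is a hint slice plus an index of the last expiration hinted per key.
type hintList struct {
	items  []exitem
	latest map[string]int64
}

func newHintList() *hintList {
	return &hintList{latest: make(map[string]int64)}
}

// add appends a hint unless the same (key, at) pair was the last one recorded
// for that key. It reports whether a new item was appended.
func (l *hintList) add(key string, at int64) bool {
	if prev, ok := l.latest[key]; ok && prev == at {
		return false
	}
	l.latest[key] = at
	l.items = append(l.items, exitem{key: key, at: at})
	return true
}

func main() {
	l := newHintList()
	for replay := 0; replay < 3; replay++ { // the same SET replayed three times
		l.add("point:1", 1000)
	}
	l.add("point:1", 2000)    // a changed expiration still gets a new hint
	fmt.Println(len(l.items)) // prints 2
}
```

The extra map costs memory and a lookup on every SET, so whether this is a net win depends on how often the same keys are replayed.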