Use Intersect to Narrow Iterate Range and Reduce Memory Allocation #9271
Conversation
Thanks a lot for the input. We recently upgraded how the list works, and there is now a better function you could use instead. As written, there would be an issue when the UID range is wide but contains few UIDs. You can now call list.FindPosting(uid) to check whether a UID is present. We shouldn't use it, though, when the intersection range is larger than the number of UIDs actually present, so we could add a check using ApproxLen(). Let me know if you are willing to make the change, or should we?
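To make the suggestion concrete, here is a minimal sketch of the idea, not the actual Dgraph code. It assumes a posting list can be modeled as a sorted slice of UIDs; `postingList`, `approxLen`, `findPosting`, and `intersect` are hypothetical stand-ins for the real list type, `ApproxLen()`, and `FindPosting`. When there are fewer candidate UIDs than list entries, each candidate is probed individually; otherwise the list is walked once.

```go
package main

import (
	"fmt"
	"sort"
)

// postingList is a hypothetical stand-in for a posting list:
// a sorted slice of UIDs.
type postingList struct {
	uids []uint64
}

// approxLen stands in for an approximate-length helper such as ApproxLen().
func (p *postingList) approxLen() int { return len(p.uids) }

// findPosting stands in for a point lookup such as list.FindPosting(uid).
func (p *postingList) findPosting(uid uint64) bool {
	i := sort.Search(len(p.uids), func(i int) bool { return p.uids[i] >= uid })
	return i < len(p.uids) && p.uids[i] == uid
}

// intersect returns the candidates that are present in the list.
// candidates must be sorted in ascending order.
func intersect(p *postingList, candidates []uint64) []uint64 {
	// Few candidates: probing each one is cheaper than walking the list.
	if len(candidates) < p.approxLen() {
		res := make([]uint64, 0, len(candidates))
		for _, uid := range candidates {
			if p.findPosting(uid) {
				res = append(res, uid)
			}
		}
		return res
	}
	// Many candidates: walk both sorted sequences once.
	res := make([]uint64, 0, p.approxLen())
	i, j := 0, 0
	for i < len(p.uids) && j < len(candidates) {
		switch {
		case p.uids[i] < candidates[j]:
			i++
		case p.uids[i] > candidates[j]:
			j++
		default:
			res = append(res, p.uids[i])
			i++
			j++
		}
	}
	return res
}

func main() {
	p := &postingList{uids: []uint64{1, 3, 5, 7, 9, 11}}
	fmt.Println(intersect(p, []uint64{3, 4, 9}))                 // probe path
	fmt.Println(intersect(p, []uint64{1, 2, 3, 5, 7, 8, 9, 11})) // merge path
}
```

Either path produces the same result; the length check only decides which one is likely to do less work.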
Thanks for your suggestion! I'm glad to contribute to this project. I’d like to make this change myself.
By the way, I’d like to ask if this PR (#9218) is likely to be merged. The optimization of the eq execution plan seems to significantly improve performance.
Yeah that PR is scheduled to be merged. We are still evaluating and reviewing that PR.
I’ve pushed the updated changes to the PR. When you have time, could you kindly review it again? Let me know if there are any additional improvements you'd like to see.
found, _, err := l.findPosting(opt.ReadTs, uid)
if err != nil {
    l.RUnlock()
Instead of manually calling RUnlock() each time a return happens, you could do defer l.RUnlock() at the start of the function for the same effect.
When implementing this code, I also considered making this change, but I referred to the previous implementation and believe it was designed to minimize the time spent holding the read lock. Therefore, I chose not to change this implementation.
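The trade-off discussed here can be sketched as follows. This is an illustration only, with a hypothetical `list` type, `errEmpty` error, and `postProcess` step that are not taken from the PR. The first function unlocks by hand on every return path so the lock is released before the post-processing; the second scopes `defer` inside a closure, which gives the same early release without repeating the unlock call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// list is a hypothetical type guarded by a read-write mutex.
type list struct {
	sync.RWMutex
	uids []uint64
}

var errEmpty = errors.New("empty list")

// postProcess is work that does not need the lock (here: keep even UIDs).
func postProcess(uids []uint64) []uint64 {
	out := uids[:0]
	for _, u := range uids {
		if u%2 == 0 {
			out = append(out, u)
		}
	}
	return out
}

// snapshotManual releases the read lock by hand on every return path,
// so the lock is not held while postProcess runs.
func (l *list) snapshotManual() ([]uint64, error) {
	l.RLock()
	if len(l.uids) == 0 {
		l.RUnlock()
		return nil, errEmpty
	}
	out := append([]uint64(nil), l.uids...)
	l.RUnlock()
	return postProcess(out), nil
}

// snapshotScoped limits the locked region with a closure, so defer still
// releases the lock before postProcess runs.
func (l *list) snapshotScoped() ([]uint64, error) {
	out, err := func() ([]uint64, error) {
		l.RLock()
		defer l.RUnlock()
		if len(l.uids) == 0 {
			return nil, errEmpty
		}
		return append([]uint64(nil), l.uids...), nil
	}()
	if err != nil {
		return nil, err
	}
	return postProcess(out), nil
}

func main() {
	l := &list{uids: []uint64{1, 2, 3, 4}}
	a, _ := l.snapshotManual()
	b, _ := l.snapshotScoped()
	fmt.Println(a, b)
}
```

A plain `defer l.RUnlock()` at the top of the function would be shorter still, but it would keep the read lock held until the function returns, including during the post-processing.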
if opt.First < 0 {
    absFirst = -opt.First
}
preAllowcateLength := min(absFirst, l.mutationMap.len()+codec.ApproxLen(l.plist.Pack))
I think we have a function for it (getting approx len)
Also, the actual length could be much larger for some postings (split postings). Let's use the exact length function here. It would only hurt in the case of split postings; we can resolve that later.
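For reference, the preallocation under discussion amounts to capping the result capacity at the smaller of the requested count and the list length. The sketch below is a simplified illustration; `resultCapacity` is a hypothetical helper, and treating `first == 0` as "no limit" is an assumption rather than something stated in this thread. Because the value is only a capacity hint, an underestimate (as with split postings) costs extra reallocations in `append` but does not affect correctness.

```go
package main

import "fmt"

// resultCapacity returns a capacity hint for the result slice.
// first may be negative (meaning "last N"); only its magnitude matters here.
// Assumption: first == 0 means no limit, so the list length is used.
func resultCapacity(first, listLen int) int {
	absFirst := first
	if absFirst < 0 {
		absFirst = -absFirst
	}
	if absFirst == 0 || absFirst > listLen {
		return listLen
	}
	return absFirst
}

func main() {
	// Asking for the last 3 UIDs of a 1000-entry list reserves 3 slots.
	res := make([]uint64, 0, resultCapacity(-3, 1000))
	fmt.Println(cap(res))
}
```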
    return false
}

if opt.Intersect != nil && len(opt.Intersect.Uids) < l.mutationMap.len()+codec.ApproxLen(l.plist.Pack) {
Same here: use the exact length.
Let's create another function inside list that intersects with the list of UIDs.
}
if found {
    res = append(res, uid)
    if checkLimit() {
Instead of checking for a negative first here like this, why not just check the IDs in reverse?
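One way to read this suggestion is sketched below; it is an assumption about the intent, not the reviewer's code. `selectUids` and its `keep` callback are hypothetical names. For a negative `first`, the sorted UIDs are walked from the end and the walk stops as soon as enough matches are collected, after which the result is reversed back into ascending order.

```go
package main

import "fmt"

// selectUids returns matching UIDs from a sorted slice, in ascending order.
// first > 0: the first N matches; first < 0: the last N matches;
// first == 0: all matches (assumed to mean "no limit").
func selectUids(sorted []uint64, keep func(uint64) bool, first int) []uint64 {
	limit := first
	if limit < 0 {
		limit = -limit
	}
	res := make([]uint64, 0, limit)
	if first >= 0 {
		for _, uid := range sorted {
			if limit > 0 && len(res) == limit {
				break
			}
			if keep(uid) {
				res = append(res, uid)
			}
		}
		return res
	}
	// Negative first: walk backwards and stop once enough matches are found.
	for i := len(sorted) - 1; i >= 0 && len(res) < limit; i-- {
		if keep(sorted[i]) {
			res = append(res, sorted[i])
		}
	}
	// Restore ascending order.
	for i, j := 0, len(res)-1; i < j; i, j = i+1, j-1 {
		res[i], res[j] = res[j], res[i]
	}
	return res
}

func main() {
	uids := []uint64{1, 2, 3, 4, 5, 6, 7, 8}
	even := func(u uint64) bool { return u%2 == 0 }
	fmt.Println(selectUids(uids, even, -2)) // prints [6 8]
	fmt.Println(selectUids(uids, even, 2))  // prints [2 4]
}
```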
Description
In our index data, some hot keys are associated with a large number of UIDs, but the set filtered by our function is relatively small. During performance testing on this dataset, I noticed that pl.Uids consumes a significant amount of CPU time for slice memory allocation. Therefore, I propose an optimization in the Uids function to leverage the range provided by Intersect to reduce the scope of temporary result sets, thereby minimizing memory allocation.
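The idea described above can be illustrated with a minimal sketch. It is a simplified model rather than the code in this PR: the posting list is assumed to be a plain sorted slice, and `uidsInRange` is a hypothetical name. The smallest and largest UIDs in the intersect set bound the portion of the list that can possibly match, so only that portion is copied into the temporary result.

```go
package main

import (
	"fmt"
	"sort"
)

// uidsInRange returns the UIDs from list that fall within the range spanned
// by intersect. Both slices must be sorted in ascending order.
// Assumption: a nil intersect means "no filter", so the whole list is copied.
func uidsInRange(list, intersect []uint64) []uint64 {
	if intersect == nil {
		return append([]uint64(nil), list...)
	}
	if len(intersect) == 0 {
		return nil
	}
	lo, hi := intersect[0], intersect[len(intersect)-1]
	// Binary-search the bounds instead of scanning from the start.
	start := sort.Search(len(list), func(i int) bool { return list[i] >= lo })
	end := sort.Search(len(list), func(i int) bool { return list[i] > hi })
	// Allocate only for the narrowed range, not for the whole list.
	res := make([]uint64, 0, end-start)
	return append(res, list[start:end]...)
}

func main() {
	list := []uint64{1, 5, 10, 15, 20, 25, 30}
	fmt.Println(uidsInRange(list, []uint64{9, 12, 21})) // prints [10 15 20]
}
```

The narrowed result still has to be intersected with the candidate set afterwards; the saving is that the temporary slice is sized to the range rather than to the full hot key.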