Use Intersect to Narrow Iterate Range and Reduce Memory Allocation #9271

Open
wants to merge 8 commits into
base: main

Conversation

gooohgb
Contributor

@gooohgb gooohgb commented Jan 9, 2025

Description

In our index data, some hot keys are associated with a large number of UIDs, but the set that remains after our filter function is relatively small. During performance testing on this dataset, I noticed that pl.Uids spends a significant amount of CPU time allocating slice memory. I therefore propose optimizing the Uids function to use the range provided by Intersect to narrow the temporary result set, which reduces memory allocation.

@gooohgb gooohgb requested a review from a team as a code owner January 9, 2025 09:22
@harshil-goel
Contributor

Thanks a lot for the input. We recently reworked how the list works, and there is now a better function you could use instead. With the current approach there would still be an issue when the UID range is wide but contains few UIDs. You can now simply call list.FindPosting(uid) to check whether a uid is present. However, we shouldn't use it when the intersection set is larger than the number of UIDs actually present in the list, so we could add a check using ApproxLen(). Let me know if you are willing to make the change, or should we?

@gooohgb
Contributor Author

gooohgb commented Jan 9, 2025

Thanks for your suggestion! I'm glad to contribute to this project. I’d like to make this change myself.

@gooohgb
Contributor Author

gooohgb commented Jan 9, 2025

By the way, I’d like to ask if this PR (#9218) is likely to be merged. The optimization of the eq execution plan seems to significantly improve performance.

@harshil-goel
Contributor

Yes, that PR is scheduled to be merged. We are still evaluating and reviewing it.

@gooohgb
Contributor Author

gooohgb commented Jan 15, 2025

I’ve pushed the updated changes to the PR. When you have time, could you kindly review it again? Let me know if there are any additional improvements you'd like to see.

x/x.go (outdated, resolved)

found, _, err := l.findPosting(opt.ReadTs, uid)
if err != nil {
	l.RUnlock()
Contributor

Instead of manually calling RUnlock() each time a return happens, you could do defer l.RUnlock() at the start of the function for the same effect.

Contributor Author

When implementing this code, I also considered making this change, but I referred to the previous implementation and believe it was designed to minimize the time spent holding the read lock. Therefore, I chose not to change this implementation.
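
To make the trade-off in this thread concrete, here is a small sketch of both styles on a toy type. The `store` type, its fields, and the `describe` helper are hypothetical and exist only for the example; only `sync.RWMutex` and `defer` are standard Go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("uid not found")

// store is a hypothetical structure guarded by a read-write mutex.
type store struct {
	mu   sync.RWMutex
	data map[uint64]bool
}

// describe stands in for work that does not need the lock.
func describe(uid uint64) string { return fmt.Sprintf("uid %d present", uid) }

// lookupDefer holds the read lock until the function returns, so every
// return path is covered, including the call to describe.
func (s *store) lookupDefer(uid uint64) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if !s.data[uid] {
		return "", errNotFound
	}
	return describe(uid), nil
}

// lookupManual releases the read lock as soon as the shared data has been
// read, so describe runs without the lock. Every path taken while the lock
// is held must unlock explicitly.
func (s *store) lookupManual(uid uint64) (string, error) {
	s.mu.RLock()
	found := s.data[uid]
	s.mu.RUnlock()
	if !found {
		return "", errNotFound
	}
	return describe(uid), nil
}

func main() {
	s := &store{data: map[uint64]bool{7: true}}
	fmt.Println(s.lookupDefer(7))
	fmt.Println(s.lookupManual(8))
}
```

The deferred form is harder to get wrong when new return paths are added; the manual form keeps the critical section shorter, which is the reason given above for keeping the existing style.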

if opt.First < 0 {
	absFirst = -opt.First
}
preAllowcateLength := min(absFirst, l.mutationMap.len()+codec.ApproxLen(l.plist.Pack))
Contributor

@harshil-goel harshil-goel Jan 20, 2025

I think we already have a function for this (getting the approximate length).

Contributor

Also, the actual length could be much larger for some postings (split postings). Let's use the exact length function here. It would only hurt in the case of split postings, and we can resolve that later.
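
For context on why the length estimate matters here: the value is only used as a capacity hint for the result slice. A minimal sketch, assuming a hypothetical helper `preallocCap` and assuming that a limit of zero means "no limit" (the diff fragment does not show how the PR treats zero):

```go
package main

import "fmt"

// preallocCap is a hypothetical helper. It returns a capacity hint for the
// result slice: the absolute value of first when a limit is set and it is
// smaller than length, otherwise length.
// Assumption: first == 0 means "no limit".
func preallocCap(first, length int) int {
	if first < 0 {
		first = -first
	}
	if first == 0 || first > length {
		return length
	}
	return first
}

func main() {
	// The capacity is only a hint: append still grows the slice if the
	// estimate turns out to be too small.
	res := make([]uint64, 0, preallocCap(-3, 10))
	for uid := uint64(1); uid <= 5; uid++ {
		res = append(res, uid)
	}
	fmt.Println(len(res), cap(res) >= len(res))
}
```

An estimate that is too low is therefore safe but costs extra reallocations, while one that is too high wastes memory, which is the kind of cost this PR is trying to avoid.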

return false
}

if opt.Intersect != nil && len(opt.Intersect.Uids) < l.mutationMap.len()+codec.ApproxLen(l.plist.Pack) {
Contributor

Same here: use the exact length.
Let's create another function inside list that intersects with the list of uids.

}
if found {
	res = append(res, uid)
	if checkLimit() {
Contributor

Instead of checking for a negative first here like this, why not just check the ids in reverse?


3 participants