Use Intersect to Narrow Iterate Range and Reduce Memory Allocation #9271

gooohgb · 2025-01-09T09:22:18Z

Description

In our index data, some hot keys are associated with a large number of UIDs, but the set filtered by our function is relatively small. During performance testing on this dataset, I noticed that pl.Uids consumes a significant amount of CPU time for slice memory allocation. Therefore, I propose an optimization in the Uids function to leverage the range provided by Intersect to reduce the scope of temporary result sets, thereby minimizing memory allocation.

harshil-goel · 2025-01-09T09:42:58Z

Thanks a lot for the input. We recently upgraded how the list works, and there is now a better function you could use instead. With the current approach there would still be an issue if the UID range is very wide but the number of UIDs in it is small. You can now simply call list.FindPosting(uid) to check whether a UID is present. We shouldn't use it, though, when the intersection set is larger than the number of UIDs actually present, so we could add a check using ApproxLen(). Let me know if you are willing to make the change, or whether we should.

gooohgb · 2025-01-09T10:04:44Z

Thanks for your suggestion! I'm glad to contribute to this project. I’d like to make this change myself.

gooohgb · 2025-01-09T10:37:24Z

By the way, I’d like to ask if this PR (#9218) is likely to be merged. The optimization of the eq execution plan seems to significantly improve performance.

harshil-goel · 2025-01-09T11:29:48Z

Yeah that PR is scheduled to be merged. We are still evaluating and reviewing that PR.

gooohgb · 2025-01-15T06:50:39Z

I’ve pushed the updated changes to the PR. When you have time, could you kindly review it again? Let me know if there are any additional improvements you'd like to see.

xqqp · 2025-01-17T10:31:21Z

posting/list.go

+
+			found, _, err := l.findPosting(opt.ReadTs, uid)
+			if err != nil {
+				l.RUnlock()


Instead of manually calling RUnlock() each time a return happens, you could do defer l.RUnlock() at the start of the function for the same effect.

gooohgb · 2025-01-20

When implementing this code, I also considered making this change, but I referred to the previous implementation and believe it was designed to minimize the time spent holding the read lock. Therefore, I chose not to change this implementation.

harshil-goel · 2025-01-20T23:35:19Z

posting/list.go

+	if opt.First < 0 {
+		absFirst = -opt.First
+	}
+	preAllowcateLength := min(absFirst, l.mutationMap.len()+codec.ApproxLen(l.plist.Pack))


I think we have a function for it (getting the approximate length).

Also, the actual length could be far larger for some postings (split postings). Let's use the exact length function here. It would only hurt in the case of split postings; we can resolve that later.

harshil-goel · 2025-01-20T23:44:20Z

posting/list.go

+		return false
+	}
+
+	if opt.Intersect != nil && len(opt.Intersect.Uids) < l.mutationMap.len()+codec.ApproxLen(l.plist.Pack) {


Same here: use the exact length.
Let's create another function inside list that intersects with the list of UIDs.

harshil-goel · 2025-01-20T23:45:24Z

posting/list.go

+			}
+			if found {
+				res = append(res, uid)
+				if checkLimit() {


Instead of checking for a negative First here like this, why not just check the IDs in reverse?

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

3b23c92

gooohgb requested a review from a team as a code owner January 9, 2025 09:22

Li added 6 commits January 10, 2025 16:14

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

25a8814

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

a5576a1

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

17d0349

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

2121ec2

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation

5516de8

handle negative opt.First

f25dfee

xqqp reviewed Jan 16, 2025

Use native min function to replace custom implementation

0d023b0

xqqp reviewed Jan 17, 2025

harshil-goel reviewed Jan 20, 2025
