-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Use raft::copy in the inner loop of the host-side raft::gather #2464
base: branch-25.02
Are you sure you want to change the base?
Conversation
Converting this to a draft until we properly benchmark the change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change. LGTM! But it would be nice to compare the perf in your benchmarking results.
for (IdxT k = 0; k < buff.extent(1); k++) { | ||
buff(i, k) = dataset(in_idx, k); | ||
} | ||
raft::copy(res, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the resource is actually unused since the src and dest are both in host mem? I wonder if we can create a resource on the fly and supply it to raft::copy()
rather than adding the resource to gather_buff
's signature.
Use
raft::copy
instead of an assignment operator loop in the host-side gather operation. This should allow to leverage the logic ofraft::copy
to use fast memcpy when possible (especially for high-dimensional datasets).As some very circumstantial evidence, this change is observed to reduce
raft::matrix::sample_rows(100000000, 128)
time from approximately 5 to 4 minutes (bigann-100M int8 dataset in RAM, 16-core machine).Make the NVTX annotations a little bit more descriptive along the way.