-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Avoid extra loops on sorted data in the __subgroup_bubble_sorter::sort
#1874
base: main
Are you sure you want to change the base?
Conversation
….h - fix corner case performance issue in the __subgroup_bubble_sorter::sort Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
….h - avoid loops and condition checks for small data sizes in __subgroup_bubble_sorter::sort Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
Outdated
Show resolved
Hide resolved
|
||
_IndexType __n = __end - __start; | ||
|
||
switch (__n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no other occurrences of switch in the library. What about using if-else? I guess switch is not necessary here due to having only 4 options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe switch / case
should works faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For large number of elements switch
should perform better although I do not know in device code. However, with just 4 cases there's probably no difference.
In my opinion, unless there is a measurable performance difference between switch and if-else, then we should use if-else.
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
Outdated
Show resolved
Hide resolved
….h - fix review comment: allow ADL as it was originally Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
….h - fix review comment: the overflow is handled in the caller side, so it does not make sense to make a signed type. Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
….h - fix review comment: additional comments Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
63136a5
to
658f8df
Compare
if (__comp(__second_item, __first_item)) | ||
{ | ||
using std::swap; | ||
swap(__first_item, __second_item); | ||
} | ||
} | ||
break; | ||
default: // The case when __end > __start + 2 three or more source data items require full bubble sort | ||
do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation from https://en.wikipedia.org/wiki/Bubble_sort
In this PR we avoid extra loops on sorted data in the
__subgroup_bubble_sorter::sort
,which calls in sort algorithms on the device for every work item.
This is beneficial when the data is already sorted, which according to https://en.wikipedia.org/wiki/Sorting_algorithm is a common real-world case.