VxSort is a repository that contains both the code accompanying the This goes to Eleven blog post series by @damageboy.
In addition, this repository contains the source code for the NuGet package by the same name that provides a ready to use implementation for sorting with managed code at a much higher speeds than what is currently possible with CoreCLR 3.0.
Add with Nuget.
using VxSort;
// ...
var r = new Random((int) DateTime.UtcNow.Ticks);
int[] lotOfNumbers = Enumerable.Repeat(100_000_000).Select(r.NextInt()).ToArray();
VectorizedSort.Sort(lotsOfNumbers);
// Wow
Currently, VxSort is very feature-less, Here's what it can do:
- Sort 32-bit integers, ascending
Here's what's missing, in terms of functionality, and the order at which it should probably be implemented:
- Primitive Support:
- Add 32-bit descending support: #2
- Add 32-unsigned ascending support]: #3 (slightly tricky):
- There is no direct unsigned support in AVX2, e.g. we have:
_mm256_cmpgt_epi32(__m256i a, __m256i b)
/CompareGreaterThan(Vector256<Int32>, Vector256<Int32>
but no unsigned variant for the comparison operation. - Instead we could:
- Perform a fake descending partition operation around the value 0, where all
>= 0
are on the left, and all "fake"< 0
values (e.g what is really unsigned values with the top bit set...) go to the right. - Procees to partition with ascending semantics the left portion, while partitioning with descensing semantics the right
- (Unsigned) Profit!
- Perform a fake descending partition operation around the value 0, where all
- There is no direct unsigned support in AVX2, e.g. we have:
- Add 32-bit unsigned descending support.
- Add 64-bit signed/unsigned ascending/descending support.
- Support 32/64 bit floating point sorting.
- Try to generalize the 32/64-bit support with generic wrappers to avoid code duplication
- 16 bit support (annoying since there is no 16 bit permute so perf will go down doing 16 -> 32 bit and back)
- 8 bit support (annoying since there is no 8 bit permute so perf will go down doing 16 -> 32 bit and back)
- Key/Value Sorting:
- Add a stable variant, tweaking the current double-pumped loop and switching to
PCSort
for stable sorting.
This is substantially slower, but such is life - Add an explicit unstable variant of sorting, for those who don't care/need it
- Add a stable variant, tweaking the current double-pumped loop and switching to
-
IComparer<T>
/Comparison<T>
-like based vectorized sorting:- In general, all hope is lost if
IComparer<T>
/Comparison<T>
or anything of that sort is provided. - Unless the
IComparer<T>
/Comparison<T>
is essentially some sort of a trivial/primitive "compare the struct/class by comparing member X, For example:
AnIComparer<T>
/Comparison<T>
that is using the 3rd member ofT
which is at a constant offset of 10-bytes into theT
strcut/class. - Those sorts of trivial
IComparer<T>
/Comparison<T>
could be actually solved for with a AVX2 gather operation:
gather all the keys at a given offset and performs the regular vectorized sorting. - This would require a new type of API where the user provides (perhaps?) an Expression<Func> that performs the key selection, that could be "reverse-engineered" to understand if the expression tree can be reduced to an AVX2 gather operation and so on...
- In general, all hope is lost if
- General / Good practices?:
- Transition to code-generating AVX2 Bitonic sorting to avoid maintaining source files with thousands of source lines that could be instead machine generated.
VxSort is based on the following ideas/papers:
- Fast Quicksort Implementation Using AVX Instructions by Shay Gueron & Vlad Krasnov for the basic idea of the Vectorized Partition Block.
- Position-counting sort by @dirtyhandscoding
VxSort uses the following projects:
Fody
by the Fody developers.LocalsInit.Fody
for getting rid of.locals init
by Lucas Trzesniewski- The Logo
sort
by Markus from the Noun Project
Dan Shechter, a.k.a @damageboy