title | author | license | tags | summary | layout | src |
---|---|---|---|---|---|---|
Sorting Numeric Vectors in C++ and R |
Ross Bennett |
GPL (>= 2) |
stl benchmark |
Illustrates the comparison of different sorting algorithms with R and the C++ STL. |
post |
2013-01-31-sorting.cpp |
Consider the problem to sort all elements of the given vector in ascending
order. We can simply use the function std::sort
from the C++ STL.
{% highlight cpp %} #include <Rcpp.h> using namespace Rcpp;
// [[Rcpp::export]] NumericVector stl_sort(NumericVector x) { NumericVector y = clone(x); std::sort(y.begin(), y.end()); return y; } {% endhighlight %}
{% highlight r %} library(rbenchmark) set.seed(123) z <- rnorm(100000) x <- rnorm(100)
stopifnot(all.equal(stl_sort(x), sort(x)))
benchmark(stl_sort(z), sort(z), order="relative")[,1:4] {% endhighlight %}
test replications elapsed relative 1 stl_sort(z) 100 0.913 1.000 2 sort(z) 100 1.635 1.791
Consider the problem of sorting the first n
elements of a given vector.
The function std::partial_sort
from the C++ STL does just this.
{% highlight cpp %} // [[Rcpp::export]] NumericVector stl_partial_sort(NumericVector x, int n) { NumericVector y = clone(x); std::partial_sort(y.begin(), y.begin()+n, y.end()); return y; } {% endhighlight %}
An alternate implementation of a partial sort algorithm is to use
std::nth_element
to partition the given vector at the nth sorted
element and then use std::sort
, both from the STL, to sort the vector
from the beginning to the nth element.
For an equivalent implementation in R, we can use the sort
function by
specifying a vector of 1:n
for the partial argument (i.e. partial=1:n
).
{% highlight cpp %} // [[Rcpp::export]] NumericVector nth_partial_sort(NumericVector x, int nth) { NumericVector y = clone(x); std::nth_element(y.begin(), y.begin()+nth, y.end()); std::sort(y.begin(), y.begin()+nth); return y; } {% endhighlight %}
{% highlight r %} n <- 25000
stopifnot(all.equal(stl_partial_sort(x, 50)[1:50], nth_partial_sort(x, 50)[1:50]))
benchmark(stl_partial_sort(z, n), nth_partial_sort(z, n), sort(z, partial=1:n), order="relative")[,1:4] {% endhighlight %}
test replications elapsed relative 2 nth_partial_sort(z, n) 100 0.335 1.000 1 stl_partial_sort(z, n) 100 0.853 2.546 3 sort(z, partial = 1:n) 100 1.133 3.382
An interesting result to note is the gain in speed of
nth_partial_sort
over stl_partial_sort
. In this case, for the given
data, it is faster to use the combination ofstd::nth_element
and
std::sort
rather than std::partial_sort
to sort the first n
elements
of a vector.
{% highlight cpp %} // [[Rcpp::export]] NumericVector stl_nth_element(NumericVector x, int n) { NumericVector y = clone(x); std::nth_element(y.begin(), y.begin()+n-1, y.end()); return y; } {% endhighlight %}
Finally, consider a problem where you only need a single element of a
sorted vector. The function std::nth_element
from the C++ STL does just
this. An example of this type of problem is computing the median of a given
vector.
For an equivalent implementation in R, we can use the sort
function by
specifying a scalar value for the argument partial (i.e. partial=n
).
{% highlight r %}
stopifnot(all.equal(stl_nth_element(x, 43)[43], sort(x, partial=43)[43]))
benchmark(stl_nth_element(z, n), sort(z, partial=n), order="relative")[,1:4] {% endhighlight %}
test replications elapsed relative 1 stl_nth_element(z, n) 100 0.160 1.000 2 sort(z, partial = n) 100 0.466 2.913
While these are not huge speed improvements over the base R sort function, this post demonstrates how to easily access sorting functions in the C++ STL and is a good exercise to better understand the differences and performance of the sorting algorithms available in C++ and R.