I can confirm, mrgrnk from ORDERPACK is quite fast for this problem:
$ ./sort_benchmark
| algorithm | Time for 1 sort (s) |
| ----------------------------- | ------------------- |
| futils argsort | 2.6560636E-05 |
| stdlib sort_index | 1.5619091E-06 |
| sorting_module quicksort | 9.1251818E-06 |
| sorter sortedIndex | 2.1222727E-06 |
| C++ std::sort | 1.5285455E-06 |
| C++ std::stable_sort | 1.4374545E-06 |
| quicksort_own_2d_swapped | 3.7542727E-06 |
| mrgrnk | 1.0954545E-06 |
It’s good to know that specialized Fortran routines can still outperform the C++ standard library. With mrgrnk we’re looking at a speed-up of ×20 (for the sorting part) already compared to the original futils subroutine.
@nicholaswogan, does this mean the other part of your computation (binning?) becomes the bottleneck?