Repackaging of ORDERPACK2.0 for access from fpm(1)

There are several sort algorithms available in fpm(1) packages including the stdlib package, M_sort, … but ORDERPACK2.0 has some unique functionality, so I have a repackaged clone at

that others might find useful as well. It is notable that the original is already well documented and uses modern Fortran; and that there is a parallel wrapper for the basic indexing procedure available on github (see “See Also”) as well.

9 Likes

UPDATE:

The package is usable as-is, but efforts to improve unit testing, documentation, and functionality will be an ongoing project. Recent additions:

ORDERPACK2.0 was designed for sorting numeric data, but sorting of CHARACTER data is often required, so added support of CHARACTER type to the four principal
procedures: inssor(), mrgref(), mrgrnk(), refsor(). Will do the same for other routines where appropriate.

Individual man-pages for each routine have been started, with a working example
program included in each one:
https://urbanjost.github.io/orderpack/man3.html
These will continue to be refined, but should be helpful for anyone wishing to test or use the package.

Once conformant to the original goals, the intent is to leave ORDERPACK2.0 as a stable fpm(1) package and then create an ORDERPACK3.0 for non-compatible changes, except for adding CHARACTER support.

Suggestions on features for ORDERPACK3.0 and input on ORDERPACK2.0 for fpm(1) should be opened as issues on the github page.

2 Likes
git clone --branch 2.0.0 https://github.com/urbanjost/orderpack.git

is a version that includes documentation and confidence tests/unit tests for each routine, and and alternate module called M_overload that includes access to all the procedures using new names that are more descriptive, and an optional fpm plugin that can be used with fpm to access the documentation from a command line.

As @Beliavsky highlighted in a github Issues topic, the main problem with the current interface is selecting an appropriate algorithm, which depends on the size of the input array, available memory, whether the data is sorted or nearly sorted or random, and so on. So the current plan is to leave this as a separate repository and make a new interface in an ORDERPACK3.0 that selects the appropriate methods in as automated a fashion as possible (the same algorithm can by at least 200 times more efficient than a full sort in some cases, and at least 87 times slower than a full sort in others). Another incompatible but desirable feature is a generic interface that combines routines like SORT(3f) and the partial sorting routines into a single procedure with optional parameters. Because the changes would be limited by keeping everything in a single package (which I would generally prefer) a “version 3” seems the best approach. Hopefully the fresh face and fpm interface on the venerable ORDERPACK2.0 package will be useful to others, and I believe is essentially complete.

2 Likes

I would say just build the 3.0 from the 2.0 in the same repository. There’s no need to maintain the old version separately. If it’s tagged 2.0, people can always get it if they need it. Seems less confusing that way.

The recent project GitHub - snljty/fsort: A sort library with pure Fortran. says the following about sorting algorithms:

There are 2 constants defined in fsort.f90 : low_threshold = 8 and high_threshold = 8192 .

If the size of the array is lower than or equal to low_threshold , the insert sort method, instead of the general quick sort method, will be used, because althouth the average time complexity of insert sort is O(N^2) that seems greater than O(NlogN) of quick sort, when the size is really small, the insert method is actually faster.

If the size of the array is greater than low_threshold but lower than high_threshold , then the quick method will be used. For most situations this is the fastest method, but it is based on a recursive method and its space complexity is O(logN), so if the recursion is too deep there may cause stack overflow, hence if the size of the array is greater than or equal to high_threshold , the heap sort method will be used instead. This is usually slower than quick sort method for common data, but it will never meet the stack size limitation as it is a non-recursive algorithm with a O(1) space complexity, and also since its worst time complexity is O(NlogN) (where the worst time complexity of quick sort is O(N^2)), this will never cost too much time for a quite huge array even if the worst situation occured, as long as you can afford a O(NlogN) time complexity.

Several of the ORDERPACK routines already use a combination of techniques internally, but that is a nice example of the kind of wrapping that I also think would be useful in ORDERPACK, along with adding functions for binning and quartiles.

I want the present routines to be available in a stable format for use as a dependable remote fpm(1) package, and I am not sure the next version will be upward compatible and would be much simpler without the requirement to remain compatible so I am torn about whether to start a new package. As a general rule I agree it would be less confusing to keep it together. There has been some mention of making the interface look more like some other interfaces from other languages as well, which would require renaming almost everything. I think that an actual plan might be a good idea in this case. I am already using the package from fpm myself and so I want this interface to be stable :slight_smile: . Thanks for the suggestions and info; I think I will pause and actually sketch this one out a bit.

You might use the renaming capabilities of modules and interfaces if the actual argument list does not change. If it does, then some wrapper layer could be a solution (that would allow for compatibility as well). Just a few thoughts.

Yes, that allows for the new interface and can still leave the lower level available for those who know they want a specific original procedure; that might be the best solution.
It is doing some of that now, in that M_orderpack is just a wrapper around the individual procedures that renames them and makes them available from one module. So just expanding on that keeps it upward-compatible but allows for a new wrapper that makes it easier to use/remember.

Semantic versioning is commonly used in packaging systems similar to fpm. Here, incrementing the major version (e.g. going from 2.0 to 3.0) would signify a breaking change and retaining backwards compatibility is not required. Bugfixes to the 2.x series could also be published afterwards if needed.

Maybe that is the way to go, maybe a git branch too and just go ahead and make the new interface in the same repository.