Functional Programming Addition to stdlib

I was interested! But after inspecting a bit I got scared by performance implications. I saw then the github issue about time complexity and the discussion on HN and decided to roll my own, loop-based, implementation of union.

Some more info about what could have been my use case, if it is interesting to you:

This morning I stumbled upon the problem of building a set union of many small arrays, with (a lot) of values in common (in the worst case scenario the size of the union would be orders of magnitude smaller than the size of a simple concatenation). So I started a google hunt for an efficient implementation of such a thing. When I stumbled on your repo my first reaction was “yes, a clean, well documented, full-fledged library focusing on this domain, perfect!”.
But then I started inspecting the source code (I have arrays of derived types, so I could not just include the dependency, but instead try to copy the algorithm in my own code).

Two things immediately came to my attention

  • The body of union(a,b) itself deferres the functionality to set (which seems kind of obvious) applied to the merge of the two arrays. But such a merge is performed as [a,b] which I think is far from performance-optimized. I’m almost sure move_alloc should be used instead (but happy to be proven wrong, the automatic concatenation is infinitely more intuitive and elegant, syntax-wise).

  • The implementation of set(a) is recursive which has rung the definitive bell. Too much worry about stack management. Still, given the excellent presentation of the library in the README I thought I should be mistaken and came here to investigate about tail-call optimization in fortran and so on, to find that no, the entire project has been carried with no attention about this kind of issues, the scope itself was different[1].

This is perfectly fine, don’t get me wrong, but it calls in my opinion for a clear statement about scope and status in the README. Probably something as eye catching as a shields.io badge like:

lifecycle-experimental


  1. The set implementation could have well been done in a loop, without even really changing the logic: you loop on all the indices of the input array and, if the condition you already use inside the pack call is satisfied, you copy the value in a preallocated —set to all zeros— comformant array. Then, when you finish your span, you call just once pack on this array with the array/=0 condition and yeah, that’s your result. All this to say that I get that your implementation was more about an intellectual experiment, then production code. And that’s what I mean with “different scope”. ↩︎

1 Like