Glmnet migrates to C++

Awesome. And when you pip install fypp, it doesn’t work?

C++ has benefited a lot from the competition among its compilers. Clang is catching up with g++ in terms of speed, and g++ has followed Clang in improving its error messages.

This competitive pattern appears nonexistent in Fortran. Instead, when a new feature is proposed (one that has probably been around for 40 years in other programming languages), some people will instantly argue that Fortran compiler vendors will not implement it, so it should not be considered. It sounds as if these compiler vendors are competing in conservatism.

Starting over with LFortran and continuously adding new features independently of the standard committee is probably the only way out. The bottom line is that for every extra piece of verbosity, there must be some gain in performance as compensation. Since C++ and Fortran are in the same league in terms of performance, Fortran should strive to make its syntax more concise and readable.

4 Likes

This is simply untrue. Since the 1990s, R users have steadily replaced Fortran with C as the extension programming language, and since the 2010s C++ has been the first choice, after the advent of the Rcpp package. There has been very little Fortran code in the R ecosystem since the 2000s.

1 Like

My remark was meant only in the context of the routines underlying the glmnet package, which carried the header:

c     mortran 2.0     (version of 7/04/75 mod 7/4/87 (ajc))             
      subroutine wls(alm0,almc,alpha,m,no,ni,x,r,v,intr,ju,vp,cl,nx,thr,

and were probably (?) also interfaced earlier in S. If the routines hadn’t served the authors well enough in terms of performance, how else would you explain the fact that they were not rewritten in C/C++ earlier?

You are certainly right about the R community as a whole moving away from Fortran. The R Consortium even ran a working-group project called histoRicalg with the aim of preserving knowledge of older Fortran routines. The description of the project reads:

Many (most?) of the algorithms making up the numerical building-blocks of scientific software today were developed between 1965 and 1980. Codes were mostly in Fortran, with some Algol and BASIC. Slightly after this period, some were written in Pascal. Today the proportion of workers in scientific and statistical computation who are fluent with these computer languages is small, and the original authors are retired or deceased, with the availability of original codes becoming precarious as funding for projects such as the NIST Generally Available Mathematical Software (GAMS) disappears.

Some building blocks do, however, persist to this day; the R source code repository remains 23 % Fortran.

Granted, this covers only a few (old) packages: a vendored version of LAPACK, still older LINPACK routines, the loess routines for locally weighted regression, and the pppack routines for splines.

1 Like

That’s true, of course, but from a user’s point of view there are “different” Fortrans. The options above only make the compiler reject anything but standard Fortran; they do not guarantee that every single standard feature is supported.

  1. Intel’s ifort claims (I guess) full support for F2018, but the macOS version still does not support coarrays, so the support is not quite full. Not to mention that the installation of oneAPI is far from easy and that it became free only fairly recently.
  2. GNU Fortran’s support for F2018 is definitely not full. PDTs are broken. Coarrays are natively supported only in single-image mode; otherwise OpenCoarrays is needed. Installing OpenCoarrays on Linux (I can speak for Ubuntu and RHEL-related distros) is either hard or next to impossible. Fedora had it but abandoned it. RHEL and its clones have never had it. Ubuntu had it in 18.04 LTS, but 20.04 still ships the libopencoarrays package while lacking open-coarrays-bin (incl. caf, cafrun, etc.), so users are left to compile and link on their own. Luckily, Homebrew has a complete opencoarrays package for macOS.
  3. Other compilers: I have no experience with them.

So, especially when one develops a code using most of modern Fortran, getting that code working might pose some problems.

2 Likes

I have now done some digging in glmnet and discovered that I was in error with respect to the wls and elnet routines in glmnet, which are not 45+ years old. I apologize for this hasty mistake and for skewing the tone of the thread.

The line

c     mortran 2.0     (version of 7/04/75 mod 7/4/87 (ajc))

is likely the signature produced by the Mortran preprocessor. The initials (ajc) refer to the author of the preprocessor, A. James Cook, who worked at SLAC (Stanford Linear Accelerator Center). Mortran 2.0 was released in 1975. One of the glmnet authors, Trevor Hastie, was formerly at Bell Labs and later at Stanford University.

Until the most recent version of glmnet, it seems the actual work routines were written in the Mortran dialect. Focusing on the wls routine, this file is the preprocessor input, and this file is the generated output. This is the reason why the code is hard to read. If the author were merely to update the generated preprocessor output to the latest standard, it would already be a huge improvement in readability.

Here is an excerpt of the Mortran code:

<k=1,ni; if(iy(k).eq.1) next; if(ju(k).eq.0) next;
  if(g(k).gt.tlam*vp(k)) <iy(k)=1; xv(k)=dot_product(v,x(:,k)**2);>
>

And this is the Fortran output:

      do 10031 k=1,ni                                                   
      if(iy(k).eq.1)goto 10031                                          
      if(ju(k).eq.0)goto 10031                                          
      if(g(k) .le. tlam*vp(k))goto 10051                                
      iy(k)=1                                                           
      xv(k)=dot_product(v,x(:,k)**2)                                    
10051 continue                                                          
10031 continue   

The next keyword in Mortran corresponds to today’s cycle:

      do k = 1, ni                                                   
        if (iy(k) == 1) cycle
        if (ju(k) == 0) cycle
        if (g(k) > tlam*vp(k)) then                                
          iy(k) = 1                                                           
          xv(k) = dot_product(v,x(:,k)**2)                                    
        end if
      end do

In fact, the entire wls source code in Mortran can be rewritten very nicely using Fortran 90 features like cycle, exit, and named do loops, without going through the trouble of deciphering the gotos.

Now, concerning the interface with R itself: I’ve inspected the glmnet routines, and they use the .Fortran interface, which creates a deep copy (DUP) of matrix and vector arguments. Per the answer from Twitter user b_naras in the thread linked by @Beliavsky, it seems the speed-up comes in part from the tight binding via the .Call interface generated automatically by the Rcpp package.

That said, the C++ code has other readability improvements, such as the use of iterators and the C++ STL. The sparse regression routines make use of the Eigen sparse matrix library, which is probably much more convenient than raw Fortran methods.

It would be a nice project to port glmnet entirely to modern Fortran and see how close one can get to the existing wrappers in interface friendliness.

Julia developers have also made a pure Julia implementation in Lasso.jl, with allegedly better performance than the Fortran version.

2 Likes

It would be ideal if the Mortran preprocessor itself could be modified to output modern Fortran. Some explanations of it are here and here. The Mortran preprocessor can be compiled with

gfortran -std=legacy egs_fdate_stub.f mortran3.f
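
To illustrate what such a Mortran-to-modern-Fortran translation step might look like, here is a toy sketch in Python. It handles only a hypothetical three-construct subset ('<v=lo,hi;' opening a do loop, 'next' becoming cycle, '>' closing with end do); the real Mortran grammar (nested if-blocks, macros, etc.) is far richer, so this is purely an illustration of the idea, not a working replacement for mortran3.f.

```python
import re

def mortran_to_fortran(src: str) -> str:
    """Translate a tiny, assumed subset of Mortran shorthand to modern Fortran.

    Handled constructs (illustrative assumptions, not the full grammar):
      '<v=lo,hi;'  opens a do loop over v
      'next'       becomes 'cycle'
      '>'          closes the innermost loop with 'end do'
    """
    out, depth = [], 0
    # Tokens: a loop header, a block close, or an ordinary statement.
    for tok in re.findall(r'<\s*\w+\s*=[^;]*;|>|[^;<>]+', src):
        tok = tok.strip()
        if not tok:
            continue
        if tok.startswith('<'):
            var, bounds = tok[1:].rstrip(';').split('=', 1)
            lo, hi = bounds.split(',', 1)
            out.append('  ' * depth + f'do {var.strip()} = {lo.strip()}, {hi.strip()}')
            depth += 1
        elif tok == '>':
            depth -= 1
            out.append('  ' * depth + 'end do')
        else:
            # Naive keyword swap; a real translator would tokenize properly.
            out.append('  ' * depth + tok.replace('next', 'cycle'))
    return '\n'.join(out)

print(mortran_to_fortran('<k=1,ni; if(iy(k).eq.1) next; a(k)=a(k)+1;>'))
```

Run on the one-loop example above, this emits a do/cycle/end do block instead of labeled gotos, which is exactly the readability gain discussed earlier in the thread.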

2 Likes

Here is an R package to facilitate calling Julia from R. The manual describes how the translation from R to Julia and the reverse is done. Ideally calling modern Fortran would be similarly simple.

JuliaConnectoR: A Functionally Oriented Interface for Integrating ‘Julia’ with R

Allows to import functions and whole packages from ‘Julia’ in R. Imported ‘Julia’ functions can directly be called as R functions. Data structures can be translated between ‘Julia’ and R.

3 Likes

It’s true that only a small fraction of new R packages use Fortran, and that many of the R packages that do contain old-fashioned FORTRAN. OTOH, I see the R package

BayesFM: Bayesian Inference for Factor Modeling

Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: “Bayesian Exploratory Factor Analysis” (befa) from G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014) <doi:10.1016/j.jeconom.2014.06.008>, an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling.

which is currently maintained and uses modern Fortran, including modules and type-bound procedures. The system requirements say gfortran (>= 4.6.3), although binaries are distributed, so end users need not deal with Fortran directly.

2 Likes

I have used the dotCall64 package to call Fortran code from R for a while. Based on my personal experience, I believe that if glmnet used the dotCall64 package to call Fortran code, it would be as fast as using Rcpp to call C++ code. The key reason for glmnet’s migration to C++ may still be that the legacy F77 code is simply impossible to maintain.

I think that, compared to adding more features to Fortran, getting new users to use the existing modern Fortran features is at least as important.

4 Likes