Awesome. And when you `pip install fypp`, it doesn’t work?
C++ has benefited a lot from the competition among its compilers. Clang is catching up with g++ in terms of speed, and g++ has followed Clang to improve its error messages.
This pattern is nonexistent in Fortran. Instead, when a new feature is mentioned (one that has probably been around for 40 years in other programming languages), some people will instantly argue that Fortran compiler vendors will not implement it, so it should not be considered. It sounds as if these compiler vendors are competing in conservatism.
Starting over with LFortran and continuously adding new features independently of the standard committee is probably the only way out. The bottom line is that for every extra piece of verbosity, there must be some gain in performance as compensation. Since C++ and Fortran are in the same league in terms of performance, Fortran should strive to make its syntax more concise and readable.
This is simply untrue. Since the 1990s, R users have continuously replaced Fortran with C as the extension programming language. Since the 2010s, C++ has been the first choice after the advent of the Rcpp package. There have been very few Fortran codes in the R ecosystem since the 2000s.
My remark was meant only in the context of the routines underlying the `glmnet` package, which carried the header:
c mortran 2.0 (version of 7/04/75 mod 7/4/87 (ajc))
subroutine wls(alm0,almc,alpha,m,no,ni,x,r,v,intr,ju,vp,cl,nx,thr,
and were probably (?) also interfaced earlier in S. If the routines didn’t serve the authors well enough in terms of performance, how else would you explain the fact that they had not been rewritten in C/C++ earlier?
You are certainly right about the R community as a whole moving away from Fortran. The R Consortium even ran a working-group project called histoRicalg with the aim of preserving the knowledge of older Fortran routines. The description of the project reads:
Many (most?) of the algorithms making up the numerical building-blocks of scientific software today were developed between 1965 and 1980. Codes were mostly in Fortran, with some Algol and BASIC. Slightly after this period, some were written in Pascal. Today the proportion of workers in scientific and statistical computation who are fluent with these computer languages is small, and the original authors are retired or deceased, with the availability of original codes becoming precarious as funding for projects such as the NIST Generally Available Mathematical Software (GAMS) disappears.
Some building blocks do, however, persist to this day; the R source code repository remains 23 % Fortran.
Granted, this is only for a few (old) packages: a vendored version of LAPACK, still older LINPACK routines, the `loess` routines for locally weighted regression, and the pppack package for splines.
That’s true, of course, but from the point of view of a user, there are “different” Fortrans. The options above only make the compiler reject non-standard Fortran; they do not guarantee that every single standard feature is supported.
- Intel’s ifort claims (I guess) full support for F2018, but the macOS version still does not support coarrays, so it is not-so-fully-full. Not to mention that the installation of oneAPI is far from easy and that it became free only recently.
- GNU Fortran’s support for F2018 is definitely not full. PDTs are broken. Coarrays are natively supported only in single-image mode; otherwise OpenCoarrays is needed. Installing OpenCoarrays on Linux (I can speak for Ubuntu and RHEL-related distros) is either hard or next to impossible. Fedora had it but abandoned it. RHEL and its clones have never had it. Ubuntu had it in 18.04 LTS, but 20.04 still has the `libopencoarrays` package while lacking `open-coarrays-bin` (incl. `caf`, `cafrun`, etc.), so users are left on their own to compile and link properly. Luckily, Homebrew has a complete `opencoarrays` package for macOS.
- Other compilers: I have no experience with them.
So, especially when one develops a code using most of modern Fortran, getting that code working might pose some problems.
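For context, coarrays have been part of the language itself since Fortran 2008, and even a minimal program exercises them, immediately revealing whether a given compiler/runtime combination works. A hello-world sketch (compile with, e.g., `gfortran -fcoarray=single` for a single image, or with the `caf` wrapper from OpenCoarrays for multi-image runs):

```fortran
program hello_caf
  implicit none
  ! Fortran 2008 coarray intrinsics: each image reports its index.
  ! With -fcoarray=single this prints a single line from image 1;
  ! under OpenCoarrays/caf, one line per image.
  write(*,'(a,i0,a,i0)') 'Hello from image ', this_image(), &
       ' of ', num_images()
end program hello_caf
```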
I have now done some digging in glmnet, discovering I have been in error with respect to the `wls` and `elnet` routines, which are not 45+ years old. I apologize for this hasty mistake and for skewing the tone of the thread.
The line
c mortran 2.0 (version of 7/04/75 mod 7/4/87 (ajc))
is likely the signature produced by the Mortran preprocessor. The initials `(ajc)` refer to the author of the preprocessor, A. James Cook, who worked at SLAC (the Stanford Linear Accelerator Center). Mortran 2.0 was released in 1975. One of the glmnet authors, Trevor Hastie, was formerly at Bell Labs and later at Stanford University.
Until the most recent version of glmnet, it seems the actual work routines were written in the Mortran dialect. Focusing on the `wls` routine, this file is the preprocessor input, and this file is the generated output. This is the reason why the code is hard to read. If the author were merely to update the preprocessor output to the latest standard, it would already be a huge improvement in readability.
Here is an excerpt of the Mortran code:
<k=1,ni; if(iy(k).eq.1) next; if(ju(k).eq.0) next;
if(g(k).gt.tlam*vp(k)) <iy(k)=1; xv(k)=dot_product(v,x(:,k)**2);>
>
And this is the Fortran output:
do 10031 k=1,ni
if(iy(k).eq.1)goto 10031
if(ju(k).eq.0)goto 10031
if(g(k) .le. tlam*vp(k))goto 10051
iy(k)=1
xv(k)=dot_product(v,x(:,k)**2)
10051 continue
10031 continue
The `next` keyword in Mortran is available today as `cycle`:
do k = 1, ni
if (iy(k) == 1) cycle
if (ju(k) == 0) cycle
if (g(k) > tlam*vp(k)) then
iy(k) = 1
xv(k) = dot_product(v,x(:,k)**2)
end if
end do
In fact, the entire `wls` source code in Mortran can be rewritten very nicely using the newer features of Fortran (90?) like `cycle`, `exit`, and named `do` loops, without going through the trouble of deciphering the goto’s.
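For the iterative parts, `exit` from a named `do` loop replaces the labeled-`goto` convergence pattern. A minimal, self-contained sketch (the loop body and variable names are illustrative stand-ins, not glmnet’s actual update):

```fortran
program exit_demo
  implicit none
  integer :: iter
  real :: dlx
  real, parameter :: thr = 1.0e-4
  dlx = 1.0
  ! Named do loop with exit: replaces the labeled-goto convergence
  ! pattern seen in the generated FORTRAN 77 output.
  iterate: do iter = 1, 100
     dlx = dlx * 0.5            ! stand-in for the real coefficient update
     if (dlx < thr) exit iterate
  end do iterate
  print *, 'converged after', iter, 'iterations'
end program exit_demo
```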
Now, concerning the interface with R itself: I’ve inspected the glmnet routines, and they use the `.Fortran` interface, which creates a deep copy (DUP) of matrix and vector arguments. Per the answer from Twitter user b_naras in the thread linked by @Beliavsky, it seems the speed-up comes in part from the tight binding via the `.Call` method generated automatically through the `Rcpp` package.
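Modern Fortran can offer a similarly tight binding: with the `iso_c_binding` intrinsic module, a routine gets a C-callable interface that R can reach through its C entry points without the copying semantics of `.Fortran`. A minimal sketch (the routine and its name are illustrative, not part of glmnet):

```fortran
! Sketch: exposing a Fortran routine under a C name and ABI, so that
! R (or any C caller) can bind to it directly.
module wls_c_api
  use, intrinsic :: iso_c_binding, only: c_int, c_double
  implicit none
contains
  subroutine weighted_sum(n, x, w, s) bind(c, name="weighted_sum")
    integer(c_int), value      :: n
    real(c_double), intent(in)  :: x(n), w(n)
    real(c_double), intent(out) :: s
    s = sum(w * x)   ! operates on the caller's arrays in place, no copy
  end subroutine weighted_sum
end module wls_c_api
```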
That said, the C++ code has other improvements for readability, like using iterators and the C++ STL. The sparse regression routines make use of the Eigen sparse matrix library, which is probably much more convenient than the raw Fortran methods.
It would be a nice project to port glmnet to Fortran entirely and see how close one can get in interface friendliness with respect to the wrappers, like:
- python-glmnet, and
- GLMNet.jl
Julia developers have also made a pure Julia implementation in Lasso.jl with allegedly better performance than the Fortran version.
It would be ideal if the Mortran preprocessor itself could be updated to output modern Fortran from the Mortran source. Some explanations of it are here and here. The Mortran preprocessor can be compiled with
gfortran -std=legacy egs_fdate_stub.f mortran3.f
Here is an R package to facilitate calling Julia from R. The manual describes how the translation from R to Julia and the reverse is done. Ideally calling modern Fortran would be similarly simple.
JuliaConnectoR: A Functionally Oriented Interface for Integrating ‘Julia’ with R
Allows to import functions and whole packages from ‘Julia’ in R. Imported ‘Julia’ functions can directly be called as R functions. Data structures can be translated between ‘Julia’ and R.
It’s true that only a small fraction of new R packages use Fortran, and that many of the R packages that do have old-fashioned FORTRAN. OTOH, I see the R package
BayesFM: Bayesian Inference for Factor Modeling
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: “Bayesian Exploratory Factor Analysis” (befa) from G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014) <doi:10.1016/j.jeconom.2014.06.008>, an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling.
which is currently maintained and uses modern Fortran, including modules and type-bound procedures. The system requirements say `gfortran (>= 4.6.3)`, although binaries are distributed, so end users need not deal with Fortran directly.
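For readers who have not seen post-F90 style, the “modern Fortran” meant here looks roughly like this sketch of a module with a derived type and a type-bound procedure (the names are illustrative, not taken from BayesFM):

```fortran
! Sketch of modern-Fortran style: a module encapsulating a derived
! type whose behavior is attached via a type-bound procedure.
module sampler_mod
  implicit none
  type :: gibbs_sampler
     integer :: nburn = 1000      ! burn-in draws (illustrative default)
  contains
     procedure :: describe
  end type gibbs_sampler
contains
  subroutine describe(self)
    class(gibbs_sampler), intent(in) :: self
    print '(a,i0,a)', 'sampler with ', self%nburn, ' burn-in draws'
  end subroutine describe
end module sampler_mod
```

Call sites then read `call s%describe()`, which is the object-style syntax that distinguishes this from old-fashioned FORTRAN.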
I have used the dotCall64 package to call Fortran code from R for a while. Based on my personal experience, I believe that if glmnet used the dotCall64 package to call Fortran code, it would be as fast as using Rcpp to call C++ code. The key reason for glmnet’s migration to C++ may still be that the legacy F77 code is simply impossible to maintain.
I think that compared to adding more features to Fortran, getting new users to use the existing modern Fortran features is at least as important.