Glmnet migrates to C++

glmnet is the most famous R package that relies heavily on Fortran code. In its most recent update (today), it has migrated to C++.
https://cran.r-project.org/web/packages/glmnet/news/news.html

R gave up modernizing its Fortran interface a long time ago. People no longer use Fortran to extend R. This battlefield has been lost.

2 Likes

Looking at the code (CRAN - Package glmnet) they still have some Fortran code. It is f77, fixed form with implicit typing, go to statements, no indentation. Very hard to read.

There are tools that can convert such code to modern Fortran, but I don’t know how reliable they are.

Given the familiarity of new developers with C++, it makes perfect sense to migrate. The tooling is just so much better. If you want, why don’t you reach out to the developers and ask them for reasons to migrate?

I still like Fortran, I think it’s not a bad language. And I am hoping to help improve the tooling so that people can stay in Fortran.

6 Likes

The C++ routines are much faster – I wonder why.

Some of the Fortran in glmnet has been replaced by C++, written by the newest member of our team, James Yang.

  • the wls routines (dense and sparse), that are the engines under the glmnet.path function when we use programmable families, are now written in C++, and lead to speedups of around 8x.
  • the family of elnet routines (sparse/dense, covariance/naive) for glmnet(...,family="gaussian") are all in C++, and lead to speedups around 4x.

There is no battlefield. Because there is no battle. There has never been a battle. Battle for what? Between who? Every time I see such discussions and arguments, I remind myself to be objective and ask myself why I am using Fortran. I am using it because it solves my research needs. It is just a (nice) tool in my toolbox along with other tools (Python, R, MATLAB, Mathematica, Julia …). It is neither my parents nor my family member or a special friend of mine to grow an endless love for or to fight for. If the benefits of using some other tool outweigh the use of Fortran at some point in my research, I would be silly to not abandon Fortran to use the other.

If others find other languages more comfortable to use and develop, let them use them. Maybe they are more productive in those environments than in Fortran. But based on my almost 2-decades-long experience, this is frequently incorrect and possibly only due to their lack of knowledge of modern Fortran or more importantly, misinformation about Fortran, sometimes intentionally shared by some people for various reasons like profit or gaining popularity at the cost of spreading falsehood. And this is the only case that makes me furious and gives me the feeling that I am in a battle, the battle against misinformation, not the battle between Fortran and some other X language.

I have vigorously defended each of the programming languages that I know, like Python, MATLAB, R, Julia, to the extent that I know, in various forums over the past years. Not because I have a special love for any one of them, but to fight misinformation, whether it is about Fortran, Python, Julia, or some hardly known language. As long as I am alive and teaching, I won’t leave the battlefield of misinformation.

Now, if Fortran programmers (like me) want to see more people using Fortran for scientific research, they need to train more people in Fortran and write/develop more in Fortran. I am sure if the glmnet team had some members who knew modern Fortran and who were willing to rewrite the old code in modern Fortran, the team would have not opposed it (They might have had doubts, but I seriously doubt if they would have opposed rewriting it in modern Fortran). So the first programmer who volunteered to modernize the codebase happened to be a person who knows C++ and therefore, they rewrote everything in C++.

If you ask a random programmer in the street what compiled language they know and use, what are the odds of them answering “C++” as opposed to “Fortran”? Almost 100%. Given this information, what does this rewriting of glmnet from F77 to C++ tell us about Fortran? Nothing, except that modern Fortran developers are indeed scarce. How can this be resolved? By training more people and younger programmers in Fortran.

Why did Intel make its compilers freely available to all students and teachers several years ago? Because they knew these people grow up and go to companies all across the world, asking their bosses to buy them Intel compilers on Intel architecture, because that is what they know and that is what they are addicted to.

Why is the Julia discourse forum flooded with undergraduate and graduate students? Because, the for-profit company behind Julia knows the secret to winning the market and making profit is to train a massive generation of Julia programmers who will fight to their deaths for Julia and are ready to sacrifice the fundamental Principles of Science to remain addicted to their beloved programming language.

I very much like the style of the current spiritual leader (from my perspective) of the Fortran programming language, Steve Lionel, who always keeps objectivity in his discussion, focusing on what really matters to Fortran and what the goals are, and nothing else, and I believe all Fortran programmers should follow that spirit of objectivity in their discussions and work.

20 Likes

You might also look at this from the other side. The wls.f code appears to have been written in 1975, at roughly the same time as the S programming language, the predecessor of R. The fact that Fortran served S and R heavily for more than 45 years without any other contenders is quite an achievement if you ask me.

It’s also worth noting that S (and hence R) was originally built around Fortran; quoting John Chambers in “S, R, and Data Science” (2020):

The approach that succeeded in satisfying all the requirements was to build the system around an interface to Fortran. From the start of the project, our design was based on writing specialized code to incorporate individual Fortran subroutines into S by writing a specialized interface function for each of them

The C++ integration with R is very good today due to the Rcpp package created by Dirk Eddelbuettel. This package relies upon template meta-programming to create a tight interface between the C++ code and the underlying opaque C structs of the R runtime. Industry has also been investing heavily in C++ compilers. If you had to modernize these routines, porting them to C++ would make a lot of sense.

In fact, in his book Extending R (2016), John Chambers explains that the interface we see today in Rcpp, is very similar to the initial S interface language:

The interface language looked a little like S but actually was parsed and transformed into an extended version of Fortran. […]
However, the interface language had important advantages, worth considering today in dealing with challenging computations where efficiency matters.

  1. Because the interface language did end up as Fortran, it could produce computationally efficient code with direct access to much existing software for serious numerical methods.
  2. At the same time, having a customized language allowed the programming to look more like S (and with less primitive tools, one could have gone much farther in this direction).
  3. Some non-S programming was helpful for efficiency or clarity (such as type declarations for arguments) and could have been useful independently of having a particular subroutine to call.

These features remain relevant today and have in fact resurfaced in some current approaches to interfaces from R. The closest general analogue is the Rcpp interface to subroutines through C++ […].

One could argue point 3. is fulfilled today much better in Fortran by the various new attributes, e.g. intent.

I think something close can be achieved for Fortran, but it would require a pioneer like Dirk to spearhead the development of a new Fortran-R binding, and prototype it in LFortran. There are a few reasons why such a project might succeed:

  • R and Fortran both use 1-based indexing
  • R and Fortran both support array operations
  • Fortran is a simpler language than C++, which makes it easier to learn for statisticians, scientists, and engineers

The reason Fortran has fallen out of favor today is the manual drudgery of wrapping Fortran code via the various interfaces, and also the lack of general developer tools expected. There are other (social) aspects I suppose too, like the R core team simply favoring C. I am not sure how well Fortran syntax highlighting works in tools like RStudio.

Lastly, I would point again to the linguistic relativity principle. While losing a Fortran package to C++ might seem bad, it is because you have taken a Fortran-centric stance. Again quoting Chambers:

In implementing extensions to R, the INTERFACE principle says we should look widely to find good computational techniques to achieve our goals. If an effective solution has been implemented in a form other than R code, providing an interface from R may be the best approach.

As a scientist using R to solve problems, I would be happy someone improved the code. To conclude here are two quotes building upon the battlefield analogy raised in previous posts:

“There is no instance of a nation benefitting from prolonged warfare.”
- Sun Tzu, The Art of War

“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.”
- Sun Tzu, The Art of War

8 Likes

On Twitter, one of the developers answered,

From copying: the DUP argument is always true for .FORTRAN. The C++ code works directly with the R structures.

The R package dotCall64 can avoid copying, so this argument of copying is not 100% valid. I guess a more plausible reason is that Fortran programmers are really scarce nowadays. We are probably the last generation of Fortran users.
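The difference between an interface that copies its arguments and one that works on the caller's data directly can be sketched in plain C++. This is only an illustration of the two calling styles, not R's actual `.Fortran` or `.Call` machinery: a raw pointer plus a length stands in for the storage of an R numeric vector, and the function names `scale_with_copy` and `scale_in_place` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy-style interface (a simplified model of a call that duplicates its
// arguments): the input is copied into fresh storage, the work is done on
// the copy, and the copy is returned. The caller's data is left untouched.
std::vector<double> scale_with_copy(const double* x, std::size_t n, double factor) {
    assert(x != nullptr || n == 0);
    std::vector<double> result(x, x + n);  // this is the extra copy
    for (double& value : result) {
        value *= factor;
    }
    return result;
}

// In-place interface: the callee writes straight into the caller's storage,
// so no duplicate of the data is ever created.
void scale_in_place(double* x, std::size_t n, double factor) {
    assert(x != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

For large vectors called many times inside an iterative solver, the extra allocation and copy in the first style can add up, which is the cost the quoted developer is pointing at.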

1 Like

Well, that’s depressing. It doesn’t have to be that way. But it could happen if the Fortran Committee fritters away another decade on minor tweaks that do nothing to get a single new person excited about writing Fortran code.

I have a simple question: if Fortran is really dying, why do Intel, Nvidia, IBM, etc. continue to work on new versions of their Fortran compilers?

1 Like

Well IBM will also sell you COBOL but that doesn’t mean that’s a vibrant community that any young person is excited to be a part of. :joy:

3 Likes

It is not dying. It reached its community saturation level a long time ago, perhaps in the 90s or 80s, and has remained there, while generic languages like C/C++/Python have a lot more room to grow their communities. Therefore, they seem to be (and they are, but only proportionally) much more popular than Fortran. After all, what percentage of humanity is a scientist or engineer? And what percentage of that tiny community works in or needs HPC? That tiny, tiny HPC community is now also shared with a dozen other languages besides Fortran.

The key to further popularizing Fortran is to stop reiterating its weaknesses and do something to fix them, for example, by

  1. helping with open-source compiler development.
  2. helping the existing or starting new library developments (Example: the majority of the old R-F77 projects need migration to modern Fortran. If no one volunteers, some C++ developer volunteers will soon migrate them all to C++).
  3. Correcting misinformation (here specifically about Fortran) wherever it occurs.
  4. Advertising and teaching Fortran to others, especially the younger generation. The latest online Fortran course I visited today, also advertised on this forum, has ~300 student enrollment. That is excellent enrollment for a Fortran course, IMO, much more popular than I expected.

From my viewpoint, we need to stop saying “xxx is dying”, “xxx is losing”, “yyy is winning”. The Fortran community has so far done an impressive job of identifying the existing weaknesses of the language. It is now time for actions, exclusively actions, to resolve the shortcomings one by one.

12 Likes

I think that is part of it, but there is more. I think the main issue is that Fortran has lost its vision and the willingness to execute it.

In particular, Fortran has tons of room to grow its community. For scientific computing the language allows anything that Python can do, and more. But Fortran is currently not as usable as Python. It should be.

My vision for Fortran is to be the go-to language for numerical scientific computing. It needs to be usable interactively (as well as through regular compilation), compile fast, and deliver the best performance. It needs to run everywhere: in the browser, on every GPU, CPU and any other chip. It needs to have a rich ecosystem of libraries, like Python. The best package manager. New people must be coming all the time to learn Fortran, as they are coming to Python and Julia. New projects are being started all the time. Many C++ projects are migrating to Fortran, because it is a better tool. That is my vision.

If you want this vision to succeed, then there are many ways you can help. You can help us with the LFortran compiler development. You can help with the Fortran Package Manager (fpm). You can help writing tutorials, reporting bugs. You can write open source Fortran libraries, compatible with fpm. If you don’t know how to help, feel free to contact me, I am happy to do a video call to brainstorm how you can help.

12 Likes

@shahmoradi, nice write-up. Indeed there is no battle. Also, absolutely no battle of any kind is envisioned in the functioning of the Fortran standard development workflow, with the few “national bodies” that contribute to the work. It functions almost as if Fortran existed for its own sake, almost detached from the concepts of time and space! This can be seen as quite “spiritual” indeed. But there is a problem: there is no clear vision, almost no purpose, and, from direct experience, little to no objectivity whatsoever.

The end result is that at practically every point in time, especially since the ANSI X3.9 1966 revision, Fortran has fallen much further short of what it can be in terms of its semantics, syntax, and services (its features, etc.) when it comes to the full realization of computing needs in any science and/or engineering endeavor. This is just not right by those scientists and engineers; it’s a tremendous disservice to them. The worst part is that this is even more true today than before, and no one at the helm is showing much leadership or objectivity.

The way language enhancements work is like this: it depends entirely on who makes the requests and how; otherwise it’s “no enhancements for you”. Then, when the domain of scientific and technical computing notices many a codebase (glmnet) migrating away to other options, note that the incoherence, inconsistency, and inadequacy of Fortran language evolution have also played a crucial role in the defection. To not acknowledge this reality and to do little to nothing to enhance the development workflow, but rather to always deny, deflect, and delay, is the very antithesis of being objective.

To give you a simple example, but one I consider absolutely relevant to modern scientific and technical computing: enum types. Such computing inevitably has to deal with magic numbers, and few computing solutions now hesitate to offer rather extensive type-safe facilities to their practitioners to deal with such numbers via enums. You will know what is viable with C++, C#, D, Julia, MATLAB, Python, Rust, Swift, Visual Basic, and so forth. Fortran added something nominal but mostly type-unsafe in the 2003 revision. It’s trying again with Fortran 202X. Lo and behold, Fortran then gives not one but two facilities under the charter of “user-defined types”, both of which would appear foreign to those used to the other one, derived types. So on the anvil is stuff such as the following:

enum, bind(C) :: flags
    enumerator :: f1 = -1, f2, f3
end enum

and also

enumeration type :: flags
   enumerator :: f1, f2, f3 !<-- enumerator members CANNOT have user-defined values!!
end enumeration type

neither of which offers much of any facility beyond C circa Kernighan-Ritchie late 1970s and Pascal 1980!
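For comparison, here is a minimal sketch of what a type-safe enum with user-chosen values looks like in one of the languages listed above, C++. The enumerator names `f1`, `f2`, `f3` mirror the Fortran fragments; the type name `Flag` and the helpers `to_int` and `name_of` are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Scoped enumeration with an explicit underlying type and a user-chosen
// starting value. f2 and f3 continue counting from f1, so they are 0 and 1.
enum class Flag : int { f1 = -1, f2, f3 };

// Type safety: a Flag does not convert implicitly to or from an integer.
static_assert(!std::is_convertible<Flag, int>::value, "no implicit Flag -> int");
static_assert(!std::is_convertible<int, Flag>::value, "no implicit int -> Flag");

// Hypothetical helper: the underlying integer has to be requested explicitly.
constexpr int to_int(Flag f) {
    return static_cast<int>(f);
}

// Hypothetical helper: a switch over all enumerators.
const char* name_of(Flag f) {
    switch (f) {
        case Flag::f1: return "f1";
        case Flag::f2: return "f2";
        case Flag::f3: return "f3";
    }
    assert(false && "unknown Flag value");
    return "";
}
```

The two properties the post is asking for are both visible here: enumerators can carry user-defined values, and mixing a `Flag` with a plain integer is rejected at compile time unless the conversion is written out.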

Imagine a language as resource-constrained as Fortran brings in not one but two different syntactical forms of a facility with different semantics that are both deficient for modern needs. How insane is that!? Wait until the confusion for the practitioners!

This is being asleep at the wheel and dreaming of 1960 thru the 1980s to serve as platforms for mid to late 21st century needs. Nothing objective here.

2 Likes

I understand that Fortran requires many new features to cater to modern scientific applications. But a different view of the same situation can be: “It’s the Fortran paradigm”. There is a paradigm for every programming language. Of course C++ is multiparadigm. But within each paradigm it has its own set of best practices and so on. Julia’s paradigm is multiple dispatch. Java is object-oriented, and some other languages are functional… Similarly, why not bring out Fortran’s own paradigm, rather than keep running after the facilities of some other language and trying to catch the bus?
In the world of object-oriented programming, it was a bold decision for Julia to not even introduce classes, but they chose one paradigm, kept applying it to different use cases, and created libraries showing its efficiency. C is a small language, but they created libraries with it. At that time people would definitely have thought: “What, this language doesn’t even have a simple way to do input and output!” But they used the basic facilities of the language to build “stdio.h” and created easy I/O. They didn’t run after the facilities offered by other languages. I think by that time FORTRAN was doing much more. This is one kind: “Put the basic facilities in the language and create complex libraries”.
Another approach is Python. It happily gobbles up C/C++ libraries and gives them a friendly interface. It doesn’t claim “I am the supreme”; it’s happy to do what it is good at.
So, either we create robust libraries with the basic language facilities and show how to use them, or we happily gobble up other libraries that have the advanced features. When we don’t have a very active community like Julia’s to write everything from scratch, it’s better to invest our energy in making use of old libraries from GAMS and Netlib and providing interfaces to them. Even for stdlib, I don’t think we have to write it from scratch with grand OO facilities trying to imitate generics. When we take so much pride in the legacy of Fortran, it’s actually an insult not to use those libraries and provide friendly interfaces to them. Later on we can incrementally increase the facilities. When the workforce is larger we can invest in rewriting. Of course, for new data structures we have to write from scratch, but for the rest (linear algebra, differential equations, statistics, …) we need not write from scratch!
Recently certik posted the Zen of Fortran; let us follow that. We should have individuality, and those coming to Fortran should like to think in a Fortran way. Fortran saves a lot of space in the mind, and there is a lot in life besides programming!
There are much more visionary and intelligent people here, and I am just a beginner, but these are my views.

2 Likes

The keynote address of FortranCon 2021 was “Fortran at the Intersection: Synergies Arising from the Interplay Between Paradigms” by Damian Rouson.

Description

Although Fortran has evolved into a modern, multi-paradigm programming language, the research literature on Fortran more thoroughly addresses some paradigms, such as object-oriented and parallel programming, than others, such as functional programming and programming by contract. This talk will present new patterns for expressing concepts from these less-studied paradigms and will illuminate the subtle interplay between each of the aforementioned paradigms. The talk will demonstrate how error termination, a parallel execution concept, combines with object-orientated programming (OOP) to facilitate contract enforcement in pure procedures, a functional programming concept. The talk will highlight how user-defined operator semantics nudge the programmer toward writing purely functional expressions suitable for asynchronous parallelization, vectorization, or offloading to accelerators. The talk will also describe how OOP supports asynchrony through the event_type derived type. The talk will conclude with thoughts on intersectionality from a social science perspective, describing the experience of someone from an underdog community striving to teach an underdog language new tricks.

Fortran will not be a purely functional language, but it should be improved to reduce the use of mutable variables, as has been proposed.

1 Like

@Ashok yes, I agree with pretty much everything you wrote. @shahmoradi, @FortranFan and @fortran4r thanks for your posts too, I agree also. In fact we are all in agreement here.

The situation might seem hopeless, but it’s not. Each of us can change this for the better in our little ways. Look what we’ve achieved already in just 1 or 2 years. @fortran4r, @Ashok, I encourage you to join our monthly meetings and to contribute into many of the efforts we have, just pick what bothers you the most. If you don’t know what to do, please feel free to contact me anytime, I am happy to help.

4 Likes

Thanks! I am a graduate student in Structural Engineering. I don’t have a computer science background. I can help with stdlib. But I am working on Windows. For me, setting up stdlib with all the fypp is a nightmare. So I am now trying the fpm route: just bare modules without any generics. I believe many Fortran users are on Windows. So I can help in making Fortran more usable in the Windows ecosystem. Let me know of anything there: linear algebra, differential equations, statistics …
I appreciate your efforts on LFortran. I want to contribute there but I don’t have any knowledge of compilers and so on. There is a dire need for one open source compiler for Fortran 2018. Whenever I see somebody asking a question and people responding “I didn’t get this with gfortran”, “I got another result with ifort”, “I got a different result with nvfortran”, I feel sad and wonder: what is proper Fortran anyway? @certik, I request that for now you please focus mainly on getting the compiler ready for Fortran 2018 as fast as possible. Once that is ready we will have a reference for what “Fortran” is. Till then we have to keep asking “which Fortran?”. Then the whole community can build on that, stdlib or whatever. I wish you could have taken gfortran and developed it further, but your goals are different. But I really mean it: we need a free, open source Fortran 2018 compiler as fast as possible.

2 Likes

Fortran has a standard to answer this question and is not defined by the behavior of any one compiler, and compilers have standard-conformant modes, for example

gfortran -std=f2018
ifort -stand:f18

The existence of multiple compilers, some open source and some commercial, is a strength, not a weakness. If you use the random_number intrinsic you won’t get exactly the same results across compilers, but ideally the qualitative results of simulations should be robust to the choice of RNG.
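The point about robustness to the choice of RNG can be illustrated with a small sketch. It is written in C++ rather than Fortran, and it swaps in two C++ standard library engines (`std::mt19937` and `std::minstd_rand`) as stand-ins for the differing `random_number` implementations of different Fortran compilers; the toy simulation and the names `next_uniform` and `estimate_pi` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Map one raw engine output to a double in [0, 1). This is done by hand,
// rather than with std::uniform_real_distribution, so the result does not
// depend on how a particular standard library implements that distribution.
template <class Engine>
double next_uniform(Engine& engine) {
    const double span =
        static_cast<double>(Engine::max() - Engine::min()) + 1.0;
    return static_cast<double>(engine() - Engine::min()) / span;
}

// Toy simulation: estimate pi from the fraction of random points in the
// unit square that fall inside the quarter circle of radius 1.
template <class Engine>
double estimate_pi(unsigned seed, std::size_t samples) {
    assert(samples > 0);
    Engine engine(seed);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = next_uniform(engine);
        const double y = next_uniform(engine);
        if (x * x + y * y < 1.0) {
            ++inside;
        }
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

Calling `estimate_pi<std::mt19937>(12345u, 200000)` and `estimate_pi<std::minstd_rand>(12345u, 200000)` draws two entirely different random streams, yet both estimates should land close to 3.14. That is the sense in which a simulation's qualitative result should not hinge on which generator a compiler ships.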

Regarding LFortran, I regularly test it with any new code that I see or write and file an issue if it has not already been reported.

3 Likes

We are getting there with F2018 support in LFortran. GFortran is pretty good with the F2018 support in my experience. So is Intel Fortran. And NAG.

You can greatly help us on Windows. Indeed there are tons of users (if not the majority) on Windows.

If you are interested, why don’t you join our next Fortran call? We need to get fypp working on Windows so that you can contribute to stdlib. How do you install Python packages on Windows?

2 Likes

pip install - …

1 Like