I recently helped set up a webinar on “Fortran for High Performance Computing” (Wadud Miah, May 4th, 2020. If needed, see the event page and recording here: EXCELLERAT Services).
The audience (around 100 people) asked many questions, which IMHO should at least be listed in the Fortran Discourse, to be discussed or answered. If enough answers are gathered, this discussion could become a good reference.
So feel free to comment…
A - General questions:
A1 - There is a cost to modernizing code and often it is not considered worth the investment. How can this be addressed?
A2 - Are you aware of any NEW HPC software whose development started with Fortran?
A3 - What about newer C++ features (constexpr, auto, modules) and the ISO_C_BINDING? Of course the Fortran binding is for C, but it is often interchangeably used and I wonder if there are plans to support CPP. Essentially at the moment using ISO_C_BINDING seems to restrict both the Fortran features and C(++) features which are usable. How do you see this being resolved?
A4 - More a comment than a question: I would personally track the standard compliance of the compiler used instead of the language standard.
A5 - Are you aware of any efforts to build a C++ container interface to Fortran arrays using what’s available in the “ISO_Fortran_binding.h” header? This would be great to enhance interoperability with C++.
B - On parallelization
B1 - “Which parallelization methods are generally preferred? OpenMP or Fortran coarrays?”
B2 - What are the advantages of coarrays over OpenMP?
B3 - I’m a little confused about the conceptual space shared by coarrays and MPI. Moving forward, what will the standard focus on? Can we expect “CUDA aware coarrays”? Or should we focus on MPI to the exclusion of coarrays?
B4 - Hardware topologies are getting more and more layered and deep (e.g. AMD EPYC processors), with increased non-uniformity in memory access. How are “placements” managed with coarrays? It’s the counterpart of MPI process pinning and OpenMP thread affinity.
B5 - Can coarrays be used across multiple devices? Cores? Nodes? GPUs? What are the restrictions?
B6 - Are you aware of any performance comparisons in real codes between Coarray Fortran and other approaches?
C - On GPU Programming
C1 - Is CUDA Fortran supported by the PGI compiler?
C2 - If CUDA OpenMP does not share memory, is there any advantage with respect to MPI CUDA?
C3 - Can you please give some more hints on how host/device memory copies can be explicitly handled with OpenMP for GPUs so that memory residency can be exploited? Thanks.
C4 - What about GPU computing and solutions for data race issues?
C5 - Are you aware of third-party libraries such as ArrayFire for high-level GPU programming? Could you share your feelings on choosing it instead of the standard options you presented (CUDA, OpenACC, or OpenMP)?
Thanks for the good questions, and welcome to the forum. Here are partial answers to a few of them. Others can comment with more specificity.
A1 - Fortran is backwards compatible, so code can be modernized piecemeal. A March 2020 GitHub thread discussed some available tools for modernization, also listed at the Fortran Wiki.
I would like to have a better answer for A2. The link @Beliavsky sent is very good, but it mostly shows projects that already exist. What are the new HPC projects started in 2021?
I found three that started / went open source in 2020:
CREST: created on GitHub in Apr. 2020, first commit in Apr. 2020
Conformer-rotamer ensemble search tool
SNaC: created on GitHub in May 2020, first commit in May 2020
A multi-block solver for massively parallel direct numerical simulations (DNS) of fluid flows
Tracmass: created on GitHub in Sep. 2020, first commit in Dec. 2019
Lagrangian particle tracking code
I found those by querying the GitHub API for each project on the package index under scientific for the creation date, then looked up the actual first commit.
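A minimal Python sketch of the first half of that lookup, assuming the public GitHub REST endpoint `GET /repos/{owner}/{repo}` and its `created_at` field. The function names are hypothetical, the request is unauthenticated, and finding the actual first commit needs a separate query that is not shown here.

```python
import json
import urllib.request
from datetime import datetime


def parse_created_at(repo_payload):
    """Return the repository creation date from a GitHub repo JSON payload."""
    # GitHub reports timestamps as ISO 8601 in UTC, e.g. "2020-04-15T10:00:00Z".
    stamp = datetime.strptime(repo_payload["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    return stamp.date()


def fetch_repo_payload(owner, repo):
    """Download repository metadata (unauthenticated, so rate-limited)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For a given package-index entry one would call `parse_created_at(fetch_repo_payload(owner, repo))` with that project's owner and repository name.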
One issue with A2 is what “new” and “HPC” mean. HPC projects differ from other software in a few respects:
They often solve a scientific or engineering problem, so the pace of software development is to an extent tied to the scientific understanding of the problem. Development time scales are often 5-10 years or longer.
They are often large frameworks or applications, not small libraries
They are often initially closed source, and open-sourced years later or when stable
What’s HPC software? Is it any numerical parallel software, or does it qualify only if it runs on HPC systems?
Some of it is not on GitHub, so it is more difficult to discover.
Development is often tied to governmental funding cycles, which are typically 3-5 years.
Because of all the above, new development in similar or adjacent areas is more likely to be a new feature or capability in an existing mature project, rather than a new project altogether.
Looking at 2021 is highly restrictive, for any software, not just Fortran HPC software.
For HPC software I would consider “new” anything started in the last 5 years. For Fortran software, 10 years may be reasonable.
Looking at new projects is important, but also looking at older projects that are under active development (i.e. new features rather than just bug fixes) may be just as relevant.
Very nice! Would somebody be able to write a script to extract this information from GitHub automatically? I would like to see the number of new Fortran projects per month over time.
But it is one metric that we can automate, and GitHub has a large user base. I think we should be able to extract some signal out of it, as imperfect as it is.
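As a rough idea of what such a script could compute, here is a small Python sketch that bins repository creation timestamps by month. It assumes the timestamps have already been collected as ISO 8601 strings (the format GitHub uses for `created_at`); the function name and the sample data are made up for illustration.

```python
from collections import Counter


def projects_per_month(created_at_stamps):
    """Count repositories by creation month ("YYYY-MM")."""
    # The first seven characters of an ISO 8601 timestamp are "YYYY-MM".
    counts = Counter(stamp[:7] for stamp in created_at_stamps)
    return dict(sorted(counts.items()))


# Made-up timestamps, for illustration only.
example = ["2020-04-03T09:12:00Z", "2020-04-21T17:40:00Z", "2020-05-11T08:00:00Z"]
print(projects_per_month(example))  # {'2020-04': 2, '2020-05': 1}
```

Months with no new repositories simply do not appear in the result, so a plot would need to fill those gaps with zeros.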
For large projects that fly “under the GitHub search radar” but are nevertheless important to count, I wonder if we could crowdsource the data about large Fortran projects throughout history (start year, stable/production year, development stopped year), and do a 5- or 10-year rolling average. Then we’d see how that curve changed through the decades. It seems like it would be a lot of work to collect and organize the data, but perhaps feasible.
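If such crowdsourced data existed, the rolling average itself would be the easy part. A minimal Python sketch, assuming a plain list of project start years (the function name and input layout are hypothetical) and a trailing window:

```python
from collections import Counter


def rolling_average_starts(start_years, window=5):
    """Trailing average of project starts per year over `window` years."""
    starts = Counter(start_years)  # missing years count as zero
    first, last = min(start_years), max(start_years)
    averages = {}
    # The first year with a full window behind it is first + window - 1.
    for year in range(first + window - 1, last + 1):
        total = sum(starts[y] for y in range(year - window + 1, year + 1))
        averages[year] = total / window
    return averages
```

The same function could be applied to the "stable/production" and "development stopped" years to get the other two curves.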
Yes, I want to see this, if it’s technically doable. In some sense we already started in Packages — Fortran Programming Language, but those are only open source. I would like to include all other codes too, but as previously discussed, it’s tough to ensure that they are actually (still?) written in Fortran, as well as the scope and other things. So that’s why we started with open source, as that’s easy to verify.
So we should do the automatic GitHub statistics somehow. Then we can go from there.
We could have a short look at this GitHub survey on our side. As I understand it:
we stick to GitHub first
projects that involve “enough Fortran” (if this can be observed)
projects that are active enough
projects that are large enough
It would also be interesting, IMHO, to limit to one “Fortran hot field” such as CFD, and compare to a C++ counterpart.
Tell me if you foresee some more biases or interesting insights.
Plenty of caveats, but I will let you know if we succeed.
…and I hope there will be more comments on the HPC questions (sections B and C)
Regarding the “enough Fortran” question, GitHub lists the fraction of code in a project by language in the Languages section. There are some projects where active development is not done in Fortran but a large fraction of the code in the GitHub repo is Fortran, because the repo includes the source code of LAPACK or some other classic Fortran 77 library. To exclude such code, which usually has a .f extension, one can download the repo and look at the fraction of code with .f90 and also .f95, .f03, .f08, and .f18 extensions. The .f90 suffix should be used for all free source form code, but this practice is not universal.
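The extension check could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: file size in bytes is used as a crude proxy for "amount of code", the `.for` and `.f77` suffixes are added to `.f` as fixed-form candidates, extensions are compared case-insensitively (so preprocessed `.F90` and `.F` files are counted too), and the function names are hypothetical.

```python
import os

FREE_FORM = {".f90", ".f95", ".f03", ".f08", ".f18"}
FIXED_FORM = {".f", ".for", ".f77"}  # assumption: typical fixed-form suffixes


def fortran_sizes(root):
    """Map each Fortran source file under `root` to its size in bytes."""
    sizes = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in FREE_FORM or ext in FIXED_FORM:
                path = os.path.join(folder, name)
                sizes[path] = os.path.getsize(path)
    return sizes


def free_form_fraction(sizes):
    """Fraction of Fortran bytes that live in free-form files."""
    total = sum(sizes.values())
    if total == 0:
        return 0.0  # no Fortran sources found
    free = sum(size for path, size in sizes.items()
               if os.path.splitext(path)[1].lower() in FREE_FORM)
    return free / total
```

After cloning a repository, `free_form_fraction(fortran_sizes("path/to/clone"))` gives a number between 0 and 1; a low value hints that most of the Fortran in the repo is bundled fixed-form library code.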
A coarray itself is merely a representation of a distributed array; even with a scalar variable, if we declare it as a coarray we add a (distributed) dimension to it.
Coarray Fortran, on the other hand, is a full-featured parallel programming language tailored for exascale: the coarray run-time sits underneath an object-oriented array-programming language. Thus, Coarray Fortran natively supports distributed objects at an advanced level. If one wishes to compare the language, I would suggest using the X10 or Chapel programming languages for comparison purposes.
One could take two different views of the Coarray Fortran programming language:
Low-level parallel programming using distributed arrays (coarrays): to implement some required features that are missing in the base language. Some such features are described or natively implemented in the X10 or Chapel programming languages (examples are Fault Tolerant Execution in X10 or Data-Centric Synchronizations in Chapel). The interesting question here is: who can do a better job, the implementer (X10, Chapel) or the programmer (Coarray Fortran)?
Higher-level parallel programming using distributed objects: Coarray Fortran natively supports distributed objects through its OOP syntax. We can easily substitute coarrays (i.e. distributed arrays) with distributed objects. Distributed objects allow for a higher level of abstraction and are certainly an important foundation for exascale programming.
The shift from distributed arrays (coarrays) to distributed objects for parallel programming is somewhat similar to the shift from procedural programming to object-oriented programming in serial programming.
I am currently preparing a paper and example Coarray Fortran codes to describe higher-level parallel programming using distributed objects in more detail.
I am aware of two projects which are somewhat able to close the gap between C++ and Fortran: shroud and swig-fortran (also see this presentation or the arXiv paper). The book Scientific Software Design by Damian @rouson contains a great description of the challenges involved in interfacing C++ and Fortran. Another approach is transpilation like in the upcoming LFortran compiler, but I guess this will only be an option for C++ codes calling Fortran subprograms.
Edit: I just remembered we had a thread open for Automatic Fortran to C++ conversion with a program called fable. The other way round (C++ to Fortran) is more difficult due to the complexity of the C++ type system.
This was my question. Some attempts at this can be found in the stdlib issue #325 originally opened in response to my Discourse thread. I’d be extremely happy if any skilled C++ developers could comment or offer their help in the stdlib issue.
ArrayFire has a unified C/C++ API. It seems feasible to build a Fortran interface on top of this. In most applications you will only need a small subset of all the routines so it could be a viable option. Still it would be nice to have an fpm-compatible ArrayFire package.
Hi Michael, I thought that Coarray Fortran referred to the extension that evolved from F--, so I often correct people to call it just Fortran, with coarrays, teams, events, and collectives being the features of the language. Similar to how parts of High Performance Fortran became part of Fortran 95. Maybe I was wrong in doing that but that was my understanding. @billlong , @rouson, and yourself will know more about it.
Hi Milan, yes it’s a little bit confusing because I am using (my own) different naming rules for the language, and I divide the (single) Fortran language into two distinct programming languages for practical reasons: Coarray Fortran for programming against the coarray run-time, Fortran classic for programming against the classical serial run-time, and Fortran for a combined use of both. (The latter because a Coarray Fortran parallel program also requires serial programming for the purely local computations, which follows the rules of the classical serial run-time).
Treating Coarray Fortran and Fortran classic as distinct programming languages felt necessary to me after I started using OOP techniques and syntax to define and implement parallel models in Coarray Fortran: Fortran classic and Coarray Fortran now share the same OOP syntax, but in Coarray Fortran it’s Parallel Oriented Programming (POP) for defining and implementing parallel models. Strictly speaking, this has nothing to do with OOP; the codes only look the same, sharing the same syntax. For me it felt necessary to make a sharp distinction between Coarray Fortran and Fortran classic because the usual use-case is a combined use of both languages (in a single programming project), where the same syntax will have completely different meanings. Treating the same syntax with completely different meanings as a single Fortran language (inside a single programming project) could lead to major confusion, IMO. From an outside view it’s totally OK to call it a single Fortran language. As we all know, the definition and description of the Fortran language is also as a single one.
@Federchen , please don’t take this wrongly - it’s more than a “little bit confusing”, there is the risk of it being considerably misleading if other readers start taking your terms such as “Coarray Fortran” and “Fortran classic” seriously.
Serial and parallel aspects are parts of the same single standard for Fortran. That is a significant positive for Fortran, supporting legitimate claims that it is a multiparadigm and multipurpose language with native support for parallel and concurrent programming paradigms, even though its long and rich legacy, and its impression among the global computing community, is of procedural and serial applications.
It’s far better for Fortran that the public discourse be about the Fortran its standard represents.
I have never used ArrayFire Fortran, but I experimented a bit on some toy projects with the C++ version. If I recall, it was a positive experience, and indeed having it as an fpm-compatible package could be a game-changing feature.