Fortran applications using Fortran 2008+ features

Are there popular Fortran applications using Fortran 2008 and above features? Particularly features like do concurrent, co-arrays and new intrinsics? If co-arrays are used, it would be helpful to know whether an MPI version of the application is also available.

A quick GitHub search for “do concurrent” and Fortran language found only two applications (neither is popular). It also found 7k code hits, but it appears all of them are either in the flang or gcc compiler test suites.

Edit: one of the two hits is the CloverLeaf hydrodynamics benchmark, which is available in an MPI version.

The neural-fortran framework of @milancurcic uses co-arrays and some of the intrinsic collective procedures. I don’t think an MPI version is available.

2 Likes

The Articles section of the Fortran Wiki lists

and there are many more papers listed at the OpenCoarrays site. Today @themos mentioned qr_mumps in a different thread.

2 Likes

No coarrays in neural-fortran, only co_sum and co_broadcast. :slightly_smiling_face:

2 Likes

Intermediate Complexity Atmospheric Research (ICAR) model uses coarrays. I don’t know how popular it is, but it’s been published and is actively used.

@rouson will know of more examples.

2 Likes

Since there are only two compilers I know of (Intel and Cray) that have anything close to a full implementation of F2008 (and don’t gag with an ICE on just about every code you try to compile with them), I think finding many apps that use F2008 or F2018 features is an exercise in futility. It’s a “chicken and egg” thing. Compiler writers wonder why no one is using these features and fail to acknowledge it’s because they have done a poor job of 1) implementing them and 2) getting them into commonly used compilers (in bug-free forms) in a timely manner so people can actually try them and see whether they are better than existing methods.

Just my 2 cents

1 Like

Hi @kiranchandramohan ,

You may want to think of your inquiry the other way around!!

Should various compiler implementations, especially with ARM’s lead with LLVM FLANG, etc. provide full-featured and robust and optimized support of Fortran 2018 now (or in the very near future) particularly toward COARRAYs with TEAMs and DO CONCURRENT (notwithstanding the concerns some (Nvidia members) have with it), imagine how many powerful (and perhaps popular) applications will be aiding the advancement of computational science?

Consider the cliched phrase in this clip https://youtu.be/o3c_pJ_CLJQ.

Fortran is at that crucial fork in the road now, it’s back to the future in a field of dreams just like 1954 and Backus and co.: if language standard-bearers and compiler implementations build it right, they will come!!

1 Like

A quick correction there. Arm is part of the F18 project (llvm/flang) and we have helped upstream it and work on the driver, OpenMP portions etc. The core-compiler is developed by Nvidia Engineers (@pmk and team) with some contributions from others.

F18 has parsing support and a subset of checks for co-arrays. I believe there are plans for a runtime, but I am not sure of the timeline. Since F18 has some support for OpenMP parallel and worksharing constructs, one option is to convert do-concurrent to these OpenMP constructs. Other options in the future could be offloading to GPU, vectorisation, or loop transformations.
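
As a rough illustration of the do-concurrent-to-OpenMP idea, here is a minimal sketch of the same loop written both ways. It is a hand-written equivalent under the assumption that the iterations are independent, not a description of how F18 actually lowers anything; the program name and array names are made up for the example.

```fortran
program do_concurrent_vs_openmp
  ! Hypothetical example: names and sizes are illustrative only.
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  integer :: x(n), y1(n), y2(n)

  x = [(i, i = 1, n)]

  ! Fortran 2008 form: the programmer asserts the iterations are independent.
  do concurrent (i = 1:n)
    y1(i) = 2 * x(i) + 1
  end do

  ! Hand-written OpenMP worksharing form of the same loop.
  ! Without an OpenMP compiler flag the directives are ordinary comments.
  !$omp parallel do
  do i = 1, n
    y2(i) = 2 * x(i) + 1
  end do
  !$omp end parallel do

  if (any(y1 /= y2)) error stop "the two loops disagree"
  print *, "both loops produced the same result"
end program do_concurrent_vs_openmp
```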

This question was primarily in the context of Classic Flang. Classic Flang has sequential support for do-concurrent, but no support for co-arrays and no support for many Fortran 2008 intrinsics. I am trying to gauge what we are missing out on while waiting for F18 to reach production.

I understand and agree with the chicken and egg problem. The hope was that, with Intel and Cray supporting F2008 and gfortran having partial support, the three-compiler problem would be solved for application developers. But we are still not seeing many applications using Fortran 2008.

It will be easier to convince management if there are crucial applications which will not run on our hardware without F2008 support. At the moment fixing customer issues, helping maintain Classic Flang and contributing to F18 takes up most of our time.

I am aware of your concerns (DO CONCURRENT isn’t necessarily concurrent — The Flang Compiler) and will only consider it with one of the options suggested in the What to do now section.

1 Like

I do not have a popularity measure. But there is the ParaMonte Monte Carlo sampling library that heavily relies on Fortran 2008 and Fortran 2018. All of the major algorithms are implemented for both Coarray and MPI. A scaling comparison of the Coarray and MPI implementations is given in Figure 8 of this manuscript. However, keep in mind that this comparison is based on the two-sided communications that are implemented in the library. A fairer comparison between the two would have to use RMA communications instead of two-sided ones, because that is where Coarray (supposedly) truly shines.

I do not think the search results are accurate. This library heavily uses do concurrent where appropriate, but it does not show up in the search.

1 Like

On a side note, the do concurrent construct can now be offloaded to GPUs with the NVIDIA/PGI Fortran compiler:

2 Likes

It is likely hidden among the 7200 code results. I only checked a few random pages of results, but could only see forks of the flang and gcc projects.

Upon combing the first 48 (of 100) pages I found a few more libraries which use do concurrent:

3 Likes

First, most people use “coarrays” as a shorthand for “parallel features in Fortran.” I’m interpreting the original post in this light and want to point out that Fortran now has a large set of parallel features that do not require coarrays. I find that these non-coarray features cover all the parallel algorithmic needs of a surprisingly large percentage of applications, possibly even the majority of the applications I encounter in projects. These features include

  • Image enumeration (this_image(), num_images())
  • Synchronization: sync all, sync images
  • Error termination: error stop (including error stop in pure procedures with variable stop codes)
  • Collective subroutines: co_sum, co_max, co_min, co_broadcast, co_reduce
  • Teams: form team, change team, end team, sync team, team_type
  • Failed images: failed_images(), fail image, STAT_FAILED_IMAGE
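
To make the point concrete, here is a minimal sketch that uses only image enumeration, a collective subroutine, error stop and sync all, with no coarray declared anywhere. The program name, the problem (summing 1..n) and the cyclic work distribution are assumptions chosen for illustration.

```fortran
program collective_sum_sketch
  ! Hypothetical example: sums 1..n across images without declaring a coarray.
  implicit none
  integer, parameter :: n = 1000
  integer :: me, np, i, total

  me = this_image()
  np = num_images()

  ! Each image accumulates its own share of the terms (cyclic distribution).
  total = 0
  do i = me, n, np
    total = total + i
  end do

  ! Collective subroutine: afterwards every image holds the global sum.
  call co_sum(total)

  if (total /= n * (n + 1) / 2) error stop "unexpected global sum"

  sync all
  if (me == 1) print *, "sum of 1 ..", n, "is", total, "using", np, "image(s)"
end program collective_sum_sketch
```

With gfortran this should build with -fcoarray=single for a one-image run; a multi-image run needs a coarray runtime such as OpenCoarrays.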

In fact, the features that require coarrays are now a small minority of the parallel features:

  • Coarrays
  • Events: event_type, event_query(), event post, event wait
  • Atomic subroutines

7 Likes

Second, there are now at least four very usable, even if not always complete, compilers under active development that support the parallel features of Fortran 2018:

  • Intel oneAPI: fully Fortran 2018 compliant
  • Cray: I don’t know the status, but I suspect Cray supports all above features except failed images
  • GNU/OpenCoarrays: all of the above features have at least partial support
  • NAG: they support all above features except collective subroutines and failed images

Additionally, there are

  • flang: parses the syntax of the above features, but I don’t think it’s fully linked to the LLVM back end yet, so flang unparses the intermediate representation and passes the code to gfortran
  • g95: no longer under active development but supports coarrays

So I really don’t think the situation is anywhere near as dire as most people assume. The Fortran world has become so used to compilers falling behind, and we now have such an embarrassment of riches in the number of compilers under active development, that it’s hard to keep up with what they all support.

7 Likes

Third, the literature contains great examples of coarrays running in important applications at scale. Here are three examples dating as far back as a decade ago:

Speedup of 33% relative to MPI-2 on 80K cores for European weather model:

Mozdzynski, G., Hamrud, M., & Wedi, N. (2015). A Partitioned Global Address Space implementation of the European Centre for Medium Range Weather Forecasts Integrated Forecasting System. International Journal of High Performance Computing Applications, 1094342015576773.

Performance competitive with MPI-3 for several applications:

Garain, S., Balsara, D. S., & Reid, J. (2015). Comparing Coarray Fortran (CAF) with MPI for several structured mesh PDE applications. Journal of Computational Physics.

Speedup of 50% relative to MPI-2 for plasma fusion code on 130,000 cores:

Preissl, R., Wichmann, N., Long, B., Shalf, J., Ethier, S., & Koniges, A. (2011, November). Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (p. 78). ACM.

And nearly every project I’ve worked on for the past ~5 years has involved parallel Fortran 2018, though not always coarrays specifically and not all of the projects are open-source and some are dormant, but a few of the open-source codes are

Sadly, none of these are really the best examples because the first is a proxy application, the second is very early in its development, the third hasn’t yet merged the parallel branch into the default branch, and the fourth is dormant for funding reasons. Nonetheless, the first one played a central role in a Ph.D. dissertation submitted just last month, and I was just awarded funding to continue work on it, so at least the development will continue.

In summary, I don’t think the situation is quite as stark as most imagine, but it’s nowhere near what it could be. Compiler support is considerably broader than most people realize. There have been several published successes with parallel Fortran 2018 running in important applications at scale.

Finally, the best news I’ve seen on do concurrent is that NVIDIA supports offloading do concurrent to GPUs. Researcher Jeff Hammond, who recently moved from Intel to NVIDIA, has privately shared great results from testing this capability with his Parallel Research Kernels. He reported bandwidth comparable to CUDA.

It’s time for users to be more vocal and let the lagging vendors know they want better support for do concurrent. I strongly disagree with the statement that do concurrent has “profound design flaws.” Every problematic case I’ve seen involves pointers or indirect addressing. I rarely use either of these features. While they are common in some applications, especially unstructured-grid applications, there are many useful production applications that never need those features, especially structured-grid applications. If do concurrent is profoundly flawed, then it’s hard to understand why NVIDIA would work on offloading and it’s hard to explain the results Jeff Hammond reported to me privately. I hope we see some publications of such results soon.

Fortran’s adherence to upward compatibility is likely one of the main reasons the language remains in use after more than 60 years. Unless the problematic cases can be addressed without breaking upward compatibility, I think it would be better to propose a replacement feature, e.g., do parallel, with syntax as close to do concurrent as possible so that existing codes can migrate to the new feature easily.

7 Likes

I didn’t know that… :-o If DO CONCURRENT works directly with GPU, I definitely would like to try (if it is easier to use than CUDA itself… :slight_smile: )

“Unless the problematic cases can be addressed without breaking upward compatibility, I think it would be better to propose a replacement feature, e.g., do parallel, with syntax as close to do concurrent as possible so that existing codes can migrate to the new feature easily.”

From a user’s side, it is no problem which keyword or syntax is used for the parallel execution (e.g. FORALL, DO CONCURRENT, DO PARALLEL). But my concern is why those keywords have some problems and why even FORALL is deprecated(?) (though I find it very convenient)… I guess it might be some limitation of a “specification first” approach (i.e. no implementation before the final/formal specification).

2 Likes

@septc I’m not a performance expert and don’t know any of these features as well as compiler writers or those who have given it a lot more thought, but I’ll summarize what I’ve picked up from conversations with committee members and compiler writers.

The problems relate to the constraints the standard places on forall and do concurrent.

  1. It might be fair to say that forall is overconstrained in ways that prohibit performance optimization in most or all cases. The forall problem can’t really be fixed without fundamentally changing its nature: forall defines an array assignment with constraints that essentially obligate the compiler to evaluate all the right-hand-side elements before it can assign to any left-hand-side elements. That leaves a lot of potential parallelism on the table. Contrary to what most programmers would expect, it’s rare that forall helps performance and it sometimes even hurts performance.

  2. It might also be fair to say do concurrent is underconstrained in ways that prohibit performance optimization in some cases. In my view, the important caveat is that the problematic cases can be avoided in code that doesn’t use pointers or indirect addressing and that covers a lot of use cases. There have been proposals to fix do concurrent, but those proposals would break some existing codes, which the Fortran committee tries assiduously to avoid doing.
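
The difference in semantics can be seen in a small sketch. The arrays and values are made up for illustration. The forall result follows from the rule that every right-hand side is evaluated before any assignment, and the do concurrent loop is a case with no pointers or indirect addressing, where each iteration touches only its own elements.

```fortran
program forall_vs_do_concurrent
  ! Hypothetical example: values chosen so the semantics are easy to check by hand.
  implicit none
  integer :: i
  integer :: a(5), s(5), b(5), c(5)

  ! forall: all right-hand sides use the ORIGINAL a before anything is stored.
  a = [1, 2, 3, 4, 5]
  forall (i = 2:5) a(i) = 2 * a(i-1)
  if (any(a /= [1, 2, 4, 6, 8])) error stop "forall semantics not as expected"

  ! An ordinary sequential loop sees the updated values instead.
  s = [1, 2, 3, 4, 5]
  do i = 2, 5
    s(i) = 2 * s(i-1)
  end do
  if (any(s /= [1, 2, 4, 8, 16])) error stop "sequential loop not as expected"

  ! do concurrent: valid only because no iteration reads what another writes.
  c = [1, 2, 3, 4, 5]
  do concurrent (i = 1:5)
    b(i) = 2 * c(i)
  end do
  if (any(b /= [2, 4, 6, 8, 10])) error stop "do concurrent result not as expected"

  print *, "forall, sequential and do concurrent checks passed"
end program forall_vs_do_concurrent
```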

I hope we define something new, and I would like to see the committee signal to the community in advance that if any problems are found with optimizing this new thing, say do parallel, then the problems will be considered bugs in the standard and will be subject to change even if it breaks existing code. In fact, it would be nice to have a new designation in the standard that symbolizes this. Just as we currently have “obsolescent” and “deleted” features, we could have a new category of “experimental” features or something to let people know that the feature might change in ways that break upward compatibility.

4 Likes

@pmk thanks for the corrections. I’d love for you to join the committee. I’m aware of at least one other committee member who would support the idea of proposing a replacement, but neither of us has the expertise and insights that you have. The process starts with submitting a paper to a meeting. The requirements for papers are extremely minimal. Some papers are very short: a page or less. Although the features for F202X are set, I’m sure the committee would be open to a proposal for F202Y. In case it helps, our convener is committed to shortening the revision cycles so X and Y aren’t as large as they might have been in the past, but there has to be a champion on the committee to submit the paper and shepherd it through the process of subgroup discussion and revisions. You would be the best person for that.

I’ll review the link you sent. I must have missed something from our past discussions because I asked you whether there were cases that didn’t involve pointers or indirect addressing. My recollection is that the only other case you mentioned was associate, but you said that working around the latter case would likely be easy.

It would be interesting to hear some details about what cases NVIDIA can and cannot offload to the GPU. I imagine many in the community would benefit from the information.

2 Likes