Fortran applications using Fortran 2008+ features

@pmk I had a similar experience, and it turned out the reason in my case was that I submitted the paper out-of-cycle, in the sense that the committee wasn’t entertaining new feature proposals at the time. I wonder if the same was true for your paper. My paper eventually succeeded because I was there to ping the committee again at the appropriate time in the cycle. We’re a relatively small committee with limited human resources. I would be glad to champion your paper, but I’ll likely need a reminder when the floor is open for Fortran 202Y feature proposals. I would propose it as something new, but with as much similarity to do concurrent as possible, in order to minimize the cost of refactoring existing codes.

@pmk The first link you forwarded is broken, but I think you’re referencing the LLVM issue here. Every code example in or after the section “The identification problem” involves either a pointer or indirect addressing. The same is true for the paper at the second link you posted: the only example uses indirect addressing.

Just in case I’m using incorrect or imprecise terminology: by “indirect addressing” I mean lines like

    T(IX(J)) = A(J) + B(J)
    C(J) = T(IY(J))

I very rarely need to write code like this, although I acknowledge that there are many good reasons such code is required and the compiler has to handle all cases. It would be great to see an example that doesn’t involve pointers or indirect addressing. Otherwise, I’ll be in a weak position in championing a solution to a problem with which I have little or no experience.

This is very helpful. For a replacement feature (e.g., do parallel), I’m hopeful that requiring any called procedure to be simple (a new Fortran 202X procedure attribute) will eliminate the above case.
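A minimal sketch of what that might look like (hypothetical names; since the simple attribute is new in Fortran 202X and not yet widely supported, the example uses pure, which is the closest attribute available today):

```fortran
module kernels
  implicit none
contains
  ! pure already forbids side effects; the proposed F202X "simple"
  ! attribute would additionally forbid even reading module variables.
  pure function scale(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x
  end function scale
end module kernels

program demo
  use kernels, only: scale
  implicit none
  real :: a(100)
  integer :: i
  ! If every call inside the loop were simple, a compiler would need
  ! only local analysis to prove the iterations independent.
  do concurrent (i = 1:size(a))
    a(i) = scale(real(i))
  end do
  print *, a(1), a(100)
end program demo
```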

Peter, in this example (and any other such case), can’t the compiler warn the user that the loop cannot be parallelized because foo accesses a global variable? If foo were not accessing exposed, the loop could be parallelized, so as a user I would simply expect the compiler either to parallelize do concurrent or to report an error (or a warning) explaining why it cannot be done.

The standard does not require parallel execution, but users expect it. That, I think, is the root of the problem here. As a user, I would like the compiler to help me.

Regarding how to implement this in a compiler, here is one way: when compiling foo (in a separate translation unit), the compiler notes whether or not the function is simple, i.e., whether it accesses any global variables. In this case it would note that foo is not simple. Then, if you call a non-simple function from do concurrent, you get a warning that the loop will not be parallelized. If the function is simple, I think the compiler might need only a local analysis of the do concurrent loop, without any cross-file interprocedural analysis, to determine whether the loop can be parallelized.
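A minimal sketch of the kind of case being described (the names foo and exposed follow the discussion above; the module layout is my assumption):

```fortran
module state
  implicit none
  real :: exposed = 1.0
contains
  ! Legal inside do concurrent (it is pure, so it cannot modify
  ! anything), but NOT simple: it reads the module variable exposed.
  pure function foo(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x * exposed
  end function foo
end module state

program loop
  use state, only: foo
  implicit none
  real :: a(100)
  integer :: i
  ! A compiler that recorded "foo is not simple" while compiling the
  ! module could warn here that the loop may not be parallelized.
  do concurrent (i = 1:size(a))
    a(i) = foo(real(i))
  end do
  print *, a(1)
end program loop
```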


@certik even if your last sentence is true in the case presented, I imagine it might not be true if exposed is accessed via use association and the declaration is in another file. But I do like the general spirit of what you’re saying. I would love it if a compiler would parallelize the cases where it can do so safely, and would either decline to parallelize unsafe cases or emit a warning for them (or both).
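For instance (a hypothetical two-file layout; names reuse foo and exposed from the discussion), the global can live in one file and be reached by use association in another, so local analysis of the loop’s file never sees the declaration:

```fortran
! globals.f90 -- compiled separately
module globals
  implicit none
  real :: exposed = 1.0
end module globals

! kernel.f90 -- only the use statement hints at the hidden dependency
module kernel
  implicit none
contains
  pure function foo(x) result(y)
    use globals, only: exposed   ! global reached across files
    real, intent(in) :: x
    real :: y
    y = x * exposed
  end function foo
end module kernel
```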

This paper is just out:

Parallel Hybrid Simulations of Block Copolymer Nanocomposites using Coarray Fortran by Diaz et al.

Unfortunately it’s paywalled, but I expect you can get a copy by emailing the author.


A 2019 working paper by the same authors that mentions Coarray Fortran is at arXiv: Large scale three dimensional simulations of hybrid block copolymer/nanoparticle systems.


The conclusion says:

Coarray Fortran has been used to develop a parallel code of the well-established CDS scheme for BCP melts and BCP nano-composites systems. This relatively simple approach based on spatial decomposition shows no drawbacks when compared with a more elaborate method using MPI. The scaling of the pure CDS code is highly linear for relatively large system sizes and improves the previous implementation using MPI.

The best scaling behavior has been found using the CRAY Fortran compiler in the CSCS supercomputer.


Code based on Fortran 2018 (The elphbolt ab initio solver for the coupled electron-phonon Boltzmann transport equations)

link: https://arxiv.org/pdf/2109.08547.pdf

The coupled BTEs solver described above is implemented in Fortran 2018. This allows us to make use of the object-oriented programming (OOP) support and the built-in coarray functionality that provides concise, native syntax for parallelization. Specifically, we create the following 7 derived types: crystal, symmetry, numerics, electron, phonon, epw wannier, and bte, dealing with the components of the problem that the names suggest. Each derived type contains its own data and procedures (functions and subroutines). Apart from these, there are separate modules for immutable parameters, helper procedures, etc. This hybrid OOP/procedural design enables extensibility of the code. Boilerplate getter and setter functions are generally avoided. Instead, the intent and use, only keywords of Fortran are strictly used to control the read, write, and use access of the different components of the code. This design strategy makes the code compact (about 6700 lines) for what it offers and easily readable. As a general rule, code repetition is avoided unless the generalizations lead to slow or physically unclear source. We tried to strike a balance between the speed of development, execution, readability, and extensibility.
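A tiny illustration of that access-control style, i.e., intent attributes plus use ... only instead of getters and setters (the names here are hypothetical, not taken from elphbolt):

```fortran
module crystal_module
  implicit none
  private                        ! nothing is visible unless exported
  public :: crystal, initialize

  type crystal
     real :: lattice_const
  end type crystal
contains
  subroutine initialize(self, a0)
    type(crystal), intent(out) :: self   ! intent(out): callee may write
    real, intent(in) :: a0               ! intent(in): read-only here
    self%lattice_const = a0
  end subroutine initialize
end module crystal_module

program main
  use crystal_module, only: crystal, initialize  ! only: import the minimum
  implicit none
  type(crystal) :: c
  call initialize(c, 5.43)
  print *, c%lattice_const
end program main
```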
