Thanks @rouson for giving this talk! A great showcase of great tools of the language. I would recommend that anyone, familiar with Fortran or not, watch it.
Thanks for posting this, @jorgeg!
An interesting talk, thank you @rouson
My experience (mostly simple tests) with `do concurrent` using gfortran or Intel compilers has been rather discouraging. Do I understand it right that the LLVM flang 21 compiler has achieved some breakthrough in that regard?
BTW, are the slides available for download somewhere?
Thanks @rouson, I found the talk very interesting, especially for how newer versions of the Fortran standard are being used.
The comments I was most interested in included:
A significant group of Fortran users are mainly using F90-2003 (like me); I wonder how this statistic was obtained.
The development of `do concurrent` is very well suited to GPU off-loading. I do find its use of pure functions difficult to apply to my multi-threading applications and prefer to use OpenMP directives, with their better documentation and control of sharing assumptions.
Using OpenMP version 5+ for off-loading appears to be far more complex than what `do concurrent` offers, so this will be an interesting future. (I do not use a GPU as yet.)
Fancy Fortran considering novel hardware changes.
Hi @msz59. Interesting question. The imminent release of LLVM `flang` 21 will be the first version to support parallelization of `do concurrent` on CPUs, whereas compilers from NVIDIA, Intel, and HPE (Cray) already parallelize `do concurrent` on CPUs and can offload `do concurrent` to GPUs. So it’s probably not appropriate to refer to `flang`’s new capability as a breakthrough.
However, some aspects of this feel like a breakthrough for me personally. I’ve been collaborating with AMD compiler engineers on parallelizing `do concurrent` in `flang`, and the interactions have been mutually beneficial, with changes on both sides: the code I’m writing and what `flang` is able to support. One example relates to invoking a type-bound procedure inside `do concurrent`. The inherent polymorphism of a type-bound procedure’s passed-object dummy argument usually necessitates that compilers use dynamic dispatch, which hinders some optimizations and precludes GPU offloading with current compilers. Through our collaboration, we discovered that adding the `non_overridable` attribute to type-bound procedures enables the compiler to avoid dynamic dispatch. The most important implication is that it should be possible to automatically offload `do concurrent` constructs that contain type-bound procedure invocations, but offloading `do concurrent` is future work in `flang`.
We talk about it in a paper, “Automatically parallelizing batch inference on deep neural networks using Fiats and Fortran 2023 `do concurrent`”, that I presented at the 5th International Workshop on the Computational Aspects of Deep Learning in June.
FWIW, I mostly like type-bound procedures as a packaging mechanism so I’m glad we found a way to support the above capability. I like being able to write a statement like `use foo_m, only : foo_t` to import a type `foo_t` from a module `foo_m` and to have all of the imported type’s public type-bound procedures become accessible too without having to list each procedure in the `only` clause.
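To make the two points above concrete, here is a minimal sketch, not taken from the talk or the paper. The names `foo_m` and `foo_t` follow the post; the component `scale` and the binding `apply` are hypothetical. It shows a `non_overridable` type-bound procedure invoked inside `do concurrent`, with only the type named in the `only` clause. Whether a given compiler actually avoids dynamic dispatch or parallelizes the loop is compiler-dependent.

```fortran
module foo_m
  implicit none
  private
  public :: foo_t   ! only the type is exported; its bindings come with it

  type foo_t
    real :: scale = 2.   ! hypothetical component, for illustration only
  contains
    ! non_overridable: no extension of foo_t can replace this binding,
    ! so the call target is known at compile time.
    procedure, non_overridable :: apply
  end type foo_t

contains

  pure function apply(self, x) result(y)
    class(foo_t), intent(in) :: self
    real, intent(in) :: x
    real :: y
    y = self%scale*x
  end function apply

end module foo_m

program tbp_sketch
  use foo_m, only : foo_t   ! "apply" is not listed, yet foo%apply is accessible
  implicit none
  type(foo_t) :: foo
  real :: x(4), y(4)
  integer :: i

  x = [1., 2., 3., 4.]
  do concurrent (i = 1:size(x))
    y(i) = foo%apply(x(i))   ! procedures referenced here must be pure
  end do
  print *, y
end program tbp_sketch
```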
@JohnCampbell thanks for the question. If I said “users”, then I meant to say “developers” (which likely implies the same is true for users). That statement is anecdotal. FWIW, I’ve taught Fortran training courses and university courses roughly 40 times and have spent a significant fraction of my career on research software engineering projects that involved working with codes across many different fields. In my personal experience, even knowledge of what’s in the post-2003 standards is fairly unusual. I’d even go so far as to say most of the code I observe doesn’t even fully utilize the Fortran 90-2003 standards.
I can think of so many cases recently when I’ve met Fortran programmers at every career stage who were unaware, for example, of the multi-image features (often called “coarray Fortran”) and were unaware that some compilers could automatically offload `do concurrent` to GPUs.
Regarding specifying sharing assumptions, I often quote retired HPE engineer Bill Long in saying that `default(none)` in `do concurrent` is good code hygiene – similarly to `implicit none`. Using `default(none)` in a `do concurrent` construct obligates the programmer to specify the locality of variables, e.g., `local`, `local_init`, or `reduce`. Gfortran 15 supports these locality specifiers as do LLVM `flang` 20 and recent versions of the compilers from Intel, NVIDIA, and HPE (Cray).
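As an illustration of the locality specifiers mentioned above, here is a small sketch with made-up variable names (`x`, `y`, `tmp`, `total`). It assumes a compiler that accepts the Fortran 2018 `default(none)`, `shared`, and `local` specifiers and the Fortran 2023 `reduce` specifier, such as the versions listed in the post.

```fortran
program locality_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n), tmp, total
  integer :: i

  x = [(real(i), i = 1, n)]
  total = 0.

  ! default(none): every variable used in the loop body must be given
  ! a locality, much as implicit none forces every variable to be declared.
  do concurrent (i = 1:n) default(none) shared(x, y) local(tmp) reduce(+: total)
    tmp = 2.*x(i)          ! tmp is private to each iteration
    y(i) = tmp             ! each iteration writes a distinct element of y
    total = total + tmp    ! combined across iterations by the reduction
  end do

  ! The sum of 2*i for i = 1..n is n*(n+1); these values are exact in
  ! default real for n = 1000.
  if (total /= real(n)*real(n + 1)) error stop "unexpected total"
  print *, total
end program locality_sketch
```

Removing any one of the three specifiers should produce a compile-time diagnostic under `default(none)`, which is the hygiene benefit described above.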
Regarding `pure`, the most common reason that I hear for not using it is the inability to print values when debugging. I’m hoping that our Assert and Julienne libraries at least partially alleviate that issue by supporting writing assertions inside procedures, including `pure` procedures. When one prints for debugging purposes, presumably one has an expectation about what values or ranges of values of the variables are acceptable. The philosophy behind Assert and Julienne is “Capture your expectation in an assertion.” You get no output (and need no output) if the assertion succeeds. If the assertion fails, then you can potentially get rich output at the cost of error termination. With Assert, the output can include the text of the stated condition along with a file name and line number. With Julienne, the output can include variable values and additional contextual information such as an expected value, an actual value, and a tolerance. I hope this helps.
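The idea of capturing an expectation in an assertion can be sketched without either library. The code below is not the Assert or Julienne API; `pure_assert_m`, `assert`, and `safe_sqrt` are hypothetical names. It relies on Fortran 2018, which allows `error stop` in `pure` procedures and a non-constant stop code.

```fortran
module pure_assert_m
  implicit none
contains

  pure subroutine assert(condition, description)
    ! Silent when the expectation holds; error termination otherwise.
    logical, intent(in) :: condition
    character(*), intent(in) :: description
    if (.not. condition) error stop "Assertion failed: " // description
  end subroutine assert

  pure function safe_sqrt(x) result(root)
    real, intent(in) :: x
    real :: root
    ! Instead of printing x while debugging, state what x is expected to be.
    call assert(x >= 0., "safe_sqrt: x >= 0")
    root = sqrt(x)
  end function safe_sqrt

end module pure_assert_m

program assert_sketch
  use pure_assert_m, only : safe_sqrt
  implicit none
  print *, safe_sqrt(4.)   ! the assertion passes, so it produces no output
end program assert_sketch
```

Calling `safe_sqrt` with a negative argument would terminate the program with the description as the stop code.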
I couldn’t agree more on the importance of avoiding going through a `vtable` every time a TBP has to be called. This is an often overlooked but crucial point for adopting good object-oriented practices in high-performance codes where every nanosecond matters, e.g. in CFD, where (most) classes are almost never really extended.
A while ago I attempted to foster a discussion on simplifying usage of `non_overridable`: I believe it is low-hanging fruit for a “relatively easy” standard extension that would bring significant benefits (no more classes clogged with `non_overridable` keywords).
Since I tend to put `non_overridable` in every non-generic TBP, I would certainly love an addition like that, but without the “non-extensible type” implication.
We already know that `this%proc(...)` is “sort of” syntactic sugar for `proc(this, ...)`, and that even types with the `bind(C)` attribute can be extended through the latter pattern.
But by making “non-extensible” official for non-interoperable types, a user wanting to extend the type anyway would have to do it through `my_proc(this, ...)`, thus polluting subsequent `use my_mod, only:` appearances to import those additional procedures.
Out of curiosity, how come the Fortran video is only 20 minutes long, while the rest are closer to 1 hour?
Fortran is so much simpler?
It takes Fortran 20 minutes to make its point; the others take an hour.
As William Shakespeare’s character Polonius stated in Hamlet, “Brevity is the soul of wit”. Not sure what that translates into in other languages.
@mcditoos that’s a great question! I noticed that too. When I was invited to give the talk, my recollection is that I was asked to make the video 15-20 minutes long. So the funny thing is that I was a little worried that my video was close to the 20-minute limit until I saw the length of one of the other videos. Maybe I misunderstood or misheard the initial instructions. Anyway, I think shorter is likely better. I fear that not very many people would watch an hour-long YouTube video.
On printing in `pure` procedures, depending on the situation, I probably would try a debugger first, but I also made a logger that uses a linked list so it can collect messages inside of a `pure` procedure. The actual printing is done later outside of the `pure` procedure. One could add an `optional` argument to the procedure of interest for the `pure` logger so that no code outside of the procedure needs to change. Then statements like `if (present(log)) call log%error(...)` can save messages to the logger and they can be printed later with `call log%close()` the way I implemented it.
I also have my own `pure` assertion subroutine that uses `error stop`, probably similar to yours.
This works only for subroutines, right? All arguments to a pure function must be `intent(in)`, so an optional argument such as `log` is not allowed to be modified.
@RonShepard You are correct. I had not realized. I must have only used this on subroutines.
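For readers who want to see the shape of such a logger, here is a minimal sketch. It is not the poster’s implementation. The binding names `error` and `close` come from the post, while `pure_logger_m`, `logger_t`, `node_t`, and `halve` are hypothetical. It assumes a compiler that supports recursive allocatable components (Fortran 2008), and it uses a `pure` subroutine, consistent with the restriction on pure functions noted above.

```fortran
module pure_logger_m
  implicit none
  private
  public :: logger_t

  type node_t
    character(:), allocatable :: text
    type(node_t), allocatable :: previous   ! older messages
  end type node_t

  type logger_t
    private
    type(node_t), allocatable :: newest
  contains
    procedure :: error => log_error
    procedure :: close => log_close
  end type logger_t

contains

  pure subroutine log_error(self, message)
    ! Stores the message; no I/O happens here, so the procedure can be pure.
    class(logger_t), intent(inout) :: self
    character(*), intent(in) :: message
    type(node_t), allocatable :: work
    allocate(work)
    work%text = "ERROR: " // message
    call move_alloc(from=self%newest, to=work%previous)
    call move_alloc(from=work, to=self%newest)
  end subroutine log_error

  subroutine log_close(self)
    ! Impure: prints the stored messages, oldest first, then frees them.
    class(logger_t), intent(inout) :: self
    if (allocated(self%newest)) then
      call print_oldest_first(self%newest)
      deallocate(self%newest)
    end if
  end subroutine log_close

  recursive subroutine print_oldest_first(node)
    type(node_t), intent(in) :: node
    if (allocated(node%previous)) call print_oldest_first(node%previous)
    write(*,'(a)') node%text
  end subroutine print_oldest_first

end module pure_logger_m

program logger_sketch
  use pure_logger_m, only : logger_t
  implicit none
  type(logger_t) :: log
  real :: r

  call halve(-1., r, log)   ! the message is collected inside the pure subroutine
  call halve(4., r)         ! existing callers need not pass a logger
  call log%close()          ! printing happens here, outside pure code

contains

  pure subroutine halve(x, y, log)
    real, intent(in) :: x
    real, intent(out) :: y
    type(logger_t), intent(inout), optional :: log
    y = x/2
    if (x < 0.) then
      if (present(log)) call log%error("halve: negative input")
    end if
  end subroutine halve

end program logger_sketch
```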
Can you point us to a sample of this implementation? I have been thinking about something like this for an I/O suite but always stalled on the implementation because I don’t know linked lists very well. Thanks in advance.
Here is a starting point.
```fortran
program xxx
   implicit none
   type ll_t
      character(:), allocatable :: text     ! line of text.
      type(ll_t), allocatable   :: history  ! previous lines of text.
   end type ll_t
   type(ll_t), allocatable :: list

   call push( 'first line', list )
   call push( 'second line', list )
   call push( 'this is the last line', list )
   call printall( list )

contains

   subroutine push( newtext, list )
      ! push a new line of text into the linked list.
      character(*), intent(in) :: newtext
      type(ll_t), intent(inout), allocatable :: list
      type(ll_t), allocatable :: work
      allocate( work )
      work%text = newtext
      ! the old list becomes the history of the new node...
      call move_alloc( from=list, to=work%history )
      ! ...and the new node becomes the head of the list.
      call move_alloc( from=work, to=list )
      return
   end subroutine push

   recursive subroutine printall( list )
      ! print the oldest line first by recursing before writing.
      type(ll_t), intent(in) :: list
      if( allocated(list%history) ) call printall( list%history )
      write(*,'(a)') list%text
      return
   end subroutine printall

end program xxx
```
```
$ flang xxx.f90 && a.out
first line
second line
this is the last line
```
Which flang compiler gave that output? AMD flang gave this:
```
F90-S-0155-Derived type component must have the POINTER attribute - history (linkedlist.f90: 5)
F90/x86-64 Linux Flang - 1.5 2017-05-01: compilation completed with severe errors
```
However gfortran and ifx both gave the same output that @RonShepard got.
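The error message asks for the `POINTER` attribute because recursive `allocatable` components are a Fortran 2008 feature that older compilers may not support. A pointer-based variant of the same list might look like the sketch below. It is an adaptation, not the original poster’s code, and it is untested with the 2017 flang quoted above. The `destroy` procedure is an addition, needed because pointer targets are not deallocated automatically.

```fortran
program linked_list_pointer
  implicit none
  type ll_t
    character(:), allocatable :: text      ! line of text.
    type(ll_t), pointer :: history => null()  ! previous lines of text.
  end type ll_t
  type(ll_t), pointer :: list => null()

  call push('first line', list)
  call push('second line', list)
  call push('this is the last line', list)
  call printall(list)
  call destroy(list)

contains

  subroutine push(newtext, list)
    ! push a new line of text onto the head of the list.
    character(*), intent(in) :: newtext
    type(ll_t), pointer, intent(inout) :: list
    type(ll_t), pointer :: work
    allocate(work)
    work%text = newtext
    work%history => list
    list => work
  end subroutine push

  recursive subroutine printall(list)
    ! print the oldest line first by recursing before writing.
    type(ll_t), pointer, intent(in) :: list
    if (.not. associated(list)) return
    call printall(list%history)
    write(*,'(a)') list%text
  end subroutine printall

  subroutine destroy(list)
    ! free every node; unlike allocatables, pointers need explicit cleanup.
    type(ll_t), pointer, intent(inout) :: list
    type(ll_t), pointer :: next
    do while (associated(list))
      next => list%history
      deallocate(list)
      list => next
    end do
  end subroutine destroy

end program linked_list_pointer
```

The allocatable version has the advantage that the whole list is freed automatically when `list` goes out of scope or is deallocated, which is one reason to prefer it where the compiler supports it.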