DO CONCURRENT question

rwmsu · June 18, 2022, 1:00pm

When refactoring old code you sometimes run into the following pattern for nesting DO loops:

DO 1000 K=1,10
DO 1000 J=1,100
DO 1000 I=1,1000
D(I,J,K) = A(I,J,K) + B(J)*C(I)
1000 CONTINUE

To avoid the extra work of replacing the above with DO/END DO constructs, I thought maybe the logic could be replaced with a single DO CONCURRENT block, i.e.

DO CONCURRENT (I=1:1000, J=1:100, K=1:10)
D(I,J,K) = A(I,J,K) + B(J)*C(I)
END DO

However, reading more about the constraints on DO CONCURRENT, I’m not sure whether the DO CONCURRENT statement is a direct replacement for the original logic. My best guess is maybe it is and maybe it isn’t. So the question is: will the DO CONCURRENT form act the same as the original code?
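A minimal sketch of one way to check this for the loop above. It is not from the original post: the program name, the named constants ni/nj/nk, the random test data, and the tolerance are all assumptions made for illustration. The reasoning is that every iteration writes only its own element D(I,J,K) and reads nothing that another iteration writes, so this particular loop body has no cross-iteration dependence and should be a legal DO CONCURRENT body. That argument does not carry over to loop bodies where one iteration uses a value defined by another.

```fortran
program compare_loops
  implicit none
  ! Hypothetical sizes matching the loop bounds in the post.
  integer, parameter :: ni = 1000, nj = 100, nk = 10
  real, allocatable :: a(:,:,:), d1(:,:,:), d2(:,:,:)
  real :: b(nj), c(ni)
  integer :: i, j, k

  allocate (a(ni,nj,nk), d1(ni,nj,nk), d2(ni,nj,nk))
  call random_number(a)
  call random_number(b)
  call random_number(c)

  ! Ordered nest, same iteration order as the labelled DO 1000 loops.
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        d1(i,j,k) = a(i,j,k) + b(j)*c(i)
      end do
    end do
  end do

  ! Unordered form: the processor may run the iterations in any order.
  do concurrent (i = 1:ni, j = 1:nj, k = 1:nk)
    d2(i,j,k) = a(i,j,k) + b(j)*c(i)
  end do

  ! A small tolerance is used in case the two loops are compiled with
  ! different floating-point contractions (e.g. fused multiply-add).
  if (maxval(abs(d1 - d2)) > 1.0e-5) error stop "results differ"
  print *, "results agree within tolerance"
end program compare_loops
```

Whether the compiler picks a good traversal order, or runs anything in parallel, is a separate question from whether the results agree.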

Also, has anyone proposed a modification to the current standard DO to collapse nests such as the example above into one statement? Basically you would have the DO CONCURRENT syntax without the concurrency constraints, masks, etc.,

i.e.

DO (I=1:1000, J=1:100, K=1:10)
D(I,J,K) = A(I,J,K) + B(J)*C(I)
END DO

with the focus of this proposed modification being the refactoring of old code.

JohnCampbell · June 19, 2022, 6:01am

In .f90 syntax, you can write
DO k=1,10; DO J=1,100; DO I=1,1000
D(I,J,K) = A(I,J,K) + B(J)*C(I)
END DO; END DO; END DO ! an ugly finish!!

However, the selected order of i,j,k is significant.

It is unclear how “DO CONCURRENT (I=1:1000, J=1:100, K=1:10)” would be optimally compiled.
Would the optimum i,j,k order be recognised ?
Would the appropriate use of “CONCURRENT” be recognised for parallel / multi-thread ?
Do modern compilers already adjust “DO I=1,1000; DO J=1,100; DO k=1,10” ?

I don’t use DO CONCURRENT, perhaps because I thought it related to co-arrays, but on reading “Modern Fortran Explained” I am more confused! DO CONCURRENT “is provided to help improve performance by enabling parallel execution of the loop iterations”.

Why introduce DO CONCURRENT but remove FORALL ?
Just another attempt at concise multi-loop syntax ?
Why “help” but not recognise OpenMP ?
Is it actually associated with coarrays ?
It looks like a second attempt at the same failure ?
The concept of “help” really looks misplaced in comparison to other approaches of the Fortran standard.

I understand that ifort has a /Qparallel feature for classic DO that weighs the overhead of OpenMP initiation against the benefit of multi-thread computation savings, but I always make these choices explicitly, especially for defining SHARED and PRIVATE.
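For readers who have not seen the explicit approach, here is a minimal sketch of what it might look like for the loop in question. The program name, array sizes, test data, and the choice of collapse(2) are assumptions for illustration, not something taken from this thread. It needs an OpenMP-enabled build (e.g. gfortran -fopenmp); without that flag the directives are treated as comments and the nest runs serially.

```fortran
program omp_nest
  implicit none
  ! Hypothetical sizes matching the loop bounds discussed above.
  integer, parameter :: ni = 1000, nj = 100, nk = 10
  real, allocatable :: a(:,:,:), d(:,:,:)
  real :: b(nj), c(ni)
  integer :: i, j, k

  allocate (a(ni,nj,nk), d(ni,nj,nk))
  call random_number(a)
  call random_number(b)
  call random_number(c)

  ! default(none) forces every variable's sharing to be stated explicitly.
  ! collapse(2) merges the K and J loops into one parallel iteration space;
  ! the stride-1 I loop is left to each thread for vectorization.
  !$omp parallel do collapse(2) default(none) shared(a, b, c, d) private(i, j, k)
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        d(i,j,k) = a(i,j,k) + b(j)*c(i)
      end do
    end do
  end do
  !$omp end parallel do

  ! Spot-check one column against the array-syntax form.
  if (maxval(abs(d(:,1,1) - (a(:,1,1) + b(1)*c))) > 1.0e-5) error stop "mismatch"
  print *, "spot check passed"
end program omp_nest
```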

As with FORALL, DO CONCURRENT does not appear to have sufficient support from other Fortran syntax.

What do others think ?

FedericoPerini · June 19, 2022, 6:24am

I’m a fan of keeping code short and so even though it’s going to be deprecated at some point I would do a one liner like

forall(I=1:1000,J=1:100,K=1:10) D(I,J,K) = A(I,J,K) + B(J)*C(I)

Sometimes it’s impossible to replace loops with array operations: forall is clear, concise, and tells the compiler that stuff can be vectorized. For me it’s a no-brainer to use it.

JohnCampbell · June 19, 2022, 9:10am

Would possibly the following be more effective ?

DO K=1,10 ; DO J=1,100
D(:,J,K) = A(:,J,K) + C(:) * B(J)
END DO ; END DO

I do prefer to use array syntax for the inner loop, which indicates AVX possibilities.

FedericoPerini · June 19, 2022, 10:19am

Well, then why not

forall(j=1:100,k=1:10) D(:,j,k)=A(:,j,k)+C*B(j)

But as long as some of the indices can’t be collapsed, I’d find it more understandable to have all of them written out explicitly.

egio · June 19, 2022, 11:47am

The point is that in a forall statement the entire right-hand side of the expression must be fully evaluated before any assignment takes place, so a temporary array may be created.
I prefer a do concurrent even though I need to write one more line. I prefer not to use an obsolescent feature.
As far as I know, the current version of gfortran should optimize the index ordering in a do concurrent loop to make it faster.
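A small sketch of the "evaluate everything first" rule, using a made-up five-element example (the program and variable names are purely illustrative). When the right-hand side reads elements that the same statement assigns, forall and an ordered DO loop give different answers, which is why a compiler may need a temporary copy:

```fortran
program forall_vs_do
  implicit none
  integer :: x(5), y(5), i

  x = [1, 2, 3, 4, 5]
  y = x

  ! FORALL: every x(i-1) is read before any x(i) is written,
  ! so the array is shifted right by one: x becomes 1 1 2 3 4.
  forall (i = 2:5) x(i) = x(i-1)

  ! Ordered DO: each iteration sees the previous iteration's write,
  ! so the first element propagates: y becomes 1 1 1 1 1.
  do i = 2, 5
    y(i) = y(i-1)
  end do

  if (any(x /= [1, 1, 2, 3, 4])) error stop "unexpected forall result"
  if (any(y /= [1, 1, 1, 1, 1])) error stop "unexpected do result"
  print '(a,5i2)', "forall:", x
  print '(a,5i2)', "do:    ", y
end program forall_vs_do
```

A loop body like this, where one iteration reads what another writes, would not be allowed in do concurrent at all. The loop in the original question has no such overlap, so no temporary should be needed for it in either form.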

rwmsu · June 19, 2022, 12:47pm

@FedericoPerini - As you probably know, FORALL is not a replacement for ordered DO loops. The FORALL construct was an attempt at a parallel assignment inspired by an equivalent command in Connection Machine Fortran and High Performance Fortran (HPF). Compiler developers have struggled since the Fortran 95 standard hit the streets (around 1997 if I remember correctly) to implement an efficient version of FORALL and all of them apparently failed. Hence, the development of DO CONCURRENT. What I propose effectively collapses the multiple DO statements in the nest into one line. Plus FORALL was declared obsolescent in the Fortran 2018 standard. I personally refrain from using any Fortran command or feature that has been officially declared obsolescent.

@JohnCampbell, yes, you can refactor the nested loops in the way that you suggest, but I personally don’t see that it makes it more readable. To each his own though. As with obsolescent features, my personal coding standards avoid multiple statements on a line. Again, that’s just a personal preference. Also, as I stated above, my focus for this is refactoring old code (with an eye towards an automated refactoring tool sometime in the future).

FedericoPerini · June 20, 2022, 10:22am

Thank you for the historical perspective. This is very interesting because I’ve stumbled upon this comment many times: that forall was never properly implemented in compilers and so it was decided to dump it in favor of do concurrent. Could you elaborate more on this?
My compiler experience is pretty much limited to gfortran and I’ve never had any such issues with forall on it, at least for loops that aren’t too convoluted.

See this comparison between forall and a loop, forall isn’t bad, is it?

rwmsu · June 20, 2022, 12:02pm

Frankly, I’m not sure myself why FORALL was so hard to implement. Maybe one of the compiler developers or committee members can comment. My best guess (and it’s only a guess) is that it didn’t map well to modern multi-core processors. Remember, it’s a parallel assignment statement that allows the right-hand side of any assignment to be evaluated on multiple processors in any order. FORALL has its roots in the Connection Machine systems of the late '80s and early '90s. The Connection Machine was classified as a “streaming multi-processor”, much like current GPGPUs, but I don’t think the architectures are the same. The CM systems were basically several thousand relatively low-performance CPUs, each with its own small cache of memory, connected by a system of specialized routers in either a hypercube or fat-tree topology.

I think the original concept for DO CONCURRENT was focused more on vectorization than parallelization (although one is really a form of the other).
One of its functions is to replace the !DIR$ IVDEP type compiler directives that have been around since the early Cray days with a standard construct. To date I’ve refrained from using DO CONCURRENT because, as far as I know, only the NVIDIA compilers actually do anything in parallel (and soon gfortran, if the GSoC project to offload DO CONCURRENT to GPUs is successful). I think Modern Fortran Explained has a section on FORALL that describes its perceived problems.

Also, for anyone interested, the CM 5 Fortran manuals etc are available at the following link.

https://people.csail.mit.edu/bradley/cm5docs/

They make for interesting reading. For instance, they were using the bracket form of array constructors back in 1991.

Beliavsky · June 20, 2022, 12:18pm

FedericoPerini:

I’m a fan of keeping code short and so even though it’s going to be deprecated at some point I would do a one liner like
forall(I=1:1000,J=1:100,K=1:10) D(I,J,K) = A(I,J,K) + B(J)*C(I)
Sometimes it’s impossible to replace loops with array operations: forall is clear, concise, and tells the compiler that stuff can be vectorized. For me it’s a no-brainer to use it.

I like forall too, and I have read that it is unlikely to be deleted, but I think it should only be used for operations that will take a small fraction of overall run time. I believe forall was made obsolescent because compilers were not able to implement it efficiently.

FedericoPerini · June 20, 2022, 1:24pm

I’ve just found this interesting google group whose discussion generated the proposal to delete forall. IMHO most of the statements against forall are pretty vague, like "I’ve never seen a code using forall", but the major culprit is that using forall sometimes requires the creation of an array temporary (like with any other function assignment…), and so it gets labelled “too slow”.

I very much agree with most of the positive comments on the construct, which allows clear, concise code.

Anyways regardless of my opinions on the feature being deleted, it’s a pretty interesting read.

Beliavsky · June 20, 2022, 1:47pm

As an “old man” (early 50s) let me point out that the “google group” you refer to is the Usenet group comp.lang.fortran, whose history was discussed in the thread Comp.lang.fortran: 37 years of discourse.
