Why cannot the Fortran coarray access an element in a different image?

Shahid · May 5, 2025, 4:15pm

The following coarray code does not produce the output I expect:

program caftest
  implicit none
  integer, dimension(5), codimension[*] :: a = 1
  
  ! Image 1  
  a(1)[1] = 11
  sync all
  
  ! Image 2
  a(2)[2] = a(1)[1]
  sync all
  
  ! Only Image 1 prints
  if (this_image() == 1) then
    print*, 'a(2)[2] (Image 2 a(2)):', a(2)[2]  ! Should be 11
    print*, 'a(1)[1] (Image 1 a(1)):', a(1)[1]  ! Should be 11
  end if

end program

Output:

 a(2)[2] (Image 2 a(2)):           1
 a(1)[1] (Image 1 a(1)):          11

However, modifying it as follows gives the expected output:

program caftest
  implicit none
  integer, dimension(5), codimension[*] :: a = 1
  
  ! Image 1
  if (this_image() == 1) a(1) = 11
  sync all
  
  ! Image 2
  if (this_image() == 2) a(2) = a(1)[1]
  sync all
  
  ! Only Image 1 prints
  if (this_image() == 1) then
    print*, 'a(2)[2] (Image 2 a(2)):', a(2)[2]  ! Should be 11
    print*, 'a(1)[1] (Image 1 a(1)):', a(1)[1]  ! Should be 11
  end if
end program

Output:

 a(2)[2] (Image 2 a(2)):          11
 a(1)[1] (Image 1 a(1)):          11

Can’t we use the shorter form (the first code) to access data on different images? I am running the code on Windows 10.

nncarlson · May 5, 2025, 9:18pm

What compiler are you using? Both the NAG and Intel compilers give the expected result with your original example. (Normally having all images write to the same location on the same image will create a race condition, but since they’re all writing the same value it doesn’t matter. But you’d never want to do that in practice.)

Shahid · May 6, 2025, 8:39am

I have not used the Intel compiler so far. I am using the Simply Fortran IDE with coarrays, which comes with the gfortran compiler.

themos · May 6, 2025, 9:16am

I suggest that there is no “expected result” because the code is invalid. The reason is as follows: The first segment of program CAFTEST consists of the statement

  a(1)[1] = 11

There is no ordering imposed on that first segment executing on any of the images; in particular, segment-1 executing on image-1 is unordered relative to segment-1 executing on image-2. Section 11.7.2 “Segments” of J3/25-007, para 3, lists the conditions for statements to be valid in unordered segments. None of the conditions apply, and so we read

• if a variable is defined or becomes undefined on an image in a segment, it shall not be referenced,
defined, or become undefined in a segment on another image unless the segments are ordered,

This is an explicit injunction against “race conditions”. The program therefore has no meaning within the rules of Fortran, and there is no “expected result”.
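To illustrate, here is a minimal sketch of how the modified program from the first post divides into segments. The program name and the comments are annotations added for illustration, not text from the Standard, and the guard on num_images() is an added assumption that at least two images are running.

```fortran
program caf_segments_sketch
  implicit none
  integer, dimension(5), codimension[*] :: a = 1

  ! Assumption for this sketch: a(2)[2] only exists with two or more images.
  if (num_images() < 2) error stop 'this sketch needs at least 2 images'

  ! Segment 1 (executed by every image): only image 1 defines a(1),
  ! and it defines its own local copy.
  if (this_image() == 1) a(1) = 11

  ! SYNC ALL is an image control statement. It ends segment 1 on each
  ! image and orders it before segment 2 on every image.
  sync all

  ! Segment 2: image 2 references a(1)[1], which was defined in a
  ! segment that is ordered before this one.
  if (this_image() == 2) a(2) = a(1)[1]

  ! Orders segment 2 before segment 3 on every image.
  sync all

  ! Segment 3: image 1 references a(2)[2], which image 2 defined in
  ! the preceding, ordered segment.
  if (this_image() == 1) then
    print *, 'a(2)[2]:', a(2)[2]
    print *, 'a(1)[1]:', a(1)[1]
  end if
end program caf_segments_sketch
```

In the original program, by contrast, every image executes a(1)[1] = 11 in its own segment 1, and nothing orders those segments relative to one another.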

Shahid · May 6, 2025, 10:27am

What does ordering mean here, and how do I impose it?

“Segment” is another term that is new to me. Does an assignment to a coarray constitute a segment?

themos · May 6, 2025, 11:31am

Segment ordering is the concept that the language definition develops in order to nail down what you expect the result to be and the conditions that must hold.

You cannot analyse validity of coarray programs without knowing about segment ordering.

I cannot reproduce what the Standard document says in my own words without making sloppy mistakes; you should consult a reputable book.

In your example, a reasonable way to formulate what you intended is to use atomic intrinsic subroutines, I believe.
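A minimal sketch of what that might look like, assuming the atomic_define and atomic_ref intrinsic subroutines and the atomic_int_kind constant from iso_fortran_env. The program name, the val temporary and the num_images() guard are introduced here for illustration, and this is only one possible reading of the intent of the original code.

```fortran
program caf_atomic_sketch
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  implicit none
  ! Atomic subroutines require integer atoms of kind atomic_int_kind.
  integer(atomic_int_kind), dimension(5), codimension[*] :: a = 1
  integer(atomic_int_kind) :: val   ! hypothetical local temporary

  ! Assumption for this sketch: a(2)[2] only exists with two or more images.
  if (num_images() < 2) error stop 'this sketch needs at least 2 images'

  ! Every image atomically defines a(1) on image 1. Atomic subroutines
  ! are exempt from the rule that forbids defining the same variable
  ! in unordered segments.
  call atomic_define(a(1)[1], 11)
  sync all

  ! Every image atomically reads a(1) on image 1 and atomically
  ! stores the value into a(2) on image 2.
  call atomic_ref(val, a(1)[1])
  call atomic_define(a(2)[2], val)
  sync all

  ! The SYNC ALL above orders the atomic definitions before these
  ! ordinary references on image 1.
  if (this_image() == 1) then
    print *, 'a(2)[2]:', a(2)[2]
    print *, 'a(1)[1]:', a(1)[1]
  end if
end program caf_atomic_sketch
```

Restricting each assignment to a single image with this_image(), as in the modified code, remains the simpler option when only one image needs to do the work.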

jeff · May 6, 2025, 11:52am

Your original code also works using GNU Fortran + OpenCoarrays on macOS, so I think there is an issue in the Simply Fortran Windows Coarray library. I’m having a look at the internals right now to see where the problem lies.

EDIT: There appears to be an issue in our _gfortran_caf_sendget implementation, which is called on the line a(2)[2] = a(1)[1].

jeff · May 6, 2025, 6:07pm

It looks like our library, when handling a “sendget” (which is how GNU Fortran implements the a(2)[2] = a(1)[1] line), was misidentifying a memory offset. Simply Fortran’s coarray implementation treats a “sendget” request as a “get” request foisted on another image. Effectively, all images are telling image 2 to get data from image 1. These messages are received by image 2, and it attempts to perform a “get” request as if it had originated from image 2. However, we had a memory issue when reconstructing the “get” request: we were applying a memory offset to the source data rather than to the destination data (the offset is needed because the value is assigned to the second element of image 2’s copy of the a array).

There was another nasty problem related to retrieving the passed data’s array dimensions that I stumbled across debugging this simple example. We’ll get a bug-fix build out in the next day or two to correct the problems.

themos · May 7, 2025, 8:43am

I am confused now. Are you aware, jeff, that the original code has no interpretation in Fortran? What interpretation are you using to “fix” the compiler?

jeff · May 7, 2025, 11:32am

The correctness of the original code notwithstanding, this Fortran sample did reveal a bug in a coarray runtime library implementation. The output of GNU Fortran does make a call to a coarray library that was subsequently mishandled.

I was merely pointing out that the output of the compiled code when linked with Simply Fortran’s coarray library was different from a case where the OpenCoarrays library was used, which should not have been the case and was revealed to be caused by a bug.

themos · May 7, 2025, 1:29pm

Is it the case that the modified code, which does have an interpretation and looks like

  ! Image 1
  if (this_image() == 1) a(1) = 11
  sync all
  
  ! Image 2
  if (this_image() == 2) a(2) = a(1)[1]
  sync all

also reveals the bug? If you were to add a test case to catch any regressions of the bug, I hope it would be the modified, correct code.

jeff · May 7, 2025, 4:11pm

No, your snippet would not exhibit the bug because the compiler will generate a _gfortran_caf_get call rather than the formerly problematic _gfortran_caf_sendget call.

themos · May 7, 2025, 4:35pm

I think you should check correct operation of code generation for a correct program. Maybe

  ! Image 1
  if (this_image() == 1) a(1) = 11
  sync all
  
  ! Image 3
  if (this_image() == 3) a(2)[2] = a(1)[1]
  sync all

would generate the _gfortran_caf_sendget that you seek?

jeff · May 7, 2025, 4:39pm

Yep, that generates a _gfortran_caf_sendget and works after the fix.
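For anyone who wants to try this on their own setup, here is a minimal sketch of a self-checking version of that snippet. The program name, the num_images() guard and the error stop messages are additions for illustration and are not taken from any actual test suite. It assumes the program is run with at least three images.

```fortran
program caf_sendget_check
  implicit none
  integer, dimension(5), codimension[*] :: a = 1

  ! Image 3 must exist for the remote-to-remote copy below.
  if (num_images() < 3) error stop 'this check needs at least 3 images'

  ! Image 1 defines its own a(1).
  if (this_image() == 1) a(1) = 11
  sync all

  ! Image 3 copies from image 1 to image 2. Source and destination are
  ! both remote, which is the case reported above to generate
  ! _gfortran_caf_sendget with GNU Fortran.
  if (this_image() == 3) a(2)[2] = a(1)[1]
  sync all

  ! Image 2 checks its local copy: only element 2 should have changed.
  if (this_image() == 2) then
    if (a(2) /= 11) error stop 'a(2) on image 2 was not updated'
    if (a(1) /= 1 .or. any(a(3:5) /= 1)) error stop 'other elements were modified'
    print *, 'sendget check passed'
  end if
end program caf_sendget_check
```

Each coindexed variable is defined by one image only, and the sync all statements order the segments, so the program avoids the race in the original code.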
