Problems with coarrays

Hello all!

I seem to be having trouble and I’m wonding if anyone could help me…

I’m at a stand still until someone or some tutorial out there can help me.

I’m running on windows10 in the WSL. I’ve got caf compiling and cafrun works, but the coarrays just do not work.

Any ideas?

Example;

  ! Program Paratest
  ! Written by Frank Meyer
  ! Created Jan 9, 2022
  ! Version 0.1a
  ! Description: Testing some sync issues.
program Paratest
  implicit none

  ! Create co-arrays for testing
  integer :: i_test1[*]

  ! Other variables for testing purposes.
  integer :: i_loop1, i_loop2, i_me, i_all
 
  ! Set variable for work
  i_loop1 = 1
  i_loop2 = 2
  i_me = this_image()
  i_all = num_images()

  print *, i_me, " says; ", i_loop1    ! Does printing work? Good...
     
  sync all

  i_loop2 = i_me + 1                   ! Can we access the coarray?
  if (i_loop2 .gt. i_all) then
     i_loop2 = 1
  end if

  i_test1[i_loop2] = i_me              ! Put our number over one.

  sync all

  print *, i_me, " now says;", i_test1 ! Did it get here?

  do i_loop1 = 1, i_all                ! Add a dynamic value to it.
     i_test1[i_loop1] = i_test1[i_loop1] + 1
  end do                                     ! Equiv to i_test1 += i_all

  sync all
 
  print *, i_me, " finally says;", i_test1 ! See the failure...

end program Paratest

Output is;

       2  says;            1
       3  says;            1
       4  says;            1
       7  says;            1
      10  says;            1
      11  says;            1
      12  says;            1
      14  says;            1
      15  says;            1
       8  says;            1
      16  says;            1
       1  says;            1
       6  says;            1
       9  says;            1
      13  says;            1
       5  says;            1
       1  now says;          16
       2  now says;           1
       5  now says;           4
       6  now says;           5
       7  now says;           6
       8  now says;           7
       9  now says;           8
      10  now says;           9
      13  now says;          12
       3  now says;           6
      11  now says;          10
      12  now says;          11
      14  now says;          13
      15  now says;          14
      16  now says;          15
       4  now says;           5
       1  finally says;          22
       2  finally says;           5
       3  finally says;          12
       5  finally says;          16
       7  finally says;          19
       8  finally says;          19
       9  finally says;          21
      10  finally says;          23
      11  finally says;          18
      13  finally says;          26
       4  finally says;          10
       6  finally says;          13
      14  finally says;          27
      12  finally says;          27
      15  finally says;          24
      16  finally says;          30

Note that 3 says 6 early in the run proving that sync all did not stop thread 3 from reaching past the sync all before the rest of the threads.

Knarfnarf.

1 Like

@Knarfnarf , welcome! Perhaps you want to try your program with another processor also to better understand the issue(s) with cafrun?

Intel oneAPI is available currently without requiring a commercial license.

sync all
print *, i_me, " now says;", i_test1 ! Did it get here?
sync all

And two sync all,maybe it is a bug for open-coarrays.The result of ifort is right.

I think there are two errors: One is the placement of the first sync all and the other is non-use of atomic intrinsic atomic_add for the +1 operation.

Your first sync does nothing useful. Your second is before the “now says”. As soon as images get past that sync, some other image can change some image’s i_test1 (by rushing ahead to i_loop1) so the print shows the changed value.

Try only having “sync all” around the last two prints (the final one can is not needed, of course) and call atomic_add.

This is your problem. You’re having each image update every value, with no synchronization and not atomically. Thus, some values may not be correctly updated.

Also, since there is no barrier between this loop and the previous print statement, some images may be updating the value before other images have a chance to print it.

The suggestions from others should give some clues as to how to fix these problems.

WOW! That oneAPI was impossible to setup! Is there a smaller group of installs available? Something less than 50? Or maybe an install.sh to do it all automatically?

Knarfnarf

1 Like

Well… The sync all’s do have a reason, the first one keeps the print messages from the front aligned, the second after rotating everything helps those changes stay aligned, and the last should be all we need to get the last prints to be correct… And adding a sync all in the middle of the loop actually did nothing.

But I’m curious; what’s the code for atomic_add look like?

Knarfnarf

If your interest is only in Fortran, you can try proceeding with just Intel’s HPC toolkit, that shouldn’t be too difficult to setup and give you the IFORT compiler (and IFX, a beta version of Intel’s LLVM-based compiler): Download the Intel® HPC Toolkit

code
program Paratest
  use iso_fortran_env , only: atomic_int_kind, lock_type
  implicit none
  integer(atomic_int_kind) :: i_test1[*]
  Type(lock_type) :: test_lock[*]
  integer :: i_loop1, i_loop2, i_me, i_all

  ! Set variable for work
  i_loop1 = 1
  i_loop2 = 2
  i_me = this_image()
  i_all = num_images()

  print *, i_me, "a says; ", i_loop1    ! Does printing work? Good...
  ! sync all ! original, useless because of next sync
  i_loop2 = i_me + 1                   ! Can we access the coarray?
  if (i_loop2 .gt. i_all) then
     i_loop2 = 1
  end if
  i_test1[i_loop2] = i_me              ! Put our number over one.
  sync all ! original, must wait for i_test1 to be updated by other image
  print *, i_me, "b says;", i_test1 ! Did it get here?
  sync all ! needed, stops other images rushing ahead and changing
           !         i_test1 before we print it above
  do i_loop1 = 1, i_all                ! Add a dynamic value to it.

!!$    !Alt critical
!!$    critical
!!$      i_test1[i_loop1] = i_test1[i_loop1]+1
!!$    end critical

!!$    !Alt atomic
!!$    call atomic_add( i_test1[i_loop1] ,1)

!!$    !Alt locks
!!$    Lock(test_lock[i_loop1])
!!$    i_test1[i_loop1] = i_test1[i_loop1]+1
!!$    Unlock(test_lock[i_loop1])

!!$    !Alt race condition
!!$    i_test1[i_loop1] = i_test1[i_loop1]+1

  end do                               ! Equiv to i_test1 += i_all
  sync all ! original, needed, otherwise i_test1 will be printed
           !           before all images have finished updating it
  print *, i_me, "c says;", i_test1 ! See the failure...
end program Paratest

The website for oneAPI does look great, but the instructions are not download/install/done… If I can’t get this to work any other way I’ll look into it, but it did seem to require a rather large amount of requirements and dependencies. As it is, I think I’ve got the WSL and cygwin compiling/running now and they both have the same behavior. Obviously it is my code that is flawed, just how to fix it will be the issue.

Knarfnarf

Interesting!

So am I correct in seeing 3 solutions there along with the original code?

Knarfnarf

So, not sure why this comment was erased, but; redoing it.

My answer seems to be to add a sync all after the second round of printing to stop the early error in the second round of testing, and to use atomic_add statements to perform the changes for the last round of testing. This has worked perfectly.

Due to issues cutting and pasting to this forum, I’ll add the code as another reply.

Knarfnarf

  ! Program Paratest
  ! Written by Frank Meyer
  ! Created Jan 9, 2022
  ! Version 0.2a
  ! Description: Testing some sync issues.
program Paratest
  implicit none

  ! Create co-arrays for testing
  integer :: i_test1[*]

  ! Other variables for testing purposes.
  integer :: i_loop1, i_loop2, i_me, i_all, i_locked
  integer :: i_images(2)
  
  ! Set variable for work
  i_loop1 = 1
  i_loop2 = 2
  i_me = this_image()
  i_all = num_images()
  i_images(1) = i_me

  print *, i_me, " says; ", i_loop1    ! Does printing work?
     
  sync all

  i_loop2 = i_me + 1                   ! Can we access the coarray?
  if (i_loop2 .gt. i_all) then
     i_loop2 = 1
  end if

  i_test1[i_loop2] = i_me              ! Put our number over one.

  sync all

  print *, i_me, " now says;", i_test1 ! Did it get here?

  sync all

  do i_loop1 = 1, i_all                ! Add a dynamic value to it.
     call atomic_add(i_test1[i_loop1], 1) 
  end do                               ! Equiv to i_test1 += i_me

  sync all
  
  print *, i_me, " finally says;", i_test1 ! See the Success (values = i_me-1+16)...

end program Paratest

For another task, it looks like the third method might work, if it does then that is the last key to the puzzle! Interesting that you test the lock on a non-critical variable to prove the ability to reach the critical one… I will have to study this!

Knarfnarf

“Critical protects code, locks protect variables” is what to remember.

There is no “critical variable” or “non-critical variable”.

Locks are more fine-grained and offer more control (you can do other things while waiting for a lock).

But EVENTs is where the performance is and should be the first to try.

I have no experience with Windows/WSL, so I do not know how linux software is installed there. In regular Linux environment, both RHEL-like and Ubuntu/Debian-like, you can add an Intel oneAPI repository to teh system and then try to install just the fortran compiler. All needed extra packages will be automatically installed as dependencies.

Yeah… I’m running this in windows right now and WOW! to get CAF installed was VERY confusing. It looked like I needed the core components (with dependencies) then at least two other products (with different dependencies)… The website LOOKED nice, but was very unclear as to what did or did not have CAF or was compatible to what I was looking for. Your mileage my vary.

Knarfnarf

CAF comes (used to come) with opencoarrays which are used for gfortran. With Intel’s oneAPI ifort you don’t need CAF.
You are right, however, that CAF has become hard to install on Linuxes. Fedora and other RHEL-like distros abandoned opencoarrays quite a while ago. Ubuntu had open-coarrays-bin package until 18.04 but in 20.04 it is not available anymore. Only libcoarrays-* packages left but you have to know how to compile to use them.

As it turns out; I’ve have very good success with caf under WSL (a few issues with the version of Ubuntu that gets installed missing… everything! … and needing to be pointed to a better repository, updated, upgraded, updated, and upgraded AGAIN!). The install of CAF was instantaneous once I started using HomeBrew!

But the issue here is that CAF does NOT push to other images reliably. You CAN pull from another image with good reliability so long as you do so with a type that has a long integer as an iteration counter to compare with the iteration of this image. That way you can be sure that the data being pulled is good.

I have a parrallel sort program that does this and it’s proven to work reliably.

Knarfnarf

But the issue here is that CAF does NOT push to other images reliably.

If I understand you correctly, then see that document on page 26 for coarray get and put operations:
https://doku.lrz.de/download/attachments/43320950/Advanced_Fortran_Coarrays.pdf .

From my own experiences, ‘unreliable’ data transfer through coarray put operations are common if the SPMD coarray runtime (i.e. the runtime of the current coarray team) got ‘destroyed’ by violating the execution segment ordering in that coarray team, i.e. some programming mistake with a synchronization or atomic operation. Any following data transfer or synchronization may or may not succeed, the ‘destroyed’ runtime becomes unreliable.

The only way, that I am aware of, to recover from a ‘destroyed’ coarray runtime is to leave the current coarray team (END TEAM) and then to re-enter (CHANGE TEAM) the coarray team, if this is still possible. Of course this makes no sense if the execution segment ordering is violated again and again, it should not be violated at all.