Weird things with Fortran parallel images

Yesterday I was discussing event post and event_query in a parallel Fortran 2018 application with @milancurcic. I have made good progress today, although a little problem still remains.

I have pushed a new gtk-fortran example demonstrating how gtk-fortran can be used with the parallel computing features of Fortran 2008 and 2018:
https://github.com/vmagnin/gtk-fortran-extra/tree/main/parallel_app
Each Fortran image computes a Buddhabrot, and the results are regularly summed with co_sum() before being displayed in the GTK window. The parallel computation runs fine with ifort 2021.6.0 and GFortran 11.2.0 + OpenCoarrays 2.10.0 under Ubuntu 22.04.

The problem occurs when I close the GTK window, which is managed by image 1. The callback function destroy_signal() (at the top of GUI_and_computation.f90) sends a stop notification to the other images:

    if (num_images() >= 2) then
      do i = 2, num_images()
        event post(stop_notification[i], STAT=status)
        print '(A, I3, A, I3)', "Image 1 sending event post stop_notification to", i, " status=", status
      end do
    end if

Toward the bottom of the same file, each image checks whether there is a stop notification and exits the computation loop if there is one:

        call event_query(stop_notification, counter)
        print '(A, I3, A, I3)', "I am image", this_image(), " ; event counter", counter
        if (counter /=0) exit computation
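To see how the two fragments above fit together, here is a minimal self-contained sketch of the same post/query pattern without GTK. It is not the gtk-fortran-extra code: the program name and the place where image 1 posts (directly, instead of inside the destroy_signal() callback) are assumptions for illustration; only stop_notification and the messages follow the post. It should build with OpenCoarrays (`caf`), with `ifort -coarray`, or with `gfortran -fcoarray=single` for a single image.

```fortran
program stop_event_demo
  ! Minimal sketch, not the gtk-fortran-extra code: image 1 posts a stop
  ! event to every other image, which poll it with event_query.
  use iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: stop_notification[*]
  integer :: i, counter, status

  if (this_image() == 1) then
    ! Hypothetical trigger: in the real application this loop runs in the
    ! GTK destroy_signal() callback when the window is closed.
    do i = 2, num_images()
      event post(stop_notification[i], stat=status)
      print '(A, I3, A, I3)', "Image 1 sending event post stop_notification to", i, " status=", status
    end do
  else
    computation: do
      ! ... one chunk of the Buddhabrot computation would go here ...
      call event_query(stop_notification, counter)
      if (counter /= 0) exit computation
    end do computation
  end if

  ! Every image reaches this point, so the sync all cannot deadlock.
  sync all
  print '(A, I3, A)', "I am image", this_image(), " at end program"
end program stop_event_demo
```

Because the polling loop contains no collective, an image that has not yet seen the event simply keeps computing until it does; nothing waits on it.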

It seems to work. Here is the end of main.f90:

      print '(A, I3, A)', "I am image", this_image(), " at end program"
      sync all
      print '(A, I3, A)', "I am image", this_image(), " and I am after sync all"

    end program Parallel_Buddhabrot

and with 4 images, these messages are printed:

    Image 1 sending event post stop_notification to  2 status=  0
    I am image  2 ; event counter  1
    I am image  2 at end program
    Image 1 sending event post stop_notification to  3 status=  0
    Image 1 sending event post stop_notification to  4 status=  0
    I am image  1 at end program
    I am image  1 and I am after sync all
    I am image  2 and I am after sync all
     sync all:           4
     sync all:           3
    I am image  4 ; event counter  1
    I am image  4 at end program
    I am image  3 ; event counter  1
    I am image  3 at end program
    I am image  3 and I am after sync all
    I am image  4 and I am after sync all

The problem is that the images keep running after that, and even burn the CPU… I have tried many things, but the problem remains. I don't know whether it is a Fortran problem or a linking problem with GTK. In any case, it is strange that the CPU keeps burning, since the scientific computation is finished and every image has reached end program.


I am not sure whether this is related to your specific problem, since all images do seem to reach END PROGRAM, but I have had similar experiences in the past on my Ubuntu Linux laptop. When I closed a terminal window (without GTK) after a coarray program had hung or crashed (usually due to my own faulty coarray programming at the time), the Ubuntu terminal warned that it would shut down the still-running process, but it did not really do so; as you say, the CPU kept burning. The only solution then was to reboot Ubuntu. I can't tell whether this is a Linux-specific problem. Does Linux not shut down all of the still-running processes?

Since then, I have been using different CAF programming techniques (with fault-tolerant execution) and first-class compile-time analysis of CAF code (using OOP to implement distributed objects), which seems to help a lot: CAF programs that don't hang or crash do not 'burn' the CPU after they finish execution. Of course, I am not sure whether this is related to, or could help with, your specific problem. Could it be that some processes are still running, and could this be related to the use of GTK?

Random question: does the behavior of the program change if you move the content of the main program (both declarations and executable statements) into a subroutine?

Happily, I don't have to reboot the system! A simple CTRL+C in the terminal stops all the still-running processes.

It's possible: maybe the GTK main loop is not entirely exited? I am going to try an experiment with a very simple application (just an empty GTK window and 4 Fortran images) to see whether the problem is also present there.

No, if I put the contents of the main program into the my_computation() subroutine, the behavior is the same.

I have fixed the problem by deleting the sync all that preceded co_sum(p, 1) in the computing subroutine:

        sync all
        call co_sum(p, 1)

I thought that all images would have received any stop_notification event before that sync all, but maybe that is not the case: some image may have been waiting in sync all for the others, while they had already received the signal and stopped their computation.
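The hazard described above is general: when images decide independently to leave a loop that contains collectives (sync all, co_sum), one image can be left waiting for partners that have already left. A different technique from the one used here (deleting the sync all) is to make the stop decision itself collective. The minimal sketch below assumes a hypothetical stop condition on image 1 (iteration 5) standing in for the window being closed, and a placeholder array p standing in for the Buddhabrot data; it is an illustration, not the author's code.

```fortran
program collective_stop_demo
  ! Sketch (not the author's fix): images agree on "stop" with a collective
  ! reduction, so no image waits in co_sum for images that already left.
  implicit none
  real :: p(4), total(4)
  integer :: stop_flag, iteration

  total = 0.0
  do iteration = 1, 1000
    ! Placeholder for this iteration's local Buddhabrot contribution.
    p = real(this_image())

    ! Hypothetical stop condition: image 1 "closes the window" at iteration 5.
    stop_flag = 0
    if (this_image() == 1 .and. iteration == 5) stop_flag = 1

    ! Every image takes part in the same reduction, so afterwards all images
    ! hold the same stop_flag and leave the loop in the same iteration.
    call co_max(stop_flag)
    if (stop_flag /= 0) exit

    ! Reached by all images or by none, so it cannot hang.
    call co_sum(p, result_image=1)
    if (this_image() == 1) total = total + p
  end do

  if (this_image() == 1) then
    print '(A, I0, A, F0.1)', "Stopped at iteration ", iteration, "; total(1) = ", total(1)
  end if
end program collective_stop_demo
```

With this design image 1 does not even need to post events for the computation loop, because its local flag reaches every image through co_max. The price is one extra small reduction per iteration, which is negligible next to summing a whole Buddhabrot array.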

So it works, but there is another thing I don't understand. When we close the GTK window, image 1 does this:

      do i = 2, num_images()
        event post(stop_notification[i], STAT=status)
        print '(A, I3, A, I3)', "Image 1 sending event post stop_notification to", i, " status=", status
      end do

but the three messages do not appear at the same time in the terminal; several seconds pass between each of them:

    Image 1 sending event post stop_notification to  2 status=  0
    I am image  2 ; event counter  1
    I am image  2 at end program
    I am image  4 ; event counter  0
    I am image  4 doing co_sum(p, 1)
    Image 1 sending event post stop_notification to  3 status=  0
    I am image  3 ; event counter  1
    I am image  3 at end program
    Image 1 sending event post stop_notification to  4 status=  0
    I am image  4 ; event counter  1

Another question: is there any syntax to post an event to all images simultaneously? Or do we have to write a loop?
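As far as I know, EVENT POST in Fortran 2018 takes a single event variable, so posting to every image needs a loop. One possible alternative, sketched below under that assumption, replaces the per-image events with a single atomic flag stored on image 1: image 1 writes it once with atomic_define, and every image polls it with atomic_ref. The program name, the flag name and the "iteration 5" trigger are hypothetical.

```fortran
program atomic_stop_flag_demo
  ! Sketch of an alternative to one EVENT POST per image: image 1 writes a
  ! single atomic flag in its own memory and every image reads it remotely.
  use iso_fortran_env, only: atomic_int_kind
  implicit none
  integer(atomic_int_kind) :: stop_flag[*]
  integer(atomic_int_kind) :: flag_value
  integer :: iteration

  call atomic_define(stop_flag, 0_atomic_int_kind)
  sync all   ! image 1's flag is initialized before any image reads it

  iteration = 0
  do
    iteration = iteration + 1
    ! ... one chunk of the computation would go here ...

    ! Hypothetical "window closed" moment: a single write on image 1.
    if (this_image() == 1 .and. iteration == 5) then
      call atomic_define(stop_flag, 1_atomic_int_kind)
    end if

    ! Every image (image 1 included) polls the flag stored on image 1.
    call atomic_ref(flag_value, stop_flag[1])
    if (flag_value /= 0) exit
  end do

  sync all
  if (this_image() == 1) print '(A, I0)', "Image 1 raised the stop flag at iteration ", iteration
end program atomic_stop_flag_demo
```

The trade-off is that every poll becomes a remote read from image 1, whereas event_query only reads local memory; with many images and frequent polling, the loop of EVENT POST may well be the cheaper option.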

I have finally fixed another little problem: when the computation finished normally, image 1 entered a GTK main loop (to keep the window open) and stayed idle, but the three other images, waiting at end program, were burning 100% of their CPU until image 1 stopped! (Noisy and not ecological…) So I decided to put them to sleep with the GLib g_usleep() function until image 1 sends them the stop notification:

      ! Creates a GTK main loop to keep the window open:
      my_gmainloop = g_main_loop_new(c_null_ptr, FALSE)
      call g_main_loop_run(my_gmainloop)
    else
      ! Other images must stay idle waiting for image 1 to stop, else they
      ! will burn 100% of their CPU:
      do
        ! Stay idle for 0.1 s:
        call g_usleep(100000_c_long)
        ! If image 1 closed the GTK window we can exit the loop:
        call event_query(stop_notification, counter)
        if (counter /=0) exit
      end do
    end if
  end if
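Another option for the idle images is to block in EVENT WAIT instead of polling event_query in a sleep loop. The sketch below shows that variant without GTK (the program name is hypothetical, and image 1 posts directly rather than from the destroy_signal() callback). Whether the coarray runtime actually sleeps inside EVENT WAIT or spins on the CPU is implementation-dependent and I have not checked it for OpenCoarrays or ifort, so the g_usleep() loop may still be the safer choice for CPU usage.

```fortran
program event_wait_idle_demo
  ! Sketch: instead of polling event_query + g_usleep, the idle images block
  ! in EVENT WAIT until image 1 posts the stop notification.
  use iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: stop_notification[*]
  integer :: i

  if (this_image() == 1) then
    ! In the real application the GTK main loop would run here, and the
    ! destroy callback would post the event to every other image:
    do i = 2, num_images()
      event post(stop_notification[i])
    end do
  else
    ! Waits until this image's stop_notification has been posted at least
    ! once, then decrements its count (EVENT WAIT needs a local variable).
    event wait(stop_notification)
  end if

  print '(A, I3, A)', "I am image", this_image(), " at end program"
end program event_wait_idle_demo
```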