I’m trying to implement a work queue using the critical construct, but I am getting results contrary to my expectations. Here’s a minimal working example:
program critbug
  integer :: next_task[*]
  integer :: my_task

  next_task = 0
  sync all
  do
    call get_next_task(my_task)
    if (my_task > 10) exit
    print *, 'image', this_image(), 'about to do task', my_task
    call sleep(3)
    print *, 'image', this_image(), 'done task', my_task
  end do
contains
  subroutine get_next_task(next)
    integer, intent(out) :: next
    critical
      next = next_task[1]
      next_task[1] = next + 1
    end critical
  end subroutine
end program
My expectation was that critical/end critical acts like a mutex: each image blocks while another image is in the critical section, and once that image exits the critical section, another image can enter it, and so on. This is not what happens in practice, with either the Intel compiler or gfortran/OpenCoarrays. For the latter, what actually happens is this:
$ cafrun -np 8 ./a.out
image 1 about to do task 0
image 1 done task 0
image 7 about to do task 1
image 8 about to do task 2
image 2 about to do task 3
image 3 about to do task 4
image 1 about to do task 5
image 7 done task 1
image 8 done task 2
image 2 done task 3
image 3 done task 4
image 1 done task 5
image 4 about to do task 6
image 6 about to do task 7
image 8 about to do task 8
image 3 about to do task 9
image 5 about to do task 10
image 4 done task 6
image 6 done task 7
image 8 done task 8
image 3 done task 9
image 5 done task 10
If I put a sync all statement at the end of get_next_task(), then I get the expected behaviour, except the program hangs at the end:
$ cafrun -np 8 ./a.out
image 1 about to do task 0
image 2 about to do task 3
image 4 about to do task 1
image 5 about to do task 4
image 6 about to do task 6
image 7 about to do task 7
image 8 about to do task 2
image 3 about to do task 5
image 2 done task 3
image 4 done task 1
image 5 done task 4
image 6 done task 6
image 7 done task 7
image 8 done task 2
image 3 done task 5
image 1 done task 0
image 1 about to do task 8
image 3 about to do task 9
image 4 about to do task 10
image 1 done task 8
image 3 done task 9
image 4 done task 10
Is this a bug, have I misunderstood what critical is supposed to do, or both?
Regardless, is there a correct way to implement a work queue in Fortran without resorting to ISO_C_BINDING to use POSIX constructs? In my real application, the tasks take different amounts of time to complete, so I do not want to sync all every time a new task is dealt out.
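For concreteness, one pure-Fortran candidate is an explicit lock variable (lock_type from iso_fortran_env, Fortran 2008) used in place of critical. The following is a minimal, untested sketch of that idea; the names lock_queue and queue_lock are made up for illustration, and nothing here shows whether it behaves any differently from the critical version with either compiler.

```fortran
program lock_queue
  use, intrinsic :: iso_fortran_env, only: lock_type
  implicit none

  type(lock_type) :: queue_lock[*]   ! hypothetical name; only the copy on image 1 is used
  integer :: next_task[*]            ! shared counter; only the copy on image 1 is used
  integer :: my_task

  next_task = 0
  sync all                           ! make sure the counter is initialised everywhere

  do
    call get_next_task(my_task)
    if (my_task > 10) exit
    print *, 'image', this_image(), 'got task', my_task
  end do

contains

  subroutine get_next_task(next)
    integer, intent(out) :: next

    ! Acquire the lock on image 1; other images wait here until it is released.
    lock (queue_lock[1])
    next = next_task[1]
    next_task[1] = next + 1
    unlock (queue_lock[1])
  end subroutine get_next_task

end program lock_queue
```

As with the original example, this would be launched with something like cafrun -np 8 ./a.out. No sync all appears inside the loop, so images that finish a task early are not forced to wait for the others.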
@rouson, might you have any insight on this one?
Thanks very much, everyone.