A Model For Parallel Testing In Fortran

The other day I got some experience testing parallel code, and thought you all might be interested.

I’m happy that somebody’s writing about this.

So, in a nutshell, you do two things:

  1. Synchronize testing so that it’s deterministic.
  2. Synchronize test output so that it’s easy to read.

Is that right? If yes, I do the same thing. Indeed, Fortran makes this convenient to do in just a few lines of code.

Maybe another important takeaway is that your framework does these two operations automatically, whereas I’m still coding them by hand in each of my tests?

Actually, I don’t really synchronize the testing. I just let each image get its own answer. And I don’t really synchronize the output either. I just make sure multiple images aren’t trying to output at the same time. My framework is designed such that running the tests doesn’t produce output. It collects up the results for output later. That means output only happens in one place, and it’s easy to coordinate.

If you’re testing immediately following a non-blocking operation, how do you ensure the results are deterministic?

Okay, that’s what I call synchronizing the output–ensuring it occurs in a deterministic order.

If you’re testing immediately following a non-blocking operation, how do you ensure the results are deterministic?

If the code you’re testing is non-deterministic or you need synchronization for some other reason, then it’s the test’s responsibility to do the synchronization. But I’m not doing any synchronization or comparison of the results between different images on the framework side. If different images give different answers, as far as the framework is concerned that’s perfectly fine. One image will report one answer, and another image will report a different answer.
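As an illustration of a test doing its own synchronization, here is a minimal sketch. It assumes a simple neighbour exchange over a coarray; the program name, the `received` coarray, and the exchange itself are hypothetical and not taken from the framework.

```fortran
program sync_in_test
    implicit none
    integer :: received[*]   ! hypothetical coarray: one copy per image
    integer :: right, expected

    received = 0
    sync all                 ! every image has initialized before any remote write

    ! Each image writes its own index into its right-hand neighbour (wrapping around).
    right = merge(1, this_image() + 1, this_image() == num_images())
    received[right] = this_image()

    sync all                 ! the test itself waits until the remote writes are complete

    ! Each image expects the index of its left-hand neighbour.
    expected = merge(num_images(), this_image() - 1, this_image() == 1)
    if (received /= expected) error stop "neighbour exchange test failed"
end program
```

Without the second `sync all`, an image could check `received` before its neighbour has written to it, so the result would depend on timing. The framework never sees this synchronization; it belongs entirely to the test.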

The basic idea is as follows:

! some set up code. This will be identical on all images
if (this_image() == 1) then
    ! report what we're testing
end if 

results = tests%run() ! this is the actual line in the framework

critical
    ! report the results and define suite_failed
end critical
sync all ! This is the only synchronization the framework needs to do
if (this_image() == 1) then
    if (any([(suite_failed[i], i = 1, num_images())])) error stop
end if

The above is literally the code from the framework with a few details elided.
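For anyone who wants to try the pattern, the following is a self-contained sketch of the same structure. It is an approximation under stated assumptions, not the framework's code: `hypothetical_test` stands in for `tests%run()`, and the declaration of `suite_failed` as a logical coarray is inferred from how it is indexed above.

```fortran
program run_suite
    implicit none
    logical :: suite_failed[*]   ! assumed declaration: a coarray so image 1 can read every image's status
    logical :: passed
    integer :: i

    if (this_image() == 1) print '(a)', "Running tests"

    passed = hypothetical_test()   ! stand-in for results = tests%run()

    critical
        ! Only one image at a time executes this block, in no particular order.
        print '(a, i0, a, l1)', "image ", this_image(), " passed: ", passed
        suite_failed = .not. passed
    end critical
    sync all   ! every image has defined suite_failed before image 1 reads it

    if (this_image() == 1) then
        if (any([(suite_failed[i], i = 1, num_images())])) error stop "some tests failed"
    end if
contains
    function hypothetical_test() result(ok)
        ! Placeholder test; a real suite would return richer results.
        logical :: ok
        ok = (2 + 2 == 4)
    end function
end program
```

It needs a coarray-enabled compiler setup to run on more than one image. Each image only ever writes its own `suite_failed`, so the single `sync all` is enough to make the remote reads on image 1 safe.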

Okay, that’s what I call synchronizing the output–ensuring it occurs in a deterministic order.

The outputs actually aren’t in a deterministic order. The critical block ensures that only one image at a time is inside of it, but the images can go through in any order.
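If a deterministic order were wanted, one possible alternative is to have image 1 print every image's result in index order after a `sync all`. This is a sketch of that alternative, not what the framework does; the `passed` coarray and the program name are hypothetical.

```fortran
program ordered_report
    implicit none
    logical :: passed[*]   ! hypothetical per-image result
    integer :: i

    passed = .true.        ! placeholder for a real test result
    sync all               ! every image's result is defined before image 1 reads it

    if (this_image() == 1) then
        ! Image 1 reads each image's result remotely and prints in index order.
        do i = 1, num_images()
            print '(a, i0, a, l1)', "image ", i, " passed: ", passed[i]
        end do
    end if
end program
```

The trade-off is that everything reported must live in a coarray so that image 1 can reach it, whereas the `critical` block lets each image print its own local results.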

Okay, thanks for explaining that. I had a different idea about what you’re doing.