0- or 1-based variables for parallel codes

Hello,

When I write a program using OpenMP or MPI, I often need variables that represent the current thread or process (e.g., itask or irank). Because OpenMP and MPI were originally developed in C (right?), such values are “naturally” 0-based when obtained from the service / query routines. However, Fortran arrays are 1-based by default, so I sometimes find it more “natural” to define such variables as 1-based, e.g.,

ithread = omp_get_thread_num() + 1 

to make the array access more straightforward (e.g., A( :, ithread ) rather than A( :, ithread + 1 )). However, such a 1-based definition can cause confusion when calling MPI library routines, because they expect 0-based arguments. My current codes mix 0-based and 1-based variables, so I regularly have to go back and check which definition a given variable uses.

In such OpenMP or MPI programs, which do you prefer: the original C-like 0-based variables or Fortran-style 1-based ones? Is there an established coding practice for choosing between them? Though I do not think there is a unique or “correct” answer, I would appreciate any input from your coding experience.

Thanks very much!

I’ve encountered this problem with OpenMP. I now use the 0-based values as they are, together with arrays whose lower bound is 0:

real :: A(0:max_thread-1)
...
ithread = omp_get_thread_num()
A(ithread ) ....

This type of usage pattern can be prone to false sharing.
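For context, false sharing occurs when threads repeatedly write to neighbouring elements (such as A(ithread) for consecutive threads) that sit on the same cache line. One common workaround is to pad each thread's slot. The following is only a minimal sketch: the 64-byte cache line, the names `pad` and `acc`, and the summation loop are assumptions made for illustration, not part of the posts above.

```fortran
program padded_accumulators
   !$ use omp_lib
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! Assumption: 64-byte cache lines, i.e. 8 double-precision values per line.
   integer, parameter :: pad = 8
   integer, parameter :: n = 1000
   real(dp), allocatable :: acc(:,:)
   integer :: max_thread, ithread, i

   max_thread = 1
   !$ max_thread = omp_get_max_threads()

   ! The column index is the 0-based thread number. Only row 1 is used;
   ! the other rows are padding, so the used elements are 64 bytes apart.
   allocate(acc(pad, 0:max_thread-1))
   acc = 0.0_dp

   !$omp parallel default(none) private(ithread, i) shared(acc)
   ithread = 0
   !$ ithread = omp_get_thread_num()
   !$omp do
   do i = 1, n
      acc(1, ithread) = acc(1, ithread) + real(i, dp)
   end do
   !$omp end do
   !$omp end parallel

   ! Combine the per-thread partial sums and check against n(n+1)/2.
   if (abs(sum(acc(1, :)) - 0.5_dp*n*(n + 1)) > 1.0e-9_dp) error stop "unexpected sum"
   print *, sum(acc(1, :))
end program padded_accumulators
```

In many cases a private scalar (or a `reduction` clause) that is written back once at the end avoids the issue with less code; the padded array is mainly useful when the per-thread values must stay addressable by thread number.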


The OpenMP 1.0 specification (1997) was for Fortran; the C and C++ specification was added a year later.

MPI 1.0 specified bindings for both ANSI C and Fortran 77, but I guess the main ideas of message passing are language-independent. In fact MPI was an effort to merge several existing message passing systems. The rank numbering is specified in MPI 1.0 standard as follows:

Finally, we always identify processes according to their relative rank in a group, that is, consecutive integers in the range 0..groupsize-1.


If you use so-called high-level OpenMP and stick to the provided work-sharing constructs, you rarely need to do the thread-to-work mapping manually.

With low-level OpenMP where you manage the work decomposition yourself, I typically introduce temporary variables for the loop bounds:

n = ...
allocate(dxx(n), y(n))

!$omp parallel default(private) shared(dxx,y,h,n)

    ! Serial fallback: one "thread" whose 0-based number is 0
    nt = 1
    it = 0
!$  nt = omp_get_num_threads()
!$  it = omp_get_thread_num()

    ! Divide array into chunks (round up so no trailing elements are skipped)
    nc = (n + nt - 1) / nt

    ! (omit boundary nodes 1 and n)
    lb = max(it*nc + 1, 2)
    ub = min((it+1)*nc, n - 1)

    alpha = 1.0/h**2
    do i = lb, ub
        dxx(i) = alpha*(y(i+1) - 2*y(i) + y(i-1))
    end do

!$omp end parallel

Checking if the arrays are 1-based or 0-based is always needed, and this wouldn’t change if the thread numbering began with 1. But I agree this can be confusing at times.
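Since the mapping from a 0-based thread number to 1-based loop bounds is easy to get wrong, it can be checked serially before going parallel. The sketch below is an illustration only: `chunk_bounds` is a hypothetical helper (not an OpenMP routine), and it uses ceiling division for the chunk size, which is one way to make sure no interior point is dropped when n is not a multiple of the thread count.

```fortran
program check_chunks
   implicit none
   integer, parameter :: n = 10, nt = 4
   integer :: it, lb, ub, i
   integer :: hits(n)

   hits = 0
   do it = 0, nt - 1                 ! loop over the 0-based thread numbers
      call chunk_bounds(it, nt, n, lb, ub)
      do i = lb, ub                  ! empty if lb > ub
         hits(i) = hits(i) + 1
      end do
   end do

   ! Interior points 2..n-1 must be visited exactly once, boundaries never.
   if (any(hits(2:n-1) /= 1) .or. hits(1) /= 0 .or. hits(n) /= 0) then
      error stop "decomposition has gaps or overlaps"
   end if
   print *, "ok"

contains

   ! Hypothetical helper: 1-based bounds of the chunk owned by thread `it`.
   subroutine chunk_bounds(it, nt, n, lb, ub)
      integer, intent(in)  :: it, nt, n
      integer, intent(out) :: lb, ub
      integer :: nc
      nc = (n + nt - 1) / nt         ! chunk size, rounded up
      lb = max(it*nc + 1, 2)         ! skip boundary node 1
      ub = min((it + 1)*nc, n - 1)   ! skip boundary node n
   end subroutine chunk_bounds

end program check_chunks
```

With rounding up, the last thread(s) may receive an empty range (lb > ub), which a Fortran `do` loop handles by executing zero iterations.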


A somewhat daring alternative could be the use of coarrays: they are 1-based :)


Even with MPI I’ve found this to be a non-issue. One can pass 1-based Fortran arrays to a C backend (such as MPI), and the first address of the memory will be treated as the 0th index within the C API.

The only situation in which I’ve found myself needing to shift the index manually is when, for instance, I have a work array plus an indexing array, and both need to be passed around so that the C API uses information from the indexing array to access the work array. In that case, yes: either the values of the index array need to be shifted by -1, or one needs a 0-based Fortran work array.


In Fortran, you can of course define the arrays to have any lower bound that is convenient, unlike C and other lesser languages that enforce a single language-wide lower bound.

In these particular cases, I usually use 0-based arrays for MPI and OpenMP, and I use 1-based coarrays. Otherwise, it seems like you are swimming upstream and expending unnecessary effort. Even in other contexts, e.g. when using indexing like mod(irank,nrank), it is sometimes convenient to use 0-based indexing in Fortran, so just do it the way that is most natural for the algorithm. Fortran allows you that choice.
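To illustrate why mod-style indexing tends to favour 0-based values, here is a small serial sketch of ring neighbours in both conventions. The ring of 4 ranks and the variable names are made up for the example; no MPI calls are involved.

```fortran
program ring_neighbours
   implicit none
   integer, parameter :: nrank = 4
   integer :: irank, right0, left0, irank1, right1, left1

   do irank = 0, nrank - 1
      ! 0-based: the modulo maps straight back into 0..nrank-1
      right0 = mod(irank + 1, nrank)
      left0  = mod(irank - 1 + nrank, nrank)

      ! 1-based equivalent: shift down, wrap around, shift back up
      irank1 = irank + 1
      right1 = mod(irank1, nrank) + 1
      left1  = mod(irank1 - 2 + nrank, nrank) + 1

      ! Both conventions must describe the same neighbours.
      if (right1 /= right0 + 1 .or. left1 /= left0 + 1) error stop "mismatch"
      print '(a,i0,a,i0,a,i0)', "rank ", irank, ": left ", left0, ", right ", right0
   end do
end program ring_neighbours
```

The 1-based formulas work, but each one carries an extra shift that is easy to misplace, which is the kind of "swimming upstream" described above.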

If you put your arrays within a derived type, and pass that derived type as an argument, then the lower bounds are retained through subprogram calls. This way, you don’t need to always declare the dummy array as, for example, array(:,:,0:). If they are module arrays, then they also retain their declared lower bounds globally. This is also true for allocatable dummy arrays, if that situation arises. So there are ways to structure your data so that the language facilitates your lower bound choice.
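A minimal sketch of how the lower bound behaves for the three kinds of argument passing mentioned here; the type `work_t` and the function names are hypothetical and only serve the illustration.

```fortran
module bounds_demo
   implicit none
   type :: work_t
      real, allocatable :: a(:)
   end type work_t
contains
   integer function lb_assumed_shape(x)
      real, intent(in) :: x(:)               ! assumed shape: lower bound becomes 1
      lb_assumed_shape = lbound(x, 1)
   end function lb_assumed_shape

   integer function lb_allocatable(x)
      real, allocatable, intent(in) :: x(:)  ! allocatable dummy: bounds retained
      lb_allocatable = lbound(x, 1)
   end function lb_allocatable

   integer function lb_component(w)
      type(work_t), intent(in) :: w          ! component keeps its allocated bounds
      lb_component = lbound(w%a, 1)
   end function lb_component
end module bounds_demo

program demo
   use bounds_demo
   implicit none
   type(work_t) :: w

   allocate(w%a(0:3))                        ! 0-based, e.g. indexed by rank or thread
   w%a = 0.0

   if (lb_assumed_shape(w%a) /= 1) error stop "assumed shape should start at 1"
   if (lb_allocatable(w%a)   /= 0) error stop "allocatable dummy should keep 0"
   if (lb_component(w)       /= 0) error stop "component should keep 0"
   print *, lb_assumed_shape(w%a), lb_allocatable(w%a), lb_component(w)
end program demo
```

Only the plain assumed-shape dummy loses the 0-based lower bound, unless it is declared as x(0:) in the callee.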
