I’m looking for a way to locally set the number of OpenMP threads for a routine. There are a couple of constraints which do not allow usage of the OMP_NUM_THREADS
environment variable:
- used via a foreign function interface
- caller cannot know whether runtime will be threaded
- OpenMP pragmas are not generally available to the caller
- multiple concurrent instances must not interfere
-
set_omp_num_threads
in global scope will not work
-
- number of threads stored in an opaque handle
- pointer to a non-bind(c) Fortran object
- setter function is available
- deferred usage of thread number when using handle in another procedure
One the C side the simplified setup might look like this:
typedef struct context* context;
extern context new_context(void);
extern void context_set_num_threads(context, int);
extern void actual_calculation(context);
extern void delete_context(context*);
int
main() {
context ctx = new_context();
context_set_num_threads(ctx, 2);
actual_calculation(ctx);
delete_context(&ctx);
return 0;
}
One the Fortran side a minimal example could look like
module bind_c
use, intrinsic :: iso_c_binding
implicit none
! Wrapper to create an opaque handle for configuration
type :: vp_context
integer, allocatable :: nthreads
end type vp_context
contains
! Allocate new context object
function new_context() result(vctx) bind(c)
type(vp_context), pointer :: ctx
type(c_ptr) :: vctx
allocate(ctx)
vctx = c_loc(ctx)
end function new_context
! Set the number of threads in a context object,
! non-positive values will clear the property from the object
subroutine context_set_num_threads(vctx, nthreads) bind(c)
type(c_ptr), value :: vctx
type(vp_context), pointer :: ctx
integer(c_int), value :: nthreads
if (.not.c_associated(vctx)) return
call c_f_pointer(vctx, ctx)
if (allocated(ctx%nthreads)) deallocate(ctx)
if (nthreads > 0) ctx%nthreads = nthreads
end subroutine context_set_num_threads
! Entry point for performing calculation for a given configuration,
! usually takes several objects but simplified here
subroutine actual_calculation(vctx) bind(c)
type(c_ptr), value :: vctx
type(vp_context), pointer :: ctx
integer :: nthreads
if (.not.c_associated(vctx)) return
call c_f_pointer(vctx, ctx)
nthreads = 0
if (allocated(ctx%nthreads)) nthreads = ctx%nthreads
!$omp teams num_teams(1) thread_limit(nthreads)
! call some_library_function(...)
! contains several !$omp parallel do regions
!$omp end teams
end subroutine actual_calculation
! Free allocation of a context handle,
! also resets the handle to not leave a dangling pointer
subroutine delete_context(vctx) bind(c)
type(c_ptr), intent(inout) :: vctx
type(vp_context), pointer :: ctx
if (.not.c_associated(vctx)) return
call c_f_pointer(vctx, ctx)
deallocate(ctx)
vctx = c_null_ptr
end subroutine delete_context
end module bind_c
Most promising so far is the OpenMP 4.0 teams
construct and in some preliminary tests it seems to work as expected. However, for a team of threads the OpenMP specification seems to recommend the usage of distribute
rather than parallel do
, which would require some adaptation and testing in the library first (which is a feasible option).
Is there a better way to locally set the number of threads for a certain scope in OpenMP?