Locally setting number of threads in OpenMP

awvwgk · April 24, 2022, 5:52pm

I’m looking for a way to locally set the number of OpenMP threads for a routine. There are a couple of constraints which do not allow usage of the OMP_NUM_THREADS environment variable:

used via a foreign function interface
- caller cannot know whether runtime will be threaded
- OpenMP pragmas are not generally available to the caller
multiple concurrent instances must not interfere
- set_omp_num_threads in global scope will not work
number of threads stored in an opaque handle
- pointer to a non-bind(c) Fortran object
- setter function is available
- deferred usage of thread number when using handle in another procedure

One the C side the simplified setup might look like this:

typedef struct context* context;
extern context new_context(void);
extern void context_set_num_threads(context, int);
extern void actual_calculation(context);
extern void delete_context(context*);

int
main() {
  context ctx = new_context();
  context_set_num_threads(ctx, 2);

  actual_calculation(ctx);

  delete_context(&ctx);
  return 0;
}

One the Fortran side a minimal example could look like

module bind_c
  use, intrinsic :: iso_c_binding
  implicit none

  ! Wrapper to create an opaque handle for configuration
  type :: vp_context
    integer, allocatable :: nthreads
  end type vp_context

contains

  ! Allocate new context object
  function new_context() result(vctx) bind(c)
    type(vp_context), pointer :: ctx
    type(c_ptr) :: vctx

    allocate(ctx)
    vctx = c_loc(ctx)
  end function new_context

  ! Set the number of threads in a context object,
  ! non-positive values will clear the property from the object
  subroutine context_set_num_threads(vctx, nthreads) bind(c)
    type(c_ptr), value :: vctx
    type(vp_context), pointer :: ctx
    integer(c_int), value :: nthreads

    if (.not.c_associated(vctx)) return
    call c_f_pointer(vctx, ctx)

    if (allocated(ctx%nthreads)) deallocate(ctx)
    if (nthreads > 0) ctx%nthreads = nthreads
  end subroutine context_set_num_threads

  ! Entry point for performing calculation for a given configuration,
  ! usually takes several objects but simplified here
  subroutine actual_calculation(vctx) bind(c)
    type(c_ptr), value :: vctx
    type(vp_context), pointer :: ctx
    integer :: nthreads

    if (.not.c_associated(vctx)) return
    call c_f_pointer(vctx, ctx)

    nthreads = 0
    if (allocated(ctx%nthreads)) nthreads = ctx%nthreads

    !$omp teams num_teams(1) thread_limit(nthreads)
    ! call some_library_function(...)
    ! contains several !$omp parallel do regions
    !$omp end teams
  end subroutine actual_calculation

  ! Free allocation of a context handle,
  ! also resets the handle to not leave a dangling pointer
  subroutine delete_context(vctx) bind(c)
    type(c_ptr), intent(inout) :: vctx
    type(vp_context), pointer :: ctx

    if (.not.c_associated(vctx)) return
    call c_f_pointer(vctx, ctx)
    deallocate(ctx)
    vctx = c_null_ptr
  end subroutine delete_context
end module bind_c

Most promising so far is the OpenMP 4.0 teams construct and in some preliminary tests it seems to work as expected. However, for a team of threads the OpenMP specification seems to recommend the usage of distribute rather than parallel do, which would require some adaptation and testing in the library first (which is a feasible option).

Is there a better way to locally set the number of threads for a certain scope in OpenMP?

urbanjost · April 24, 2022, 7:59pm

https://www.openmp.org/spec-html/5.0/openmpse50.html

It is a bit convoluted, but I generally find a combination of the three methods works. I like the idea you describe with teams, but personally I need a more concrete example of a use case; but I generally run in a homogeneous cluster; with very similiar tasks assigned to each node. Are you working with a problem where very different tasks are assigned to the threads?

awvwgk · April 24, 2022, 8:34pm

I’m looking at a scenario where the consumer side of my library wants to run several instances in the same process with different configurations, like different numbers of threads depending for example on the problem size. In the worst case scenario I have no control over the host process, e.g. could be a service processing different requests delivered by a socket, adjusting environment variables is therefore not an option.

The project is on GitHub if seeing the actual code helps (the project is fpm compatible, but the routines in question are not used in the fpm version), the above example is a simplified version capturing the essential parts. Here are the key routines in the original

the context handle I want to add the thread number setter to:
tblite/context.f90 at main · tblite/tblite · GitHub
get_singlepoint_api is my actual_calculation procedure:
tblite/calculator.f90 at main · tblite/tblite · GitHub
the general API design is described in this presentation:
Using objects across language boundaries — Fortran OOP Seminar Series

Topic		Replies	Views
Basic OpenMP Test Program Not Exiting	9	177	March 21, 2025
Omp_proc_bind / omp_places Help	10	427	November 30, 2024
Use random_number with OpenMP flag Help	10	753	July 4, 2022
Iso_c_binding: interface to a C function is not thread safe for openmp Help	2	661	May 19, 2021
Is !$omp end necessary? Help	5	689	February 15, 2022

Locally setting number of threads in OpenMP

Related topics