OMP_PROC_BIND / OMP_PLACES

Does the CPU support hyper-threading? I assume it only has 2 NUMA domains (1 for each socket)?

I think the general answer is no: if you don’t set these variables, the behavior is implementation-defined. You can check what occurs in practice with OMP_DISPLAY_AFFINITY=TRUE.
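As a rough sketch of how one might switch that report on from a launcher script, the helper below copies the environment, sets OMP_DISPLAY_AFFINITY, and optionally sets OMP_PLACES and OMP_PROC_BIND before starting the program. The function names and the `./my_openmp_app` command are hypothetical, and the format of the report depends on the runtime.

```python
import os
import subprocess


def affinity_report_env(base_env=None, places=None, proc_bind=None):
    """Return a copy of the environment that asks the OpenMP runtime
    to print the affinity of each thread."""
    env = dict(os.environ if base_env is None else base_env)
    # TRUE and FALSE are the values the OpenMP specification defines here.
    env["OMP_DISPLAY_AFFINITY"] = "TRUE"
    if places is not None:
        env["OMP_PLACES"] = places
    if proc_bind is not None:
        env["OMP_PROC_BIND"] = proc_bind
    return env


def run_with_affinity_report(cmd, **settings):
    """Run cmd, e.g. ["./my_openmp_app"] (hypothetical), with the report enabled."""
    return subprocess.run(cmd, env=affinity_report_env(**settings), check=True)
```

Running the same program once with no settings and once with `places="sockets"` makes it easy to compare what the runtime does in each case.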

For instance, in the libgomp docs they say threads can be moved between CPUs:

If OMP_PLACES and GOMP_CPU_AFFINITY are unset and OMP_PROC_BIND is either unset or false, threads may be moved between CPUs following no placement policy.

Moreover, the libgomp docs state:

When undefined, OMP_PROC_BIND defaults to TRUE when OMP_PLACES or GOMP_CPU_AFFINITY is set and FALSE otherwise.

So setting OMP_PLACES=sockets will automatically bind/pin the OpenMP threads when using libgomp (but not necessarily in libiomp).
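The default rule quoted above can be written down as a small decision function. This is only a simplified model of the quoted sentences, not libgomp's actual code; the function name is hypothetical, and treating every explicit OMP_PROC_BIND value other than "false" as a request for binding is an assumption.

```python
def libgomp_binds_threads(env):
    """Simplified model of the libgomp default quoted above.

    env is a mapping of environment variable names to values.
    """
    proc_bind = env.get("OMP_PROC_BIND")
    if proc_bind is not None:
        # Assumption: any explicit value other than "false" requests binding.
        return proc_bind.strip().lower() != "false"
    # OMP_PROC_BIND undefined: defaults to TRUE only if a placement is given.
    return "OMP_PLACES" in env or "GOMP_CPU_AFFINITY" in env


for env in ({}, {"OMP_PLACES": "sockets"}, {"GOMP_CPU_AFFINITY": "0-3"}):
    print(env, "->", libgomp_binds_threads(env))
```

With nothing set the model reports no binding, while setting only OMP_PLACES=sockets flips the result, which is the libgomp behavior described above.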


For libiomp, the thread affinity interface is documented here. The OpenMP variables are documented here.

By default, thread binding is set to FALSE, which is equivalent to KMP_AFFINITY=none, which means:

Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology.

I don’t understand the implications of the second part of the sentence.


You need to be careful, especially on HPC systems, because the variables are sometimes already set by the admins or by modules. I’ve encountered an environment where KMP_ variables were set and took precedence over the OMP_ variables. To use the OpenMP interface, I had to explicitly unset the KMP_AFFINITY variable. If these are not set correctly, it’s easy to slow everything down by mixing up MPI ranks and OpenMP threads in ways that don’t respect the machine topology or that clash at the software level.
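A quick way to spot this kind of situation is to list the affinity-related variables that are already present before launching a job. The sketch below is a hypothetical helper, not part of any OpenMP tooling: it collects OMP_PROC_BIND, OMP_PLACES, GOMP_CPU_AFFINITY and anything starting with KMP_, and flags KMP_ variables that are set alongside OMP_ ones. Whether a KMP_ variable really takes precedence depends on the runtime, so the warning is only a hint.

```python
import os

# Standard OpenMP and libgomp affinity variables discussed in this post.
STANDARD_VARS = ("OMP_PROC_BIND", "OMP_PLACES", "GOMP_CPU_AFFINITY")


def affinity_settings(env=None):
    """Return the affinity-related variables found in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: value for name, value in env.items()
            if name in STANDARD_VARS or name.startswith("KMP_")}


def kmp_overrides(env=None):
    """Return KMP_ variables that are set alongside OMP_ affinity variables."""
    settings = affinity_settings(env)
    if not any(name.startswith("OMP_") for name in settings):
        return []
    return sorted(name for name in settings if name.startswith("KMP_"))


if __name__ == "__main__":
    for name, value in sorted(affinity_settings().items()):
        print(f"{name}={value}")
    for name in kmp_overrides():
        print(f"warning: {name} is set and may take precedence over OMP_ variables")
```

Running it inside the job script, after all modules are loaded, shows what the program will actually inherit.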


Probably not relevant to your question, but on macOS I’m not sure OpenMP thread affinity is supported at all. The OS kernel schedules threads as it deems best, and the affinity variables are ignored. At least that’s what I’ve inferred from the following threads:

A while ago I did some experiments with Xcode Instruments (the native Mac profiler), and one can see the thread switching that takes place.