How to correctly determine the `-march` flag when working with Intel oneAPI on Intel processors?

The testing environment is Ubuntu 20.04.3 LTS installed on a machine with dual Intel Xeon E5-2699 v4 processors and a Supermicro X10DAi motherboard. I am trying to compile and test VASP.6.3.0 with the latest Intel oneAPI Base and HPC toolkits. I found that, in order to pass all the tests, the following compiler option must be used:

FFLAGS      += -march=core-avx2

As a comparison, the following compiler option will trigger some test errors:

FFLAGS     += -xHOST

So I want to know how to correctly determine the -march flag when working with Intel oneAPI on Intel processors. In this case, for example, how would I arrive at -march=core-avx2?

For reference, here is the hardware and software information for my test environment:

werner@X10DAi-00:~$ inxi -Cxxx
CPU:       Topology: 2x 22-Core model: Intel Xeon E5-2699 v4 bits: 64 type: MT MCP SMP arch: Broadwell rev: 1 
           L2 cache: 110.0 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 387287 
           Speed: 1200 MHz min/max: 1200/3600 MHz Core speeds (MHz): 1: 1200 2: 1202 3: 1202 4: 1202 5: 1200 
           6: 1202 7: 1203 8: 1201 9: 1204 10: 1201 11: 1654 12: 2007 13: 2204 14: 2200 15: 1245 16: 1202 
           17: 1202 18: 1202 19: 1203 20: 1202 21: 1203 22: 1202 23: 1202 24: 1201 25: 1202 26: 1202 27: 1201 
           28: 1202 29: 1202 30: 1202 31: 2066 32: 1202 33: 1202 34: 1202 35: 1203 36: 1202 37: 1202 38: 1202 
           39: 1202 40: 1202 41: 1200 42: 1516 43: 1200 44: 1200 45: 1200 46: 1202 47: 1200 48: 1200 49: 1200 
           50: 1200 51: 1201 52: 1201 53: 1201 54: 1201 55: 1200 56: 1201 57: 1204 58: 1200 59: 1200 60: 1609 
           61: 1871 62: 2200 63: 1251 64: 1201 65: 1201 66: 1201 67: 1200 68: 1203 69: 1200 70: 1201 71: 1201 
           72: 1201 73: 1201 74: 1201 75: 1200 76: 1200 77: 1200 78: 1201 79: 1203 80: 1523 81: 1201 82: 1200 
           83: 1200 84: 1201 85: 1201 86: 1200 87: 1200 88: 1204 
werner@X10DAi-00:~$ inxi -Mxxx
Machine:   Type: Desktop System: Supermicro product: X10DAi v: 123456789 serial: <superuser/root required> 
           Mobo: Supermicro model: X10DAI v: 1.02 serial: <superuser/root required> UEFI: American Megatrends 
           v: 3.2 date: 12/16/2019 
werner@X10DAi-00:~$ inxi -Sxxx
System:    Host: X10DAi-00 Kernel: 5.8.0-43-generic x86_64 bits: 64 compiler: N/A Desktop: GNOME 3.36.9 
           tk: GTK 3.24.20 wm: gnome-shell dm: GDM3 3.36.3 Distro: Ubuntu 20.04.3 LTS (Focal Fossa) 

See here for related discussions.

Regards,
HZ

You can find the proper codename in the first line of inxi -Cxxx output:

arch: Broadwell

Thus, use -march=broadwell.
If that is not available, you can always leave detection of the proper family to the compiler by using -march=native.
Interestingly, the native value for the -march option is not mentioned in man ifort (checked in versions 2021.3.0 and 2021.4.0).

BTW, maybe somebody knows: I have a desktop machine equipped with an i5-10505 CPU, codename “Comet Lake”. There is no -march=cometlake option available. The closest codename seems to be “Icelake”, but the assembler sources generated using native and icelake differ. So, which codename is appropriate for that CPU?

The -march documentation lists the following processor name:

broadwell

I mean: The initial letter is lowercase, while inxi’s result is capitalized. Which one should I use?

As for -march=native, I noticed that Intel has the -xHost code generation option, which is similar in function to -march=native. But, as we have discussed here, -xHOST doesn’t work for some VASP tests.

BTW, the following gcc command confirms that -march=native is equivalent to -march=broadwell on my machine:

$ gcc -march=native -Q --help=target|grep -- '^[ ]*-march='
  -march=                     		broadwell

Regards,
HZ

Use lowercase.
It is easy to check:

ifort -march=Broadwell source.f90
ifort: command line warning #10148: option '-march=Broadwell' not supported

Great. The following also does the trick:

$ ifort -march=Broadwell >/dev/null 
ifort: command line warning #10148: option '-march=Broadwell' not supported
ifort: command line error: no files specified; for help type "ifort -help"

Wonderful advice. Based on your above comment, I’ve confirmed that both of the following settings solve the problem discussed here:

FFLAGS += -march=broadwell

or

FFLAGS += -march=native

Regards,
HZ

-xHost won’t work if you run the executable on a system with a smaller instruction set. The VASP page referenced before includes a mention of MPI. If the program ran on a cluster with some nodes that didn’t support the compiling node’s instruction set, that can cause problems.

@sblionel

  1. What do you mean by saying “a smaller instruction set”?
  2. Do you mean -march=native is more portable than -xHost?

Regards,
HZ

If you compile on machine A and run on machine B, where A and B use the same CPU architecture (for example, both are Broadwell), then -xHost should work fine.

However, if A and B do not use the same CPU architecture, then, as Dr. Fortran @sblionel said, code compiled with -xHost on machine A will run on A but may not run on B.
In this case, if you insist on compiling on A, you need to use -march=XXX, where XXX is the architecture of B or the common denominator of A and B.

  1. Do you mean using -march alone, or in combination with -xHost?
  2. IMO, it’s not so convenient, and can even be difficult, to find the acceptable instruction sets or the common denominator for A and B.
  3. Do you mean that, in this way, -march can facilitate cross-compiling?

Regards,
HZ

If you use -march=XXX, then do not use -xHost.
Yes, -march=XXX generates code for Intel CPUs with codename XXX.
-xHost basically is for compiling and running on the same ‘Host’ machine.

You just need to google the codename of your CPU; in your case, search for "intel ark xeon 2699v4".

It is Broadwell, so if you want to run on your machine B, which is a Xeon E5-2699 v4, you need to use
-march=broadwell
If your machine A is Haswell, which is earlier than Broadwell, and you want to run on both A and B, use -march=haswell.

Yes, it seems there is no one-size-fits-all option; it requires you to know a little about Intel CPU codenames. Actually, from Haswell and Broadwell through Coffee Lake, including some Skylake parts, there is no big difference: they all use AVX2, so -march=core-avx2 should just work.
If you can compile and run on the same machine, then -xHost is the best.

As commented here, the most convenient methods are still the following ones:

$ gcc -march=native -Q --help=target|grep -- '^[ ]*-march='
  -march=                     		broadwell

$ cat /sys/devices/cpu/caps/pmu_name
broadwell

The problem is: for some specific scenarios, -xHost doesn’t work, while -march=native does, just as the issue discussed here. That’s the main reason I asked this question.

Regards,
HZ

The issue you linked doesn’t suggest that -xHost “doesn’t work”, but rather that it is inappropriate for a mixed execution environment.

What -xHost does (I know, because I’m the one who implemented it) is query the CPU type. If it is an Intel processor, it applies the appropriate -x option for that processor, and that includes a check at the start of execution that the program is running on an Intel processor supporting the specified instruction set (or larger). If the compiling CPU is non-Intel, then it applies the appropriate -march option and omits the startup check. Even with -march, if you execute on a processor that doesn’t support the same instruction set as the compiling processor, the program may fail with an invalid instruction fault.

(Note that “Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.” Performance Index (intel.com))
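
To make the startup check described above concrete, here is a minimal Python sketch of the idea: compare the features a binary needs against what the running CPU advertises. It is not how the Intel runtime implements the check. The helper names (parse_cpu_flags, missing_features), the sample text, and the required-feature list are all assumptions for illustration; the flag names follow the Linux /proc/cpuinfo convention.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required, cpuinfo_text):
    """Return, sorted, the required flags that the CPU does not advertise."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


# Hypothetical excerpts; real /proc/cpuinfo output has many more fields and flags.
BROADWELL_LIKE = "model name : example A\nflags : sse2 sse4_2 avx avx2 fma\n"
OLDER_CPU = "model name : example B\nflags : sse2 sse4_2 avx\n"

# Assumed requirement list for a binary built for AVX2 + FMA.
REQUIRED = ["avx2", "fma"]

print(missing_features(REQUIRED, BROADWELL_LIKE))  # []
print(missing_features(REQUIRED, OLDER_CPU))       # ['avx2', 'fma']
```

An empty list means the program can start; a non-empty list corresponds to the case where the startup check would refuse to run, or, without such a check, where an invalid instruction fault may occur.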

Thank you for your in-depth explanation. By “a mixed execution environment”, do you mean situations where execution and compilation processors belong to different architectures?

According to the description here, the VASP test errors have the following cause:

Hi John, Hi Roger,

I can confirm that this is an issue that is triggered by requesting AVX512 instructions (-xCORE-AVX512 or -xHost on an applicable host), and disappears when limiting things to AVX2.
We have not seen this before because only recently we acquired an AVX512 capable machine (a Cascade Lake Xeon).
I reproduced this with the Intel 19.1.2.254 compilers (which means some 2020 version of Parallel Studio --- confusing).
I have not checked whether it is solved in the new oneAPI distros, but will try to do so ASAP.

Based on the above comment, this issue is attributed to a problem with the current Intel oneAPI compilers & tools.

Regards,
HZ

Yes, exactly.

But no, the comment does not suggest “a problem with the current Intel oneAPI compilers & tools”. AVX512 vectorization can cause computations to occur in a different order, and floating-point computations can be sensitive to that. If the VASP test is looking for a specific exact value for some computation, any different instruction sequence, while mathematically still correct, can result in small (or sometimes not so small) changes in the final value.

I suggest you take a look at Improving Numerical Reproducibility in C/C++/Fortran (supercomputing.org)
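
The order sensitivity is easy to reproduce outside any compiler. The following Python sketch is only an illustration, not what ifort generates: it sums the same four numbers once left to right and once through two partial sums, the way a vectorized loop accumulates per lane. The function names and the data are made up for the demonstration.

```python
def sequential_sum(values):
    """Add the values one after another, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Accumulate into `lanes` partial sums, as a vectorized loop would, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Values chosen so that rounding is visible: near 1e16, adjacent doubles are 2 apart.
data = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(data))  # 1.0
print(lane_sum(data, 2))     # 2.0
```

Both results come from valid IEEE 754 arithmetic on the same inputs; only the order of the additions differs.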

Thank you for your detailed explanation.

Yes. This is exactly what happened in the VASP validation tests.

I looked it over, but can’t find a recommended compilation option in the lecture you linked above to deal with the VASP tests discussed here.

Regards,
HZ

The simplest thing, I think, is to not use -xHost when running the VASP validation test. You shouldn’t be using -xHost in an MPI environment anyway, unless all your nodes have the same processor.

If the VASP validation test is looking for an exact FP value, that’s its error. If it accepts a small variation, perhaps the range needs to be widened a bit to accommodate more advanced vectorizations.
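
As a sketch of what accepting a small variation could look like, here is a relative-tolerance comparison in Python. The numbers and the tolerance are hypothetical, not taken from any VASP test, and matches_reference is a name invented for this example.

```python
import math


def matches_reference(computed, reference, rel_tol=1e-8):
    """True if `computed` agrees with `reference` to within the relative tolerance."""
    return math.isclose(computed, reference, rel_tol=rel_tol)


# Hypothetical numbers, not taken from any VASP test.
reference = -10.123456780
computed = -10.123456789

print(computed == reference)                   # False
print(matches_reference(computed, reference))  # True
```

Widening the range then amounts to choosing a larger rel_tol; how large is a judgment about the physics, not about the compiler.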

What about -march=native then?

I’m not so sure either, so I have raised this question on the VASP forum.

Regards,
HZ

That’s a gfortran option - at least, it’s not documented for ifort. But for gfortran, it behaves much the same as ifort’s -xHost, and I would not recommend that in a mixed-node environment. Instead, verify the instruction set level of all of the execution nodes and use the greatest common set.
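
Finding the greatest common set is a set intersection. Here is a minimal Python sketch, assuming you have already collected each node's feature flags (for example from /proc/cpuinfo on Linux). The node names, the flag lists, and the coarse level names are all made up for illustration.

```python
def common_features(node_flags):
    """Intersect the feature sets of all nodes; `node_flags` maps node name -> set of flags."""
    sets = [set(flags) for flags in node_flags.values()]
    return set.intersection(*sets) if sets else set()


def highest_common_level(node_flags):
    """Name the highest vector ISA level present on every node (coarse, for illustration)."""
    common = common_features(node_flags)
    # Ordered from newest to oldest; flag names follow Linux /proc/cpuinfo.
    levels = [("avx512f", "AVX-512"), ("avx2", "AVX2"), ("avx", "AVX"), ("sse4_2", "SSE4.2")]
    for flag, level in levels:
        if flag in common:
            return level
    return "baseline"


# Hypothetical cluster: node names and flag lists are made up.
nodes = {
    "node01": {"sse4_2", "avx", "avx2", "avx512f"},
    "node02": {"sse4_2", "avx", "avx2"},
    "node03": {"sse4_2", "avx", "avx2"},
}

print(highest_common_level(nodes))  # AVX2
```

For an AVX2 result, the -march=core-avx2 setting used earlier in this thread is the matching choice. The sketch deliberately stops at naming the level rather than guessing compiler option names for the other levels.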

Got it. Thank you very much.

HZ