The testing environment is Ubuntu 20.04.3 LTS on a machine with dual Intel Xeon E5-2699 v4 processors and a Supermicro X10DAi motherboard. I am trying to compile and test VASP 6.3.0 with the latest Intel oneAPI Base and HPC toolkits. I found that, in order to pass all the tests, the following compiler option must be used:
FFLAGS += -march=core-avx2
By comparison, the following compiler option triggers some test errors:
FFLAGS += -xHOST
So I want to know how to correctly determine the -march flag when working with Intel oneAPI on Intel processors. For example, how would I arrive at -march=core-avx2 in this case?
The following is my test environment hardware and software specific information, for reference only:
You can find the proper codename in the first line of inxi -Cxxx output:
thus, use -march=broadwell
If it is not available, you can always leave the detection of the proper family to the compiler by using -march=native
Interestingly, the native value of the -march option is not mentioned in man ifort (checked in versions 2021.3.0 and 2021.4.0).
BTW, maybe somebody knows: I have a desktop machine equipped with i5-10505 CPU, codename “Comet Lake”. There is no -march=cometlake option available. The closest codename seems to be “Icelake” but assembler sources generated using native and icelake do differ. So, which codename is appropriate for that CPU?
-xHost won’t work if you run the executable on a system with a smaller instruction set. The VASP page referenced before includes a mention of MPI. If the program ran on a cluster with some nodes that didn’t support the compiling node’s instruction set, that can cause problems.
Suppose you compile on machine A and run on machine B.
If A and B use the same CPU architecture (both are Broadwell, for example), then -xHost should just work.
However, if A and B do not use the same CPU architecture, then, as Dr. Fortran @sblionel said, code compiled with -xHost on machine A will run on A but may not run on B.
In this case, if you insist on compiling on A, you need to use -march=XXX, where XXX is the architecture of B, or the common denominator of A and B.
So it is Broadwell; if you want to run on your B, which is a Xeon E5-2699 v4, you need to use
If your A is Haswell, which is earlier than Broadwell, and you want to run on both A and B, you use -march=haswell.
Yes, it seems there is no one-size-fits-all option; you need to know a little about the Intel CPU codenames. In fact, from Haswell and Broadwell up to Coffee Lake, and for some Skylake parts, there is no big difference: they all support AVX2, so -march=core-avx2 should just work.
If you can compile and run on the same machine, then -xHost is the best choice.
The issue you linked doesn’t suggest that -xHost “doesn’t work”, but rather that it is inappropriate for a mixed execution environment.
What -xHost does (I know, because I’m the one who implemented it) is query the CPU type. If it is an Intel processor, it applies the appropriate -x option for that processor, and that includes a check at the start of execution that the program is running on an Intel processor supporting the specified instruction set (or larger). If the compiling CPU is non-Intel, then it applies the appropriate -march option and omits the startup check. Even with -march, if you execute on a processor that doesn’t support the same instruction set as the compiling processor, the program may fail with an invalid instruction fault.
(Note that “Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.” Performance Index (intel.com))
Thank you for your in-depth explanation. By “a mixed execution environment”, do you mean situations where execution and compilation processors belong to different architectures?
According to the description here, the VASP test errors have the following cause:
Hi John, Hi Roger,
I can confirm that this is an issue that is triggered by requesting AVX512 instructions (-xCORE-AVX512 or -xHost on an applicable host), and disappears when limiting things to AVX2.
We have not seen this before because only recently we acquired an AVX512 capable machine (a Cascade Lake Xeon).
I reproduced this with the Intel 220.127.116.11 compilers (which means some 2020 version of Parallel Studio --- confusing).
I have not checked whether it is solved in the new oneAPI distros, but will try to do so ASAP.
Based on the above comment, this issue is attributed to a problem with the current Intel oneAPI compilers & tools.
But no, the comment does not suggest “a problem with the current Intel oneAPI compilers & tools”. AVX512 vectorization can cause computations to occur in a different order, and floating-point computations can be sensitive to that. If the VASP test is looking for a specific exact value for some computation, any different instruction sequence, while mathematically still correct, can result in small (or sometimes not so small) changes in the final value.
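To illustrate the ordering point, here is a minimal Python sketch. It is not VASP code and not what the compiler actually emits; the function names and the input values are made up for illustration. It sums the same four numbers once strictly left to right and once split across two "lanes", roughly the way a vectorized reduction might accumulate partial sums.

```python
def sum_sequential(values):
    """Add the values strictly left to right, like a scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_lanes(values, lanes):
    """Accumulate every `lanes`-th value into its own partial sum,
    then add the partial sums together (a toy model of a vector reduction)."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_sequential(partial)


# Hypothetical inputs chosen so that rounding is visible in double precision.
values = [1.0e16, 1.0, -1.0e16, 1.0]

scalar_result = sum_sequential(values)   # the first 1.0 is lost to rounding
vector_result = sum_in_lanes(values, 2)  # large terms cancel inside lane 0

print(scalar_result, vector_result)
```

Both results come from valid IEEE arithmetic on the same inputs, yet they differ, which is the kind of discrepancy an exact-match test would flag as a failure.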
The simplest thing, I think, is to not use -xHost when running the VASP validation test. You shouldn’t be using -xHost in an MPI environment anyway, unless all your nodes have the same processor.
If the VASP validation test is looking for an exact FP value, that’s its error. If it accepts a small variation, perhaps the range needs to be widened a bit to accommodate more advanced vectorizations.
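As a rough sketch of what "accepting a small variation" could look like, the snippet below compares a computed value with a reference using a relative tolerance. I do not know how the VASP test suite actually compares values, so the function name, the tolerance, and the energies here are all hypothetical.

```python
import math


def values_match(reference, computed, rel_tol=1e-9, abs_tol=0.0):
    """Return True when `computed` agrees with `reference` within tolerance."""
    return math.isclose(reference, computed, rel_tol=rel_tol, abs_tol=abs_tol)


# Hypothetical numbers, standing in for a reference result and a result
# produced by a build with a different vectorization level.
reference_energy = -10.0
other_build_energy = -10.000000000001

exact_match = reference_energy == other_build_energy             # exact comparison
tolerant_match = values_match(reference_energy, other_build_energy)

print(exact_match, tolerant_match)
```

The right tolerance depends on the quantity being checked; widening it too far would hide real regressions, so it is a judgment call for the test authors.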
That’s a gfortran option - at least, it’s not documented for ifort. But for gfortran, it behaves much the same as ifort’s -xHost, and I would not recommend that in a mixed-node environment. Instead, verify the instruction set level of all of the execution nodes and use the greatest common set.
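One way to check the greatest common set is sketched below in Python, assuming Linux nodes where the "flags" line of /proc/cpuinfo lists the supported features (sse4_2, avx, avx2, avx512f are the names Linux uses). The function names, the level labels, and the sample text are my own, and gathering the cpuinfo text from each node is left out. Mapping the resulting level to a compiler option is up to you, for example -march=core-avx2 for the AVX2 level discussed above.

```python
def parse_flags(cpuinfo_text):
    """Extract the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Levels ordered from most to least capable; labels are illustrative only.
LEVELS = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2"}),
    ("avx", {"avx"}),
    ("sse4.2", {"sse4_2"}),
]


def common_level(node_flag_sets):
    """Return the highest level supported by every node."""
    if not node_flag_sets:
        raise ValueError("need at least one node")
    common = set.intersection(*node_flag_sets)
    for name, required in LEVELS:
        if required <= common:
            return name
    return "baseline"


# Made-up, shortened samples; real flag lines are much longer.
node_a = "flags\t\t: fpu sse4_2 avx avx2 fma\n"
node_b = "flags\t\t: fpu sse4_2 avx avx2 fma avx512f\n"

level = common_level([parse_flags(node_a), parse_flags(node_b)])
print(level)
```

On a real cluster you would feed in the contents of /proc/cpuinfo from each execution node rather than the sample strings.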