The testing environment is Ubuntu 20.04.3 LTS on a machine with dual Intel Xeon E5-2699 v4 CPUs and a Supermicro X10DAi motherboard. I am trying to compile and test VASP 6.3.0 with the latest Intel oneAPI Base and HPC Toolkits. I found that, in order to pass all the tests, the following compiler option must be used:
FFLAGS += -march=core-avx2
As a comparison, the following compiler option will trigger some test errors:
FFLAGS += -xHOST
So, how do I correctly determine the -march value to use with Intel oneAPI on Intel processors, for example -march=core-avx2 in this case?
My test environment's hardware and software details are as follows, for reference only:
You can find the proper codename in the first line of the inxi -Cxxx output:
arch: Broadwell
Thus, use -march=broadwell
If that is not available, you can always leave detection of the proper CPU family to the compiler by using -march=native
Interestingly, the native value for the -march option is not mentioned in man ifort (checked in versions 2021.3.0 and 2021.4.0).
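Because native is undocumented for ifort, one manual alternative is to read the feature flags the Linux kernel reports in /proc/cpuinfo and pick a -march value from them. The sketch below does that. The flag-to-value table is a simplified assumption for illustration, not Intel's official mapping, so check the ifort documentation for the -march values your compiler version accepts.

```python
from __future__ import annotations

# Assumed, simplified table: -march value -> /proc/cpuinfo flags it needs.
# Ordered from most to least capable; NOT an official Intel mapping.
MARCH_LEVELS = [
    ("skylake-avx512", {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}),
    ("core-avx2", {"avx2", "fma", "bmi1", "bmi2", "movbe"}),
    ("corei7-avx", {"avx", "sse4_2"}),
    ("corei7", {"sse4_2"}),
]


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_march(flags: set[str]) -> str | None:
    """Return the most capable -march value whose required flags are all present."""
    for march, required in MARCH_LEVELS:
        if required <= flags:
            return march
    return None  # no match: leave code generation at the compiler default
```

On Linux you would call it as `pick_march(parse_flags(open("/proc/cpuinfo").read()))`. A Broadwell Xeon such as the E5-2699 v4 reports avx2, fma, bmi1, bmi2 and movbe but no AVX-512 flags, so this table selects core-avx2. That is the same value that passed the VASP tests.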
BTW, maybe somebody knows: I have a desktop machine equipped with an i5-10505 CPU, codename "Comet Lake". There is no -march=cometlake option available. The closest codename seems to be "Icelake", but the assembler sources generated using native and icelake do differ. So, which codename is appropriate for that CPU?
I mean: the initial letter is lowercase, while inxi's result is capitalized. Which one should I use?
As for -march=native, I noticed that Intel has the -xHost code generation option, which serves a similar purpose. But, as discussed here, -xHost doesn't work for some VASP tests.
BTW, the following gcc command confirms that -march=native is equivalent to -march=broadwell on my machine:
$ ifort -march=Broadwell >/dev/null
ifort: command line warning #10148: option '-march=Broadwell' not supported
ifort: command line error: no files specified; for help type "ifort -help"
-xHost won't work if you run the executable on a system with a smaller instruction set. The VASP page referenced earlier mentions MPI. If the program ran on a cluster where some nodes didn't support the compiling node's instruction set, that could cause problems.
If you compile on machine A and run on machine B, and A and B use the same CPU architecture (for example, both are Broadwell), then -xHost should work fine.
However, if A and B do not use the same CPU architecture, then, as Dr. Fortran @sblionel said, if you still use -xHost on the compiling machine A, the code will run on A but may not run on B.
In this case, if you insist on compiling on A, you need to use -march=XXX, where XXX targets B, or targets both A and B (their common denominator).
If you use -march=XXX, do not also use -xHost.
Yes, -march=XXX generates code for the Intel CPU with codename XXX.
-xHost is basically for compiling and running on the same "Host" machine.
You just need to look up the codename of your CPU; in your case, search Intel ARK for Xeon E5-2699 v4.
It is Broadwell, so if you want to run on your machine B, which is a Xeon E5-2699 v4, you need to use
-march=broadwell
If your machine A is Haswell, which is earlier than Broadwell, and you want to run on both A and B, use -march=haswell.
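The rule of thumb above (target the older of A and B) can be sketched as picking the earliest entry from an ordered list of codenames. The list is an assumption. It covers only a linear chain of Intel desktop/server generations in which each one is taken to support the instruction sets of the previous ones, and it leaves out Atom-class and Xeon Phi parts, which do not fit that ordering. The names are lowercased because ifort rejected the capitalized -march=Broadwell in the transcript above.

```python
# Assumed ordering: each codename supports at least the instruction sets of
# every codename before it. Verify that your ifort version accepts each of
# these as a -march value before relying on it.
CHAIN = [
    "sandybridge",
    "ivybridge",
    "haswell",
    "broadwell",
    "skylake",
    "skylake-avx512",
    "cascadelake",
]


def common_march(*codenames: str) -> str:
    """Return the oldest of the given codenames according to CHAIN.

    Names are lowercased first, since ifort rejects e.g. -march=Broadwell.
    """
    names = [c.lower() for c in codenames]
    if not names:
        raise ValueError("give at least one codename")
    unknown = [n for n in names if n not in CHAIN]
    if unknown:
        raise ValueError(f"not in the assumed chain: {unknown}")
    return min(names, key=CHAIN.index)


print(common_march("Haswell", "Broadwell"))  # haswell
```

A codename outside the chain, such as cometlake, raises an error instead of guessing. That matches the thread's advice to look the CPU up rather than assume.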
Yes, it seems there is no one-size-fits-all option; it requires knowing a little about Intel CPU codenames. Actually, Haswell, Broadwell, and everything up through Coffee Lake, including the Skylake parts without AVX-512, have no big differences here: they all use AVX2, so -march=core-avx2 should just work.
If you can compile and run on the same machine, then -xHost is the best choice.
The problem is that in some specific scenarios, -xHost doesn't work while -march=native does, which is exactly the issue discussed here. That's the main reason I asked this question.
The issue you linked doesn't suggest that -xHost "doesn't work", but rather that it is inappropriate for a mixed execution environment.
What -xHost does (I know, because I'm the one who implemented it) is query the CPU type. If it is an Intel processor, it applies the appropriate -x option for that processor, and that includes a check at the start of execution that the program is running on an Intel processor supporting the specified instruction set (or larger). If the compiling CPU is non-Intel, then it applies the appropriate -march option and omits the startup check. Even with -march, if you execute on a processor that doesn't support the same instruction set as the compiling processor, the program may fail with an invalid instruction fault.
(Note that "Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products." Performance Index (intel.com))
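The sketch below is a rough model of the behavior described above. Its only purpose is to show the two branches side by side. The vendor string stands for the vendor_id field Linux reports in /proc/cpuinfo. The -x to -march table is an assumption for illustration and does not describe the compiler's actual internals.

```python
from dataclasses import dataclass

# Assumed correspondence between -x codes and -march values (illustrative only).
X_TO_MARCH = {
    "CORE-AVX512": "skylake-avx512",
    "CORE-AVX2": "core-avx2",
    "AVX": "corei7-avx",
    "SSE4.2": "corei7",
}


@dataclass
class CodegenChoice:
    option: str          # option that -xHost effectively applies
    startup_check: bool  # whether the program checks the CPU when it starts


def model_xhost(vendor_id: str, level: str) -> CodegenChoice:
    """Model -xHost: -x<level> plus a startup check on Intel, else -march."""
    if vendor_id == "GenuineIntel":
        return CodegenChoice(f"-x{level}", startup_check=True)
    # Non-Intel host: an equivalent -march and no startup check, so a missing
    # instruction on the execution CPU surfaces as an invalid-instruction fault.
    return CodegenChoice(f"-march={X_TO_MARCH[level]}", startup_check=False)


print(model_xhost("GenuineIntel", "CORE-AVX2"))
print(model_xhost("AuthenticAMD", "CORE-AVX2"))
```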
Thank you for your in-depth explanation. By "a mixed execution environment", do you mean situations where the execution and compilation processors belong to different architectures?
According to the description here, the VASP test errors are caused by the following:
Hi John, Hi Roger,
I can confirm that this is an issue that is triggered by requesting AVX512 instructions (-xCORE-AVX512 or -xHost on an applicable host), and disappears when limiting things to AVX2.
We have not seen this before because only recently we acquired an AVX512 capable machine (a Cascade Lake Xeon).
I reproduced this with the Intel 19.1.2.254 compilers (which means some 2020 version of Parallel Studio --- confusing).
I have not checked whether it is solved in the new oneAPI distros, but will try to do so ASAP.
Based on the above comment, this issue is attributed to a problem with the current Intel oneAPI compilers & tools.
But no, the comment does not suggest "a problem with the current Intel oneAPI compilers & tools". AVX-512 vectorization can cause computations to occur in a different order, and floating-point computations can be sensitive to that. If the VASP test is looking for a specific exact value for some computation, any different instruction sequence, while mathematically still correct, can result in small (or sometimes not so small) changes in the final value.
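The effect of ordering can be shown in a few lines. Floating-point addition is not associative, so summing the same numbers in a different order can change the result. A wider vector unit may do exactly that when it splits a sum across more lanes. The lane model below is a simplification for illustration and does not reproduce what ifort actually generates for any VASP routine.

```python
from __future__ import annotations


def lane_sum(values: list[float], lanes: int) -> float:
    """Sum like a vectorized loop might: keep one running partial sum per
    lane, then add the partial sums together at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


values = [1e16, 1.0, -1e16, 1.0]  # exact sum is 2.0
print(lane_sum(values, 1))  # 1.0: one 1.0 is rounded away when added to 1e16
print(lane_sum(values, 2))  # 2.0: the two large terms cancel in the same lane
```

Both results are correct IEEE double arithmetic. Only the order of the additions differs.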
The simplest thing, I think, is to not use -xHost when running the VASP validation test. You shouldn't be using -xHost in an MPI environment anyway, unless all your nodes have the same processor.
If the VASP validation test is looking for an exact FP value, that's its error. If it accepts a small variation, perhaps the range needs to be widened a bit to accommodate more advanced vectorizations.
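A comparison with a tolerance is the usual way to accept this kind of reordering noise. The sketch below is minimal, and its tolerances are placeholders. They are not taken from the VASP test suite, which may use its own comparison scheme.

```python
import math


def matches_reference(value: float, reference: float,
                      rel_tol: float = 1e-8, abs_tol: float = 1e-10) -> bool:
    """True if value agrees with reference within a relative or absolute tolerance."""
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=abs_tol)


computed = (0.1 + 0.2) + 0.3  # 0.6000000000000001 in IEEE double
print(computed == 0.6)                   # False: exact comparison fails
print(matches_reference(computed, 0.6))  # True: the difference is ~1e-16
```

Choosing rel_tol is the "widen the range a bit" decision. If it is too tight, harmless vectorization changes fail the test. If it is too loose, real regressions slip through.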
That (-march=native) is a gfortran option; at least, it's not documented for ifort. But for gfortran, it behaves much the same as ifort's -xHost, and I would not recommend it in a mixed-node environment. Instead, verify the instruction set level of all of the execution nodes and use the greatest common set.
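The "greatest common set" advice can be sketched in two steps. First, intersect the feature flags reported by every execution node. Then choose the highest -x level that the intersection fully supports. The sketch leaves out how the per-node flags are collected (for example, by reading /proc/cpuinfo on each node). The node names and the flag table are hypothetical simplifications, not Intel's definition of each level.

```python
from __future__ import annotations

# Assumed, simplified requirements for some ifort -x levels, most capable first.
X_LEVELS = [
    ("CORE-AVX512", {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}),
    ("CORE-AVX2", {"avx2", "fma", "bmi1", "bmi2", "movbe"}),
    ("AVX", {"avx"}),
    ("SSE4.2", {"sse4_2"}),
]


def common_x_level(node_flags: dict[str, set[str]]) -> str | None:
    """Return the highest -x level supported by every node, or None."""
    if not node_flags:
        raise ValueError("no nodes given")
    common = set.intersection(*node_flags.values())  # flags every node has
    for level, required in X_LEVELS:
        if required <= common:
            return level
    return None


# Hypothetical cluster: one AVX2-only node and one AVX-512 node.
avx2 = {"sse4_2", "avx", "avx2", "fma", "bmi1", "bmi2", "movbe"}
avx512 = avx2 | {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}
print(common_x_level({"node01": avx2, "node02": avx512}))  # CORE-AVX2
```

A single AVX2-only node is enough to cap the whole cluster at CORE-AVX2. That is why -xHost on an AVX-512 login node is risky for MPI runs.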