Slow-down of a program built with gfortran in an Intel oneAPI command window

One other, perhaps silly, question: do you have a newer processor that has both P-cores (Performance cores) and E-cores (Efficiency cores)? If so, perhaps the oneAPI window is by default launching processes on E-cores. Just a possibility.

Yes, it does have both types. I should perhaps try to work with the affinity option to get the processes to run on the right one. I have done experiments with that for the threaded version, but there did not seem to be much influence. And I would not expect an 8 to 10-fold slow-down if the process happens to be assigned to the wrong processor type. But it is the most likely reason that I have seen so far.

I just tried it by specifying the affinity for the program’s run (FYI: start /b /wait /affinity 0xAAAAAA program - the string of A’s is what I gathered from some website to be the selection of the performance cores). Result: 100 seconds for the 700x700 matrix.
But the plain run, so no affinity, took 70 seconds in the same Intel oneAPI command window.
The other runs, the ones I was complaining about, took 163 seconds for this size.
I should probably try a wider set of cases, but my earlier observation that the effect is reproducible seems less solid.
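
For reference, the value given to /affinity is a hexadecimal bit mask in which bit n stands for logical processor n. The following is a minimal Python sketch (the helper names mask_to_cpus and cpus_to_mask are made up for illustration) that decodes 0xAAAAAA and builds a mask from a list of logical processor numbers:

```python
def mask_to_cpus(mask):
    """Return the logical processor indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def cpus_to_mask(cpus):
    """Build an affinity mask with one bit set per logical processor index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # The mask used in the run above: binary 1010...1010, 24 bits wide.
    print(mask_to_cpus(0xAAAAAA))

    # Hypothetical alternative: the even-numbered logical processors 0..14.
    print(hex(cpus_to_mask(range(0, 16, 2))))
```

So 0xAAAAAA selects the twelve odd-numbered logical processors 1, 3, ..., 23. Whether these really are the P-cores depends on how this particular machine numbers its logical processors, which would need to be checked (Task Manager or a similar tool shows the layout) before drawing conclusions from the 100-second result.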

Quick check: the time I recorded from a plain window is 15 seconds. So, all the other timings I reported today are far longer.
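
To put numbers on "far longer", here is a small sketch of the slow-down factors, assuming the 15-second plain-window time is for the same 700x700 case (the labels are only shorthand for the runs described above):

```python
BASELINE_SECONDS = 15.0  # plain command window

# Timings reported for the Intel oneAPI command window, in seconds.
oneapi_timings = {
    "with /affinity 0xAAAAAA": 100.0,
    "no affinity": 70.0,
    "earlier runs": 163.0,
}


def slowdown(seconds, baseline=BASELINE_SECONDS):
    """Return how many times slower a run is than the baseline."""
    return seconds / baseline


if __name__ == "__main__":
    for label, seconds in oneapi_timings.items():
        print(f"{label}: {slowdown(seconds):.1f}x slower")
```

That gives roughly a factor of 5 to 11, depending on the run.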