There must be a way.
At least for Intel, as the oneAPI name suggests, and given their firm commitment to the GPU business. By the way, I am pretty sure they will never give up on GPUs, since GPU computing is attracting more and more attention for its potential in large matrix operations, such as solving partial differential equations, fluid dynamics, heat transfer, image processing (convolution matrices), cellular automata, etc. That kind of work was previously a good fit for Xeon Phi (slower cores, but many threads and high-bandwidth memory); now Xeon Phi has basically been replaced by GPUs.
In short, I see no reason why Intel Fortran cannot offload OpenMP (or something similar) from an Intel CPU to an Intel GPU. They have some links on this, as below,
I have a similar thread, but I am not quite sure how to actually offload to an Intel GPU. I think I am almost there but still need one more step to make it work. In fact, the examples are all in oneAPI's examples folder; if their instructions on how to offload OpenMP to the GPU were a little clearer (especially on Windows), that would be great.
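For reference, here is a minimal sketch of what an OpenMP offload region looks like. It is written in C only to keep it short; in Fortran the same construct is spelled `!$omp target teams distribute parallel do` with matching `map` clauses. The function name `saxpy_offload` is made up for illustration. As far as I know, the Intel compilers need `-fiopenmp -fopenmp-targets=spir64` on Linux (or `/Qiopenmp /Qopenmp-targets:spir64` on Windows) to build the device code; without OpenMP flags the pragma is ignored and the loop simply runs on the CPU.

```c
/* Hypothetical example: y = a*x + y, with the loop marked for GPU offload. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    /* map(to:)     copies x to the device before the loop.
       map(tofrom:) copies y to the device and back to the host afterwards.
       If no device is available, the OpenMP runtime normally falls back
       to running the loop on the host. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Setting the standard OpenMP environment variable `OMP_TARGET_OFFLOAD=MANDATORY` makes the program fail instead of silently falling back to the CPU, which is a handy check that the GPU is really being used.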
I think gfortran or LFortran can also offload OpenMP to a GPU; if not now, it should be possible in the near future.
But I guess that if the application needs to exchange data frequently between the CPU and the GPU through main memory, the GPU speedup will be ruined by the slow memory bandwidth. For example, typical DDR4-2666 memory peaks at roughly 21 GB/s per channel (about 43 GB/s in dual channel), while a GPU's GDDR5 or newer memory can be around 10x faster.
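The bandwidth figures above can be checked with a bit of arithmetic. This is a rough sketch, assuming the theoretical peak is the transfer rate times the bus width times the number of channels; the function names are made up for illustration, and real sustained bandwidth will be lower than these numbers.

```c
/* Theoretical peak bandwidth in GB/s:
   (mega-transfers per second) * (bytes per transfer) * (channels). */
double peak_bandwidth_gbs(double mega_transfers_per_s, int bus_width_bits,
                          int channels)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9;
}

/* Time in seconds to move `bytes` at `gbs` GB/s, ignoring latency. */
double transfer_seconds(double bytes, double gbs)
{
    return bytes / (gbs * 1e9);
}
```

For DDR4-2666 (2666 MT/s on a 64-bit channel), `peak_bandwidth_gbs(2666, 64, 1)` gives about 21.3 GB/s, and two channels give about 42.7 GB/s. At the single-channel rate, `transfer_seconds(1e9, 21.328)` says that moving 1 GB takes about 47 ms, so a kernel that has to ship its arrays back and forth on every step can easily spend more time on transfers than on computing.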
In fact, I suspect that one important reason Apple's M1 chips achieve such high performance is that their memory has quite high bandwidth: around 200 GB/s on the M1 Pro and 400 GB/s on the M1 Max, if I remember correctly (the base M1 is lower, around 68 GB/s). That is close to L2 or L3 cache bandwidth (though cache should still have at least 10x lower latency than main memory). It also explains why M1 Macs are very good at image/video processing, as those applications typically benefit from fast memory (images are just big arrays in memory).