There is an engaging upcoming Intel webinar on Nov 8, with a catchy title about Fortran, OpenMP, and heterogeneous parallel programming. Registration is required.
I found this presentation very interesting, although watching live from 4:00am local time was a challenge.
It mainly considered the management of data transfers between the different (heterogeneous) memory types that can be encountered when off-loading tasks to a GPU with OpenMP, especially managing when transfers should or should not occur. It discussed OpenMP directives that are becoming available to manage these data transfers more efficiently. Previously, I was not familiar with this definition of “heterogeneous” memory types when considering data off-loading to GPUs, or with the concept of using multiple GPUs in the same hardware environment!
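As a minimal sketch of the kind of control involved (an illustration only, not code from the webinar; the program and array names are made up), an OpenMP `target data` region can keep arrays resident on the device across several off-loaded loops, so that each array is transferred once rather than once per loop:

```fortran
program target_data_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  real :: a(n), b(n)
  integer :: i

  a = 1.0
  b = 0.0

  ! Keep a and b on the device for the whole region:
  ! a is copied to the device once; b is copied in once and back out once.
  !$omp target data map(to: a) map(tofrom: b)

  ! First off-loaded loop: a and b are already present, so no new transfer.
  !$omp target teams distribute parallel do
  do i = 1, n
     b(i) = 2.0 * a(i)
  end do
  !$omp end target teams distribute parallel do

  ! Second off-loaded loop reuses the device copies.
  !$omp target teams distribute parallel do
  do i = 1, n
     b(i) = b(i) + a(i)
  end do
  !$omp end target teams distribute parallel do

  !$omp end target data

  ! Every element of b should now be 3.0.
  print *, 'b(1) =', b(1), ' b(n) =', b(n)
end program target_data_sketch
```

Without the enclosing `target data` region, each `target` construct would map the arrays on entry and exit by itself, which is the repeated transfer the directives are meant to avoid. If OpenMP is not enabled, the directives are treated as comments and the program runs serially with the same result.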
The presentation also pointed to the lack of controls available for “do concurrent”, should the compiler attempt any multi-threading. “do concurrent” looks naked in the present Fortran standard.
It certainly made me consider the lack of data transfer management between the different memory types that are available in what may be considered “homogeneous” memory implementations of OpenMP, where data still has to move between main memory and the multiple levels of cache available to each thread. This is a significant inefficiency in my implementations that involve multi-threaded use of large in-memory data sets on dual-channel memory, where memory transfer bandwidth often stalls performance. I am not aware of any directive-based approaches to improving cache efficiency?
I am envious that this discussed data transfer problem is being addressed for other OpenMP implementations.
Another area where memory transfer inefficiency has been addressed in Modern Fortran is the use of temporary array sections, so data transfer problems have not gone unaddressed in Fortran implementations. The possibility of non-contiguous memory is a managed issue!
There must be a question of for how long will Fortran be hardware agnostic, especially as this presentation has shown that OpenMP directives can be used to address memory transfer inefficiency.
I would recommend this webinar to those who have not yet viewed it.
I watched the recording on demand. It was informative.
“There must be a question of for how long will Fortran be hardware agnostic”
That also repeatedly caught my attention: whether the standards committee can achieve it. The success of the past 70 years seems like an excellent track record for its feasibility in the future. Here is the link to the on-demand recording. Registration is required in the lower left if you have not registered already.
The following Intel Tech Article appears to be related to the webinar: Using Fortran DO CONCURRENT for Accelerator Offload
Yes, it is a summary of the webinar. One of the takeaways that I think deserves some discussion is this one:
“Unfortunately, ISO Fortran 2018 and forthcoming 2023 don’t provide language constructs to control data movement.”
Having the possibility of fine-tuning data movement with OpenMP or OpenACC is very good, but it does add some dissonance to learning heterogeneous computing in Fortran.
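To make the dissonance concrete, here is a minimal sketch (an illustration with made-up names, not code taken from the article) in which the loops are standard `do concurrent` but the data movement is expressed with an OpenMP directive. It assumes a compiler that can off-load `do concurrent` and that honours the enclosing `target data` region for it; whether and how that happens is compiler-specific.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 2.0

  ! Non-standard part: an OpenMP directive states where the data lives.
  ! Assumption: the compiler applies this mapping to the off-loaded
  ! do concurrent loops inside the region.
  !$omp target data map(to: x) map(tofrom: y)

  ! Standard part: do concurrent says nothing about data movement.
  do concurrent (i = 1:n)
     y(i) = y(i) + 3.0 * x(i)
  end do

  do concurrent (i = 1:n)
     y(i) = 0.5 * y(i)
  end do

  !$omp end target data

  ! Every element of y should now be 2.5.
  print *, 'y(1) =', y(1)
end program do_concurrent_sketch
```

The computation is pure ISO Fortran, yet the only handle on when `x` and `y` move between host and device is a directive from a different specification. Compiled without OpenMP, the directive is a comment and the loops run on the host with the same result.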
So, the question would be, is it feasible/worth it to imagine a robust native Fortran syntax for handling this problem?
If I remember correctly, they said in the webinar that they are working on it.