As mentioned earlier, I’m not sure how well suited consumer-grade GPUs are for scientific computation with 64-bit floats. Vendors like to stratify their product lines, for example by limiting FP64 throughput on consumer cards. However, I’ve seen some compelling examples of the Intel Arc series using OpenCL, e.g. “Intel Arc A750 does all these fluid simulations in real-time”.
If by “long” you mean nested and with lots of independent data items, then they’d be a good fit. There was an example like this in a recent blog post from Intel: https://www.intel.com/content/www/us/en/developer/articles/technical/phasta-gpu-acceleration.html
This Discourse thread also contains some useful information on the topic of OpenMP offloading.
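To make the “nested, with lots of independent data items” case concrete, here is a minimal C sketch of OpenMP target offloading. It assumes a compiler built with GPU offload support. The function name `scale_add_2d` and the computation itself are hypothetical, chosen only because every iteration is independent, so the two loops can be collapsed into one large iteration space and spread across GPU teams and threads.

```c
#include <stddef.h>

/* Hypothetical example: c[i][j] = alpha * a[i][j] + b[i][j] for an
   n x m row-major grid. No iteration depends on another, which is what
   makes the nested loops a good fit for a GPU. */
void scale_add_2d(size_t n, size_t m, double alpha,
                  const double *a, const double *b, double *c)
{
    /* map(to:) copies the inputs to the device; map(from:) copies the
       result back when the target region ends. If the compiler has no
       offload support (or OpenMP is not enabled), the pragma is ignored
       or the region falls back to the host, and the loop still gives
       the same result. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*m], b[0:n*m]) map(from: c[0:n*m])
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            c[i * m + j] = alpha * a[i * m + j] + b[i * m + j];
        }
    }
}
```

The exact flags depend on the compiler and how it was installed, but typical invocations are `nvc -mp=gpu` (NVIDIA HPC SDK), `icx -fiopenmp -fopenmp-targets=spir64` (Intel oneAPI) or `gcc -fopenmp -foffload=nvptx-none` (an offload-enabled GCC build). The sketch uses `double`, so the caveat above about 64-bit floats on consumer GPUs still applies.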
The NVIDIA compilers also support a subset of OpenMP. Hopefully they’ll improve their coverage of the OpenMP specification in the future.