How to use IFX to offload OpenMP to a GPU?

Dear all,

I noticed that in some cases Julia may be faster, as @certik showed below,

I also noticed that @mohoree said Intel MKL’s routines can run about 5x faster than code that does not use MKL,

and that @gnikit mentioned MKL too,

So the three letters MKL keep appearing in my mind recently, and I am trying to see what Intel Fortran with MKL can do.

While looking at the MKL examples, I noticed that Intel's new Fortran compiler, IFX, seems to be able to offload OpenMP, at least to Intel GPUs.
For example, I have a Xeon E-2186M with an Intel P630 GPU inside the chip, so it seems IFX should be able to offload OpenMP to that GPU,
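
For reference, an OpenMP offload region in Fortran, independent of MKL, can be as small as the sketch below. The module, routine, and variable names are made up for illustration and do not come from the MKL examples. When the source is built without OpenMP the directive is just a comment, and when no device is available the region normally falls back to the host, so the same code still runs on the CPU.

```fortran
! Minimal OpenMP target-offload sketch; all names here are illustrative only.
module offload_demo
    implicit none
contains
    ! y = a*x + y, with the loop marked for offload to the default device.
    subroutine saxpy_offload(a, x, y)
        real, intent(in)    :: a
        real, intent(in)    :: x(:)
        real, intent(inout) :: y(:)
        integer :: i

        !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
        do i = 1, size(y)
            y(i) = a*x(i) + y(i)
        end do
        !$omp end target teams distribute parallel do
    end subroutine saxpy_offload

    ! Number of offload devices the OpenMP runtime reports.
    ! Lines starting with "!$ " are compiled only when OpenMP is enabled,
    ! so this returns 0 in a build without OpenMP.
    integer function device_count()
        !$ use omp_lib, only: omp_get_num_devices
        device_count = 0
        !$ device_count = omp_get_num_devices()
    end function device_count
end module offload_demo

program offload_demo_main
    use offload_demo
    implicit none
    real :: x(8), y(8)

    x = 1.0
    y = 2.0
    call saxpy_offload(3.0, x, y)

    print *, 'devices visible to OpenMP:', device_count()
    print *, 'all elements equal 5.0:   ', all(abs(y - 5.0) < 1.0e-6)
end program offload_demo_main
```

If device_count() reports 0 in a build that has offload enabled, the runtime does not see the GPU, which would be worth checking before looking at the MKL link settings.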

I use Intel oneAPI 2022.0.3 + Visual Studio 2019 on Windows, and I tried to compile and run the offload examples shipped in the Intel oneAPI MKL examples folder,

C:\Program Files (x86)\Intel\oneAPI\mkl\2022.0.3\examples

The example with which I am trying to run OpenMP offload on the Intel GPU is vsinh.f90, located at,

C:\Program Files (x86)\Intel\oneAPI\mkl\2022.0.3\examples\examples_offload_f\f_offload\vml\source

However, if I enable OpenMP offload as below,

it just gives errors at the linking stage. I am not sure whether what I set for the linking stage is correct; below is what I set,

The linker errors are below,

Build started...
1>------ Build started: Project: MKL_test (IFX), Configuration: Release x64 ------
1>Linking...
1>Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2022.0.0 Build 20211123
1>Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
1>2828013-vsinh.obj : warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE__openmp-s' sections found with different attributes (40000800)
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMLSETMODE_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VSSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMSSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VSSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMSSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VDSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMDSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VDSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMDSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VCSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMCSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VCSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMCSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_FLOAT_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VZSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMZSINH_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VZSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE_COMPLEX
1>2828013-vsinh.obj : error LNK2019: unresolved external symbol MKL_VM_VMZSINHI_OMP_OFFLOAD_ILP64 referenced in function TEST_DOUBLE_COMPLEX
1>x64\Release\MKL_test.exe : fatal error LNK1120: 17 unresolved externals
1>

However, if I disable offload as below, the code compiles and links (with some minor warnings) and runs fine.

But I just do not know how to set things correctly, in Visual Studio or on the command line, so that IFX offloads to the GPU and the code runs correctly.

Just curious, does anyone know how to use IFX to offload OpenMP to a GPU?

Thanks much in advance!

PS.
So, as usual, I googled, and I noticed that Dr. Fortran @sblionel made some comments in 2018 in the thread below,

Now, in 2022, I guess IFX can offload to Intel GPUs, but I just do not know how to set it up in Visual Studio :sweat_smile: Perhaps Dr. Fortran @sblionel could give some hints on how to offload OpenMP to an Intel GPU? Thanks, Dr. Fortran, in advance :laughing:

The example vsinh.f90 is below,

!===============================================================================
! Copyright 2020-2021 Intel Corporation.
!
! This software and the related documents are Intel copyrighted  materials,  and
! your use of  them is  governed by the  express license  under which  they were
! provided to you (License).  Unless the License provides otherwise, you may not
! use, modify, copy, publish, distribute,  disclose or transmit this software or
! the related documents without Intel's prior written permission.
!
! This software and the related documents  are provided as  is,  with no express
! or implied  warranties,  other  than those  that are  expressly stated  in the
! License.
!===============================================================================

!*
!
!*  Content:
!*            Sinh example program text (OpenMP offload interface)
!*
!*******************************************************************************/

include "mkl_omp_offload.f90"
include "_vml_common_functions.f90"

! @brief Real single precision function test begin
integer (kind=4) function test_float(funcname)

    use onemkl_vml_omp_offload
    implicit none
    include "_vml_common_data.f90"
    character (len = *) :: funcname
    real      (kind=4)  :: as_float
    integer   (kind=4)  :: check_result_float
    real      (kind=4),allocatable :: varg1(:), vres1(:), vmres1(:), vref1(:)
    real      (kind=4),allocatable :: vresi1(:), vmresi1(:), vrefi1(:)
    integer   (kind=4) i, a, errs
    integer   (kind=4) VLEN
    parameter (VLEN = 4)
    integer   (kind=4) test_arg1(VLEN)
    integer   (kind=4) test_ref1(VLEN)
    integer   (kind=4) nan_value
    integer   (kind=8) vml_accuracy_mode(3)
    data vml_accuracy_mode / VML_HA, VML_LA, VML_EP /
    integer tmode

    ! NaN value to fill result vector
    data  nan_value /Z'FFFFFFFF'/
    
    ! Arguments and reference results begin
    data test_arg1 / Z'40D9B85C', & ! 6.80375481     
                     Z'C007309A', & ! -2.1123414     
                     Z'40B52EFA', & ! 5.66198444     
                     Z'40BF006A'  / ! 5.96880054     
    data test_ref1 / Z'43E14E52', & ! 450.611877     
                     Z'C0825890', & ! -4.07331085    
                     Z'430FDB98', & ! 143.857788     
                     Z'43438454'  / ! 195.516907     
    ! Arguments and reference results end
    
    errs = 0

    ! Allocate vectors
    allocate(varg1(VLEN))
    allocate(vres1(VLEN))
    allocate(vmres1(VLEN))
    allocate(vref1(VLEN))
    allocate(vresi1(VLEN))
    allocate(vmresi1(VLEN))
    allocate(vrefi1(VLEN))

    ! Fill vectors
    do i = 1, VLEN
        varg1(i) = as_float(test_arg1(i))
        vref1(i) = as_float(test_ref1(i))
        vres1(i) = as_float(nan_value)
        vmres1(i) = as_float(nan_value)

        ! Fill even result values with 777 pads for strided indexing
        if (and(i,1) .eq. 1) then
            vrefi1(i)  = as_float(test_ref1(i))
            vresi1(i)  = 999
            vmresi1(i) = 999
        else
            vrefi1(i)  = 777
            vresi1(i)  = 777
            vmresi1(i) = 777
        end if
    enddo

    ! Loop by three accuracy flavors
    do a = 1, 3
        ! Call VML function with specific accuracy flavor

        !$omp target variant dispatch
        tmode = vmlsetmode(vml_accuracy_mode(a))
        !$omp end target variant dispatch

        !$omp target data map(varg1,vres1)
        !$omp target variant dispatch use_device_ptr(varg1,vres1)
        call vssinh(VLEN, varg1, vres1)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmres1)
        !$omp target variant dispatch use_device_ptr(varg1,vmres1)
        call vmssinh(VLEN, varg1, vmres1, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        !$omp target data map(varg1,vresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vresi1)
        call vssinhi(VLEN/2, varg1, 2, vresi1, 2)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vmresi1)
        call vmssinhi(VLEN/2, varg1, 2, vmresi1, 2, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        ! Check results
        do i = 1, VLEN
          errs = errs + check_result_float(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vres1(i), vres1(i), vref1(i), vref1(i), "v"//funcname, a, ",  simple")
          errs = errs + check_result_float(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vmres1(i), vmres1(i), vref1(i), vref1(i), "vm"//funcname, a, ",  simple")
          errs = errs + check_result_float(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vresi1(i), vresi1(i), vrefi1(i), vrefi1(i), "v"//funcname//"i", a, ", strided")
          errs = errs + check_result_float(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vmresi1(i), vmresi1(i), vrefi1(i), vrefi1(i), "vm"//funcname//"i", a, ", strided")
        enddo
    enddo

    test_float = errs

end function
! @brief Real single precision function test end

! @brief Real double precision function test begin
integer (kind=4) function test_double(funcname)

    use onemkl_vml_omp_offload
    implicit none
    include "_vml_common_data.f90"
    character (len = *) :: funcname
    real      (kind=8) :: as_double
    integer   (kind=4) :: check_result_double
    real      (kind=8),allocatable :: varg1(:), vres1(:), vmres1(:), vref1(:)
    real      (kind=8),allocatable :: vresi1(:), vmresi1(:), vrefi1(:)
    integer   (kind=4) i, a, errs
    integer   (kind=8) VLEN
    parameter (VLEN = 4)
    integer   (kind=8) test_arg1(VLEN)
    integer   (kind=8) test_ref1(VLEN)
    integer   (kind=8) nan_value
    integer   (kind=8) vml_accuracy_mode(3)
    data vml_accuracy_mode / VML_HA, VML_LA, VML_EP /
    integer tmode

    ! NaN value to fill result vector
    data  nan_value /Z'FFFFFFFFFFFFFFFF'/
    
    ! Arguments and reference results begin
    data test_arg1 / Z'401B370B60E66E18', & ! 6.80375434309419092      
                     Z'C000E6134801CC26', & ! -2.11234146361813924     
                     Z'4016A5DF421D4BBE', & ! 5.66198447517211711      
                     Z'4017E00D485FC01A'  / ! 5.96880066952146571      
    data test_ref1 / Z'407C29C968C677F1', & ! 450.611672187106763      
                     Z'C0104B1218DE4197', & ! -4.07331122261566403     
                     Z'4061FB72FBB708AE', & ! 143.857786042678924      
                     Z'4068708AA6866883'  / ! 195.516925108448135      
    ! Arguments and reference results end
    
    errs = 0

    ! Allocate vectors
    allocate(varg1(VLEN))
    allocate(vres1(VLEN))
    allocate(vmres1(VLEN))
    allocate(vref1(VLEN))
    allocate(vresi1(VLEN))
    allocate(vmresi1(VLEN))
    allocate(vrefi1(VLEN))

    ! Fill vectors
    do i = 1, VLEN
        varg1(i) = as_double(test_arg1(i))
        vref1(i) = as_double(test_ref1(i))
        vres1(i) = as_double(nan_value)
        vmres1(i) = as_double(nan_value)

        ! Fill even result values with 777 pads for strided indexing
        if (and(i,1) .eq. 1) then
            vrefi1(i)  = as_double(test_ref1(i))
            vresi1(i)  = 999
            vmresi1(i) = 999
        else
            vrefi1(i)  = 777
            vresi1(i)  = 777
            vmresi1(i) = 777
        end if
    enddo

    ! Loop by three accuracy flavors
    do a = 1, 3
        ! Call VML function with specific accuracy flavor

        !$omp target variant dispatch
        tmode = vmlsetmode(vml_accuracy_mode(a))
        !$omp end target variant dispatch

        !$omp target data map(varg1,vres1)
        !$omp target variant dispatch use_device_ptr(varg1,vres1)
        call vdsinh(VLEN, varg1, vres1)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmres1)
        !$omp target variant dispatch use_device_ptr(varg1,vmres1)
        call vmdsinh(VLEN, varg1, vmres1, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        !$omp target data map(varg1,vresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vresi1)
        call vdsinhi(VLEN/2, varg1, 2, vresi1, 2)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vmresi1)
        call vmdsinhi(VLEN/2, varg1, 2, vmresi1, 2, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        ! Check results
        do i = 1, VLEN
          errs = errs + check_result_double(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vres1(i), vres1(i), vref1(i), vref1(i), "v"//funcname, a, ",  simple")
          errs = errs + check_result_double(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vmres1(i), vmres1(i), vref1(i), vref1(i), "vm"//funcname, a, ",  simple")
          errs = errs + check_result_double(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vresi1(i), vresi1(i), vrefi1(i), vrefi1(i), "v"//funcname//"i", a, ", strided")
          errs = errs + check_result_double(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vmresi1(i), vmresi1(i), vrefi1(i), vrefi1(i), "vm"//funcname//"i", a, ", strided")
        enddo
    enddo
    
    test_double = errs

end function
! @brief Real double precision function test end

! @brief Complex single precision function test begin
integer (kind=4) function test_float_complex(funcname)

    use onemkl_vml_omp_offload
    implicit none
    include "_vml_common_data.f90"
    character (len = *) :: funcname
    real      (kind=4)  :: as_float
    integer   (kind=4)  :: check_result_float_complex
    complex      (kind=4),allocatable :: varg1(:), vres1(:), vmres1(:), vref1(:)
    complex      (kind=4),allocatable :: vresi1(:), vmresi1(:), vrefi1(:)
    integer   (kind=4) i, a, errs
    integer   (kind=4) VLEN
    parameter (VLEN = 4)
    integer   (kind=4) test_arg1(2*VLEN)
    integer   (kind=4) test_ref1(2*VLEN)
    integer   (kind=4) nan_value
    integer   (kind=8) vml_accuracy_mode(3)
    data vml_accuracy_mode / VML_HA, VML_LA, VML_EP /
    integer tmode

    ! NaN value to fill result vector
    data  nan_value /Z'FFFFFFFF'/
    
    ! Arguments and reference results begin
    data test_arg1 / Z'C007309A', Z'40D9B85C', & ! -2.1123414      + i * 6.80375481     
                     Z'40BF006A', Z'40B52EFA', & ! 5.96880054      + i * 5.66198444     
                     Z'C0C1912F', Z'4103BA28', & ! -6.04897261     + i * 8.2329483      
                     Z'40ABAABC', Z'C052EA36'  / ! 5.3645916       + i * -3.2955451     
    data test_ref1 / Z'C06228DD', Z'400582FC', & ! -3.5337441      + i * 2.08611965     
                     Z'431EFD8F', Z'C2E396E2', & ! 158.990463      + i * -113.794693    
                     Z'429CBE3E', Z'4344CF32', & ! 78.3715668      + i * 196.809357     
                     Z'C2D32BFA', Z'418315A9'  / ! -105.585892     + i * 16.3855762     
    ! Arguments and reference results end
    
    errs = 0

    ! Allocate vectors
    allocate(varg1(VLEN))
    allocate(vres1(VLEN))
    allocate(vmres1(VLEN))
    allocate(vref1(VLEN))
    allocate(vresi1(VLEN))
    allocate(vmresi1(VLEN))
    allocate(vrefi1(VLEN))

    ! Fill vectors
    do i = 1, VLEN
        varg1(i) = CMPLX(as_float(test_arg1(2*i-1)), as_float(test_arg1(2*i)), 4)
        vref1(i) = CMPLX(as_float(test_ref1(2*i-1)), as_float(test_ref1(2*i)), 4)
        vres1(i) = as_float(nan_value)
        vmres1(i) = as_float(nan_value)

        ! Fill even result values with 777 pads for strided indexing
        if (and(i,1) .eq. 1) then
            vrefi1(i)  = CMPLX(as_float(test_ref1(2*i-1)), as_float(test_ref1(2*i)), 4)
            vresi1(i)  = CMPLX(999,999,4)
            vmresi1(i) = CMPLX(999,999,4)
        else
            vrefi1(i)  = CMPLX(777,777,4)
            vresi1(i)  = CMPLX(777,777,4)
            vmresi1(i) = CMPLX(777,777,4)
        end if
    enddo

    ! Loop by three accuracy flavors
    do a = 1, 3
        ! Call VML function with specific accuracy flavor

        !$omp target variant dispatch
        tmode = vmlsetmode(vml_accuracy_mode(a))
        !$omp end target variant dispatch

        !$omp target data map(varg1,vres1)
        !$omp target variant dispatch use_device_ptr(varg1,vres1)
        call vcsinh(VLEN, varg1, vres1)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmres1)
        !$omp target variant dispatch use_device_ptr(varg1,vmres1)
        call vmcsinh(VLEN, varg1, vmres1, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        !$omp target data map(varg1,vresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vresi1)
        call vcsinhi(VLEN/2, varg1, 2, vresi1, 2)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vmresi1)
        call vmcsinhi(VLEN/2, varg1, 2, vmresi1, 2, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        ! Check results
        do i = 1, VLEN
          errs = errs + check_result_float_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vres1(i), vres1(i), vref1(i), vref1(i), "v"//funcname, a, ",  simple")
          errs = errs + check_result_float_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vmres1(i), vmres1(i), vref1(i), vref1(i), "vm"//funcname, a, ",  simple")
          errs = errs + check_result_float_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vresi1(i), vresi1(i), vrefi1(i), vrefi1(i), "v"//funcname//"i", a, ", strided")
          errs = errs + check_result_float_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                           vmresi1(i), vmresi1(i), vrefi1(i), vrefi1(i), "vm"//funcname//"i", a, ", strided")
        enddo
    enddo

    test_float_complex = errs

end function
! @brief Complex single precision function test end

! @brief Complex double precision function test begin
integer (kind=4) function test_double_complex(funcname)

    use onemkl_vml_omp_offload
    implicit none
    include "_vml_common_data.f90"
    character (len = *) :: funcname
    real      (kind=8) :: as_double
    integer   (kind=4) :: check_result_double_complex
    complex   (kind=8),allocatable :: varg1(:), vres1(:), vmres1(:), vref1(:)
    complex   (kind=8),allocatable :: vresi1(:), vmresi1(:), vrefi1(:)
    integer   (kind=4) i, a, errs
    integer   (kind=8) VLEN
    parameter (VLEN = 4)
    integer   (kind=8) test_arg1(2*VLEN)
    integer   (kind=8) test_ref1(2*VLEN)
    integer   (kind=8) nan_value
    integer   (kind=8) vml_accuracy_mode(3)
    data vml_accuracy_mode / VML_HA, VML_LA, VML_EP /
    integer tmode

    ! NaN value to fill result vector
    data  nan_value /Z'FFFFFFFFFFFFFFFF'/
    
    ! Arguments and reference results begin
    data test_arg1 / Z'C000E6134801CC26', Z'401B370B60E66E18', & ! -2.11234146361813924      + i * 6.80375434309419092      
                     Z'4017E00D485FC01A', Z'4016A5DF421D4BBE', & ! 5.96880066952146571       + i * 5.66198447517211711      
                     Z'C0183225E080644C', Z'40207744D998EE8A', & ! -6.04897261413232101      + i * 8.23294715873568705      
                     Z'4015755793FAEAB0', Z'C00A5D46A314BA8E'  / ! 5.36459189623808186       + i * -3.2955448857022196      
    data test_ref1 / Z'C00C451C45E4AF59', Z'4000B05EB8F0615E', & ! -3.533745332757388        + i * 2.08611816867430289      
                     Z'4063DFB206091F94', Z'C05C72DC5A12846B', & ! 158.990481393641517       + i * -113.794699209292659     
                     Z'405397C41F9D3258', Z'406899E6F517D13C', & ! 78.3713454280017459       + i * 196.809443041341979      
                     Z'C05A657FC5D4A250', Z'403062B3F5B1D862'  / ! -105.585923631334708      + i * 16.3855584677879804      
    ! Arguments and reference results end
    
    errs = 0

    ! Allocate vectors
    allocate(varg1(VLEN))
    allocate(vres1(VLEN))
    allocate(vmres1(VLEN))
    allocate(vref1(VLEN))
    allocate(vresi1(VLEN))
    allocate(vmresi1(VLEN))
    allocate(vrefi1(VLEN))

    ! Fill vectors
    do i = 1, VLEN
        varg1(i) = CMPLX(as_double(test_arg1(2*i-1)), as_double(test_arg1(2*i)), 8)
        vref1(i) = CMPLX(as_double(test_ref1(2*i-1)), as_double(test_ref1(2*i)), 8)
        vres1(i) = as_double(nan_value)
        vmres1(i) = as_double(nan_value)

        ! Fill even result values with 777 pads for strided indexing
        if (and(i,1) .eq. 1) then
            vrefi1(i)  = CMPLX(as_double(test_ref1(2*i-1)), as_double(test_ref1(2*i)), 8)
            vresi1(i)  = CMPLX(999,999,8)
            vmresi1(i) = CMPLX(999,999,8)
        else
            vrefi1(i)  = CMPLX(777,777,8)
            vresi1(i)  = CMPLX(777,777,8)
            vmresi1(i) = CMPLX(777,777,8)
        end if
    enddo

    ! Loop by three accuracy flavors
    do a = 1, 3
        ! Call VML function with specific accuracy flavor

        !$omp target variant dispatch
        tmode = vmlsetmode(vml_accuracy_mode(a))
        !$omp end target variant dispatch

        !$omp target data map(varg1,vres1)
        !$omp target variant dispatch use_device_ptr(varg1,vres1)
        call vzsinh(VLEN, varg1, vres1)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmres1)
        !$omp target variant dispatch use_device_ptr(varg1,vmres1)
        call vmzsinh(VLEN, varg1, vmres1, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        !$omp target data map(varg1,vresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vresi1)
        call vzsinhi(VLEN/2, varg1, 2, vresi1, 2)
        !$omp end target variant dispatch
        !$omp end target data
		
        !$omp target data map(varg1,vmresi1)
        !$omp target variant dispatch use_device_ptr(varg1,vmresi1)
        call vmzsinhi(VLEN/2, varg1, 2, vmresi1, 2, vml_accuracy_mode(a))
        !$omp end target variant dispatch
        !$omp end target data

        ! Check results
        do i = 1, VLEN
          errs = errs + check_result_double_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vres1(i), vres1(i), vref1(i), vref1(i), "v"//funcname, a, ",  simple")
          errs = errs + check_result_double_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vmres1(i), vmres1(i), vref1(i), vref1(i), "vm"//funcname, a, ",  simple")
          errs = errs + check_result_double_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vresi1(i), vresi1(i), vrefi1(i), vrefi1(i), "v"//funcname//"i", a, ", strided")
          errs = errs + check_result_double_complex(i, VML_ARG1_RES1, varg1(i), varg1(i), & 
                            vmresi1(i), vmresi1(i), vrefi1(i), vrefi1(i), "vm"//funcname//"i", a, ", strided")
        enddo
    enddo
    
    test_double_complex = errs

end function
! @brief Complex double precision function test end

! @brief Main test program begin
program sinh_example

    use onemkl_vml_omp_offload
    implicit none
    include "_vml_common_data.f90"
    integer   (kind=4) :: blend_int32
    integer   (kind=4) :: test_float
    integer   (kind=4) :: test_float_complex
    integer   (kind=4) :: test_double
    integer   (kind=4) :: test_double_complex
    integer   (kind=4) errs, total_errs
    character (len = *), parameter :: funcname = "sinh"
    
    total_errs = 0

    data FLOAT_MAXULP /FLOAT_MAXULP_HA,FLOAT_MAXULP_LA,FLOAT_MAXULP_EP/
    data COMPLEX_FLOAT_MAXULP /4.0,FLOAT_COMPLEX_MAXULP_LA,FLOAT_COMPLEX_MAXULP_EP/
    data DOUBLE_MAXULP /DOUBLE_MAXULP_HA,DOUBLE_MAXULP_LA,DOUBLE_MAXULP_EP/
    data COMPLEX_DOUBLE_MAXULP /4.0,DOUBLE_COMPLEX_MAXULP_LA,DOUBLE_COMPLEX_MAXULP_EP/

    write (*, 111) funcname
    111 format ('Running ', A, ' functions:')

    ! Single precision test run begin
    write (*, 112) TAB, funcname
    112 format(A, 'Running ',  A, ' with single precision real data type:')
    errs = test_float(funcname)    
    total_errs = total_errs + errs
    write (*, 113) TAB, funcname, TEST_RESULT(blend_int32((errs>0),2,1))
    113 format(A, A, ' single precision real result: ', A)
    ! Single precision test run end

    ! Real double precision test run begin
    write (*, 117) TAB, funcname
    117 format(A, 'Running ',  A, ' with double precision real data type:')
    errs = test_double(funcname)    
    total_errs = total_errs + errs
    write (*, 118) TAB, funcname, TEST_RESULT(blend_int32((errs>0),2,1))
    118 format(A, A, ' double precision real result: ', A)
    ! Real double precision test run end

    ! Single precision complex test run begin
    write (*, 115) TAB, funcname
    115 format(A, 'Running ',  A, ' with single precision complex data type:')
    errs = test_float_complex(funcname)    
    total_errs = total_errs + errs
    write (*, 116) TAB, funcname, TEST_RESULT(blend_int32((errs>0),2,1))
    116 format(A, A, ' single precision complex result: ', A)
    ! Single precision complex test run end

    ! Complex double precision test run begin
    write (*, 119) TAB, funcname
    119 format(A, 'Running ',  A, ' with double precision complex data type:')
    errs = test_double_complex(funcname)    
    total_errs = total_errs + errs
    write (*, 120) TAB, funcname, TEST_RESULT(blend_int32((errs>0),2,1))
    120 format(A, A, ' double precision complex result: ', A)
    ! Complex double precision  test run end

    write (*, 121) funcname, TEST_RESULT(blend_int32((total_errs>0),2,1))
    121 format(A, ' function result: ', A)

end program
! @brief Main test program end
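
A note on reading the listing above: the hexadecimal constants in the data statements are IEEE-754 bit patterns, with the decimal value given in the trailing comment. as_float and as_double (defined in _vml_common_functions.f90, not shown here) presumably reinterpret those bits as reals. The sketch below illustrates the idea with the standard transfer intrinsic, which is my assumption and not necessarily how Intel implements it. It also emulates the strided calls with a plain host loop, as far as the reference data in the listing implies their behaviour. bits_to_real and sinh_strided are hypothetical names, and nothing here uses MKL or the GPU.

```fortran
! Host-only sketch: bit patterns as reals, plus a stand-in for the strided call.
module bitpattern_demo
    use, intrinsic :: iso_fortran_env, only: int32, real32
    implicit none
contains
    ! Reinterpret a 32-bit integer bit pattern as a single precision real.
    elemental function bits_to_real(bits) result(x)
        integer(int32), intent(in) :: bits
        real(real32) :: x
        x = transfer(bits, x)
    end function bits_to_real

    ! Stand-in for a strided call such as vssinhi(n, a, inca, r, incr):
    ! writes sinh of n input elements, stepping by inca and incr,
    ! and leaves every other element of r untouched.
    subroutine sinh_strided(n, a, inca, r, incr)
        integer, intent(in)         :: n, inca, incr
        real(real32), intent(in)    :: a(:)
        real(real32), intent(inout) :: r(:)
        integer :: k
        do k = 0, n - 1
            r(1 + k*incr) = sinh(a(1 + k*inca))
        end do
    end subroutine sinh_strided
end module bitpattern_demo

program bitpattern_demo_main
    use bitpattern_demo
    implicit none
    real(real32) :: a(4), r(4), ref(2)

    ! Bit patterns and decimal comments copied from test_arg1 / test_ref1.
    a(1) = bits_to_real(int(Z'40D9B85C', int32))    ! 6.80375481
    a(2) = -2.1123414_real32                        ! given as a literal here
    a(3) = bits_to_real(int(Z'40B52EFA', int32))    ! 5.66198444
    a(4) = bits_to_real(int(Z'40BF006A', int32))    ! 5.96880054
    ref(1) = bits_to_real(int(Z'43E14E52', int32))  ! 450.611877
    ref(2) = bits_to_real(int(Z'430FDB98', int32))  ! 143.857788

    r = 777.0_real32                   ! pad value, as in the listing
    call sinh_strided(2, a, 2, r, 2)   ! touches r(1) and r(3) only

    print *, 'inputs:          ', a
    print *, 'strided results: ', r
    print *, 'odd slots match reference: ', &
        all(abs(r(1:3:2) - ref) <= 1.0e-5_real32*abs(ref))
    print *, 'even slots still padded:   ', all(r(2:4:2) == 777.0_real32)
end program bitpattern_demo_main
```

This mirrors why the listing fills the even result slots with 777: with a stride of 2 only the odd positions should be written, so a changed pad value would reveal an out-of-stride write.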