Dear community:
I would like to hear your recommendation on how to optimize the scenario in where I need to pass a column of a type array element as a parameter to a function while avoiding the creation of temporary arrays. The situation emerges in legacy code that processes business-like data structures from data blobs that I cant modify.
In the original simplified code below, The main TEST program reads an array of complex types, and uses the VSRCH to find the location of a string in one of the string colums of the complex type:
INTEGER FUNCTION VSRCH(STR, LST, L)
IMPLICIT NONE
INTEGER L
CHARACTER(*) STR
CHARACTER(*) LST(L)
INTEGER I
VSRCH = -1
DO I=1,L
IF (LST(I) == STR) THEN
VSRCH = I
EXIT
END IF
END DO
END FUNCTION VSRCH
PROGRAM TEST
IMPLICIT NONE
INTEGER SIZ, LNA, LNB
PARAMETER (LNA = 10)
PARAMETER (LNB = 20)
PARAMETER (SIZ = 100000)
TYPE :: TableA
CHARACTER(LEN=LNA) :: A
CHARACTER(LEN=LNB) :: B
INTEGER D
REAL C
END TYPE
TYPE(TableA), DIMENSION(SIZ) :: list
INTEGER VSRCH
INTEGER IDX,I,N
real T1,T2
CHARACTER(100) :: num
list%A = 'AA'
list(5)%A = 'A'
N=1
IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
CALL GET_COMMAND_ARGUMENT(1,num)
READ(num,*)N
END IF
call cpu_time(T1)
DO I=1,N
IDX = VSRCH("A", list%A, SIZ)
END DO
call cpu_time(T2)
PRINT *, "Found ", I, SIZ, IDX, T2-T1
END
When compiled with -check all a warning appears telling us that pasing the column list%A to the VSRCH method creates a temporary array:
*forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2*
This is understandable as the shape of the list array type is not the same as the expected array of strings argument, and as a result the list%A column is a non stride-1 array, so the compiler decides to make a copy of list%A before passing it to the method.
After some research, Arjen Markus suggested several improvements to modernize the old code and use accessor auxiliary functions that circumvected the problem.
Further research indicated that this scenario can be fixed without resorting to auxiliary functions and that it can be fixed by only introducing a minimum set of some of the new language features like MODULE, INTENT and DIMENSION that change the way the arguments and the signature of the function are defined.
The following code is almost the same, but when compiler with -check all it does not produce a warning anymore:
MODULE SRCH
CONTAINS
INTEGER FUNCTION VSRCH(STR, LST, L)
IMPLICIT NONE
INTEGER, INTENT(IN) :: L
CHARACTER(*), INTENT(IN):: STR
CHARACTER(*), DIMENSION(:),INTENT(IN) :: LST
INTEGER :: I
VSRCH = -1
DO I=1,L
IF (LST(I) == STR) THEN
VSRCH = I
EXIT
END IF
END DO
END FUNCTION VSRCH
END MODULE
PROGRAM TEST
USE SRCH
IMPLICIT NONE
INTEGER SIZ, LNA, LNB
PARAMETER (LNA = 10)
PARAMETER (LNB = 20)
PARAMETER (SIZ = 100000)
TYPE :: TableA
CHARACTER(LEN=LNA) :: A
CHARACTER(LEN=LNB) :: B
INTEGER D
REAL C
END TYPE
TYPE(TableA), DIMENSION(SIZ) :: list
INTEGER IDX,I,N
REAL T1,T2
CHARACTER(10) :: num
list%A = 'AA'
list(5)%A = 'A'
N=1
IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
CALL GET_COMMAND_ARGUMENT(1,num)
READ(num,*)N
END IF
call cpu_time(T1)
DO I=1,N
IDX = VSRCH("A", list%A, SIZ)
END DO
call cpu_time(T2)
PRINT *, "Found ", I, SIZ, IDX, T2-T1
END
As a result, when tested to run consecutively 100k times with an array of 100k entries in an Intel(R) Core™ i5-8365U CPU @ 1.60GHz, 1896 Mhz, 4 Core(s) laptop, the original codes takes 24.8s - 25.3s. The modified code takes 0.0156s - 0s, so this does speeds up the execution:
>ifort solution.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:solution.exe
-subsystem:console
solution.obj
>problem.exe 100000
Found 100001 100000 5 25.31250
>solution.exe 100000
Found 100001 100000 5 0.0000000E+00
One plausible explanation of why this works is that the use of MODULE, INTENT and DIMENSION give the compiler the hints it needs to be smart about how to access the non stride-1 list%A data structure and avoid the unnecessary creation of temporary arrays and as a result the code is now very efficient.
Another plaussible explanation, however, is that these constructs simply confuses the compiler and as a result it does not produce any warnings anymore, but at runtime the performance is as bad as it used to be.
Any comments in deciding if this solution is really adequate or if I should look for a more proper solution it will be most appreciated.