I have encountered a problem with the performance of a program that I do not understand. (Apologies for the lengthy mail.) The thing is this:
- I have a bunch of arrays that act as a single array (they are catenated - see an older thread about the Mathematics of Arrays). Via a pointer function I get the pointer to an element of one of the arrays. So far so good.
- The performance of this implementation is quite bad in comparison to direct access to the elements of a single array. That is understandable: the program has to do more work than in the case of a plain array.
- But analysing the cause of the bad performance has shown that this code::
FUNCTION get_elem_ndim(view, i) result(elem)
    CLASS(moa_view_type), INTENT(INOUT) :: view
    INTEGER, DIMENSION(:), INTENT(IN) :: i
    INTEGER, POINTER :: elem
    logical :: found
    CALL get_pointer( view, i, elem, found )
END FUNCTION
takes up to four times as much computation time as:
...
CALL get_pointer( i, elem, found )
...
whereas in both cases the routine get_pointer does nothing more than:
found = .TRUE.
elem => dummy
return
To give a better idea:
- Skipping the call to get_pointer altogether: 10-11 seconds
- Using the version without the class(moa_view_type) argument: 19-21 seconds
- Using the version with the “view” argument: 44-46 seconds
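For reference, the comparison above can be sketched as a small stand-alone program. This is only a sketch under assumptions, not the uploaded code: the type `view_t`, the routines `get_pointer_plain` and `get_pointer_view`, the module name and the repetition count `nrep` are hypothetical stand-ins. Note also that with the stubs in the same module an optimising compiler may inline them, in which case the difference could shrink or vanish.

```fortran
! Sketch only: view_t, get_pointer_plain, get_pointer_view and nrep are
! hypothetical stand-ins, not names from the uploaded sources.
module stub_mod
    implicit none
    type :: view_t
        integer :: n = 0
    end type view_t
    integer, target :: dummy = 1
contains
    ! Stub without the view argument
    subroutine get_pointer_plain(i, elem, found)
        integer, dimension(:), intent(in) :: i
        integer, pointer, intent(out)     :: elem
        logical, intent(out)              :: found
        found = .true.
        elem => dummy
    end subroutine get_pointer_plain

    ! Same stub, with an extra polymorphic argument
    subroutine get_pointer_view(view, i, elem, found)
        class(view_t), intent(inout)      :: view
        integer, dimension(:), intent(in) :: i
        integer, pointer, intent(out)     :: elem
        logical, intent(out)              :: found
        found = .true.
        elem => dummy
    end subroutine get_pointer_view
end module stub_mod

program time_stubs
    use, intrinsic :: iso_fortran_env, only: int64, real64
    use stub_mod
    implicit none
    integer, parameter :: nrep = 100000000
    type(view_t)     :: view
    integer, pointer :: elem
    logical          :: found
    integer          :: idx(1), k
    integer(int64)   :: t0, t1, t2, rate, total

    idx = 1
    total = 0
    call system_clock(t0, rate)
    do k = 1, nrep
        call get_pointer_plain(idx, elem, found)
        total = total + elem      ! accumulate so the loop has a visible result
    end do
    call system_clock(t1)
    do k = 1, nrep
        call get_pointer_view(view, idx, elem, found)
        total = total + elem
    end do
    call system_clock(t2)

    print *, 'without view argument (s):', real(t1 - t0, real64) / real(rate, real64)
    print *, 'with view argument    (s):', real(t2 - t1, real64) / real(rate, real64)
    print *, 'checksum:', total
end program time_stubs
```

Putting the two stubs in a separate source file from the main program, or compiling at a low optimisation level, should make it less likely that the calls are optimised away.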
I have uploaded the complete source code for convenience. The input file I use is:
# Example of using the moa_measure program
#
# First define the view, then sample the result
#
report-file view1chk.out
use-view # just to be sure
allocate-view 10 10000 # total size of the view
number-repetitions 100000 # repeat any procedure a thousand times
sequential-get 1 10000 # simple sequential access
random-get 10000 # simple random access
sequential-get 10 10000 # simple sequential access with a step of 10
sequential-get 100 10000 # simple sequential access with a step of 10
sequential-get 179 10000 # simple sequential access with a step of 10
sequential-get 357 10000 # simple sequential access with a step of 10
sequential-get 1000 10000 # simple sequential access with a step of 10
The timings I cited were achieved with slight variations of the code in “moa_view_ndim_flat_v6.f90”.
I think I know how I can improve the performance, but, as I said, I simply don’t understand why passing the “view” argument should cost so much time, since it has already been passed through two functions before it arrives at “get_pointer”.
cmdparse.f90 (9.7 KB)
moa_measure.f90 (8.7 KB)
moa_view_ndim_flat_v6.f90 (10.5 KB)
view_general_v6.f90 (3.2 KB)