I have encountered a performance problem in a program that I do not understand. (Apologies for the length of this mail.) The thing is this:
- I have a bunch of arrays that act as a single array (they are catenated - see an older thread about the Mathematics of Arrays). Via a pointer function I get the pointer to an element of one of the arrays. So far so good.
- The performance of this implementation is quite bad in comparison to direct access to the elements of a single array. That is understandable: the program has to do more work than in the case of a plain array.
- But analysing the cause of the bad performance has shown that this code:
FUNCTION get_elem_ndim(view, i) result(elem)
    CLASS(moa_view_type), INTENT(INOUT) :: view
    INTEGER, DIMENSION(:), INTENT(IN)   :: i
    INTEGER, POINTER                    :: elem
    logical                             :: found
    CALL get_pointer( view, i, elem, found )
END FUNCTION
takes up to four times as much computation time as:
    ...
    CALL get_pointer( i, elem, found )
    ...
whereas in both cases the routine get_pointer does not do more than:
    found = .TRUE.
    elem => dummy
    return
To give a better idea:
- Skipping the call to get_pointer altogether: 10-11 seconds
- Using the version without the class(moa_view_type) argument: 19-21 seconds
- Using the version with the “view” argument: 44-46 seconds
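To make the comparison easier to reproduce outside the full program, here is a minimal sketch of how the two call variants could be timed in isolation. It is not taken from the uploaded sources: the names bench_mod, view_t, get_pointer_plain and get_pointer_view are hypothetical stand-ins, the type is reduced to a single component, and the loop count is arbitrary. With optimisation enabled a compiler may inline both routines, so the numbers should be read with care.

```fortran
! Hypothetical micro-benchmark; none of these names come from the uploaded sources.
module bench_mod
    implicit none
    private
    public :: view_t, get_pointer_plain, get_pointer_view

    ! Stand-in for moa_view_type, reduced to a single component (assumption)
    type :: view_t
        integer :: unused = 0
    end type view_t

    integer, target :: dummy = 42

contains

    ! Variant without the polymorphic argument
    subroutine get_pointer_plain(i, elem, found)
        integer, dimension(:), intent(in) :: i
        integer, pointer, intent(out)     :: elem
        logical, intent(out)              :: found

        found = .true.
        elem => dummy
    end subroutine get_pointer_plain

    ! Variant with the class(...) argument; the body is identical
    subroutine get_pointer_view(view, i, elem, found)
        class(view_t), intent(inout)      :: view
        integer, dimension(:), intent(in) :: i
        integer, pointer, intent(out)     :: elem
        logical, intent(out)              :: found

        found = .true.
        elem => dummy
    end subroutine get_pointer_view

end module bench_mod

program bench
    use, intrinsic :: iso_fortran_env, only: int64
    use bench_mod
    implicit none

    integer, parameter :: n = 10000000   ! arbitrary repetition count
    type(view_t)       :: view
    integer, pointer   :: elem
    logical            :: found
    integer            :: idx(1), k
    integer(int64)     :: t0, t1, rate, total

    call system_clock(count_rate=rate)
    total = 0   ! checksum, so the loops cannot be discarded as dead code

    call system_clock(t0)
    do k = 1, n
        idx(1) = k
        call get_pointer_plain(idx, elem, found)
        if (found) total = total + elem
    end do
    call system_clock(t1)
    print '(a,f8.3,a)', 'plain argument list: ', real(t1 - t0) / real(rate), ' s'

    call system_clock(t0)
    do k = 1, n
        idx(1) = k
        call get_pointer_view(view, idx, elem, found)
        if (found) total = total + elem
    end do
    call system_clock(t1)
    print '(a,f8.3,a)', 'class(view_t) first: ', real(t1 - t0) / real(rate), ' s'

    print *, 'checksum: ', total
end program bench
```

If a reduced case like this shows no difference between the two loops, the extra cost is more likely to come from something specific to the real moa_view_type or from the wrapper functions than from the class argument as such.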
I have uploaded the complete source code for convenience. The input file I use is:
# Example of using the moa_measure program
#
# First define the view, then sample the result
#
report-file view1chk.out
use-view                   # just to be sure
allocate-view 10 10000     # total size of the view
number-repetitions 100000  # repeat any procedure a thousand times
sequential-get 1 10000     # simple sequential access
random-get 10000           # simple random access
sequential-get 10 10000    # simple sequential access with a step of 10
sequential-get 100 10000    # simple sequential access with a step of 10
sequential-get 179  10000    # simple sequential access with a step of 10
sequential-get 357  10000    # simple sequential access with a step of 10
sequential-get 1000 10000    # simple sequential access with a step of 10
The timings I cited were achieved with slight variations of the code in “moa_view_ndim_flat_v6.f90”.
I think I know how I can improve the performance, but like I said, I simply don’t understand why passing the “view” argument should cost so much time, since it has already been passed through two functions before it arrives at “get_pointer”.
cmdparse.f90 (9.7 KB)
moa_measure.f90 (8.7 KB)
moa_view_ndim_flat_v6.f90 (10.5 KB)
view_general_v6.f90 (3.2 KB)
In the meantime I did try some radical alternatives and nothing helped. Preparing a version like you suggest will take me some time though. Which compiler did you use, by the way?