Why such convoluted (pun intended) code to perform something as simple as a convolution? And what about the OOP runtime overheads?
Here is my code for 1D convolution:
```fortran
!*******************************************************************************
subroutine sconv1D &
           (d,dfirst,dlast, &
            e,efirst,elast, &
            x,xfirst,xlast)
!*******************************************************************************
! x = x + d * e
!*******************************************************************************
implicit none

integer, intent(in)    :: dfirst, dlast
integer, intent(in)    :: efirst, elast
integer, intent(in)    :: xfirst, xlast
real,    intent(in)    :: d(dfirst:dlast), e(efirst:elast)
real,    intent(inout) :: x(xfirst:xlast)

integer :: id, ixmin, ixmax

do id = dfirst, dlast
   ixmin = max( xfirst, efirst+id )
   ixmax = min( xlast , elast +id )
   x(ixmin:ixmax) = x(ixmin:ixmax) + d(id)*e(ixmin-id:ixmax-id)
end do

end subroutine sconv1D
```
KISS… no OOP, and explicit shape arguments because they have less overhead than assumed shape… The 2D and 3D versions are essentially the same, just with more nested loops (see the sketch below).
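For illustration, here is a sketch of how the 2D version might look following the same pattern (my guess at the structure, not code from the original post): the innermost statement is still a vectorizable array operation over the first dimension, and the clipped index ranges handle the boundaries without any explicit padding.

```fortran
!*******************************************************************************
subroutine sconv2D &
           (d, dfirst1,dlast1, dfirst2,dlast2, &
            e, efirst1,elast1, efirst2,elast2, &
            x, xfirst1,xlast1, xfirst2,xlast2)
!*******************************************************************************
! x = x + d * e  (2D version)
!*******************************************************************************
implicit none

integer, intent(in)    :: dfirst1, dlast1, dfirst2, dlast2
integer, intent(in)    :: efirst1, elast1, efirst2, elast2
integer, intent(in)    :: xfirst1, xlast1, xfirst2, xlast2
real,    intent(in)    :: d(dfirst1:dlast1,dfirst2:dlast2)
real,    intent(in)    :: e(efirst1:elast1,efirst2:elast2)
real,    intent(inout) :: x(xfirst1:xlast1,xfirst2:xlast2)

integer :: id1, id2, ix2, ixmin1, ixmax1, ixmin2, ixmax2

do id2 = dfirst2, dlast2
   ixmin2 = max( xfirst2, efirst2+id2 )
   ixmax2 = min( xlast2 , elast2 +id2 )
   do id1 = dfirst1, dlast1
      ixmin1 = max( xfirst1, efirst1+id1 )
      ixmax1 = min( xlast1 , elast1 +id1 )
      ! accumulate along the contiguous first dimension
      do ix2 = ixmin2, ixmax2
         x(ixmin1:ixmax1,ix2) = x(ixmin1:ixmax1,ix2) &
                              + d(id1,id2)*e(ixmin1-id1:ixmax1-id1,ix2-id2)
      end do
   end do
end do

end subroutine sconv2D
```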
Not to mention this (auto-)vectorizes nicely. Say with `gfortran -O3 -march=skylake-avx512 -mprefer-vector-width=512`, the bulk of the work gets done in the hot loop:
```asm
.L5:
        vmovups     zmm0, ZMMWORD PTR [r15+rax]
        vfmadd213ps zmm0, zmm1, ZMMWORD PTR [rdx+rax]
        vmovups     ZMMWORD PTR [rdx+rax], zmm0
        add         rax, 64
        cmp         rdi, rax
        jne         .L5
```
I’m just becoming aware that there is an overhead when using assumed shape instead of explicit shape; I always thought that it did not matter. Does that overhead become more noticeable as the arrays get larger?
All my codes use assumed shape, with pretty small arrays (30x30 is a relatively huge dimension for my cases). But the routines are called millions of times, so there might be room for improvement by using explicit shape? I might do some tests later.
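An untested sketch of what such a test could look like (all names here are hypothetical). Note that timings like this depend heavily on whether the compiler inlines the callee, so putting the two routines in a separately compiled file, or disabling inlining, gives a fairer picture:

```fortran
program shape_overhead
   implicit none
   integer, parameter :: n = 30
   integer, parameter :: reps = 1000000
   real :: a(n,n), s1, s2
   integer(8) :: t0, t1, rate
   integer :: k

   call random_number(a)

   s1 = 0.0
   call system_clock(t0, rate)
   do k = 1, reps
      call add_explicit(a, n, n, s1)   ! explicit shape: address + sizes
   end do
   call system_clock(t1)
   print *, 'explicit shape:', real(t1-t0)/real(rate), 's'

   s2 = 0.0
   call system_clock(t0)
   do k = 1, reps
      call add_assumed(a, s2)          ! assumed shape: descriptor
   end do
   call system_clock(t1)
   print *, 'assumed shape: ', real(t1-t0)/real(rate), 's'

   print *, s1, s2   ! keep the results live so the loops are not removed

contains

   subroutine add_explicit(b, n1, n2, s)
      integer, intent(in) :: n1, n2
      real,    intent(in) :: b(n1,n2)
      real, intent(inout) :: s
      s = s + b(1,1) + b(n1,n2)
   end subroutine add_explicit

   subroutine add_assumed(b, s)
      real,    intent(in) :: b(:,:)
      real, intent(inout) :: s
      s = s + b(1,1) + b(size(b,1),size(b,2))
   end subroutine add_assumed

end program shape_overhead
```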
Yes, the main advantage of Fortran is that it’s easy for a domain expert (say a physicist) to write fast code.
That being said, Fortran should absolutely give the best performance. I looked at @gronki’s examples, but they are layers and layers of OOP, so it would be a long investigation to find why the Fortran version is slower, which I don’t have time for right now (you should compare gfortran against the same version of gcc, flang against the same version of clang, etc.). I recommend writing Fortran code like @PierU did above (or how @jkd2022 recommends). @PierU’s convolution code should look simpler than the corresponding C++ code, and it should be as fast or faster; otherwise the Fortran compilers must improve.
In my version, `kernel` is `allocatable`, therefore the array is guaranteed to be contiguous. It can also be guaranteed by the `contiguous` attribute, which I use in the procedural part of the code. Anyway, most compilers nowadays generate contiguous/non-contiguous branches. So I disagree that OOP introduces any overhead here, while it provides a much cleaner interface. (From my tests there was not much difference caused by that, but I should add such tests to the repo for comparison.) A subroutine with 10 arguments feels like 70s coding, but it must be a matter of preference, since I know many people who hate OOP interfaces! I think the art is to use them in the non-critical parts of the code (configuring the computation) and stick to procedural style in the critical parts (where we perform the computation).
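For reference, a minimal sketch of that pattern (hypothetical names, not the code from @gronki’s repo): with the `contiguous` attribute the compiler may assume unit stride on assumed-shape dummies, just as with explicit shape.

```fortran
! Sketch only: assumes size(signal) == size(out) + size(kernel) - 1
! ("valid"-style accumulation), purely to illustrate the attribute.
subroutine conv1d_contig(kernel, signal, out)
   implicit none
   real, intent(in),    contiguous :: kernel(:), signal(:)
   real, intent(inout), contiguous :: out(:)
   integer :: i, j
   do j = 1, size(kernel)
      do i = 1, size(out)
         out(i) = out(i) + kernel(j)*signal(i+j-1)
      end do
   end do
end subroutine conv1d_contig
```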
To be fair, the Fortran version performs faster with the gcc/gfortran combo compared to C/C++. The Intel optimizations seem to be much superior overall, but perhaps a bit better for the C/C++ compiler. I want to analyze the code with a profiler today and find the culprit. Looking at the assembly, both versions seem to be vectorized correctly, but the Fortran version is for some reason just a little bit slower.
Anyway, a new thread about “OOP overhead in Fortran” would be a better place for further discussion; I do not want to derail the topic of FAR++.
Contiguity and OOP are two orthogonal issues. Explicit shape dummy arrays are also guaranteed to be contiguous. And the compiler just has to pass an address in all cases, whereas with assumed shape it possibly has to create a full descriptor, depending on what the actual argument is.
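To make the cases concrete, here is a small self-contained sketch (hypothetical routine names) of what the compiler must arrange at each call site, to the best of my understanding:

```fortran
program call_sites
   implicit none
   real :: a(8), b(4,8)
   a = 1.0
   b = 2.0
   call expl(a, 8)        ! contiguous actual: just an address is passed
   call assd(a)           ! whole array: a descriptor, trivially built
   call expl(b(1,:), 8)   ! strided actual: the compiler copies in/out
   call assd(b(1,:))      ! strided actual: descriptor with a stride, no copy
   print *, a(1), b(1,1)
contains
   subroutine expl(v, n)
      integer, intent(in) :: n
      real, intent(inout) :: v(n)
      v = v + 1.0
   end subroutine expl
   subroutine assd(v)
      real, intent(inout) :: v(:)
      v = v + 1.0
   end subroutine assd
end program call_sites
```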
Yes, explicit shapes require more arguments, but beyond preference it’s really a choice to ensure the least possible overhead in the calls. I also use assumed shapes whenever performance is not an issue. And although I do not often use OOP, I don’t hate it and use it sometimes. But, frankly, the 70s coding style requires just 20 lines of code here, and is easy to understand, to maintain (assuming there is something to maintain), and to use. What else?
I find that it is the use of derived types that is mostly responsible for reducing the number of subroutine arguments. Closely related variables (scalars and arrays) are grouped together into just a few derived types, and then those derived types are passed as single arguments. Those derived types themselves can, of course, be scalars, assumed shape arrays, or explicit shape arrays. Fortran is very flexible in this regard. Explicit shape vs. assumed shape is a separate issue; the only question really is whether the bounds and dimensions are passed as separate arguments or carried by the assumed shape declaration. Unless the call is in a tight loop with varying size arrays, the overhead associated with constructing the array descriptor for the argument association is trivial.
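A sketch of that idea applied to the convolution above (the `kernel_t` type and its fields are my invention): the kernel array and its bounds travel together, so three arguments collapse into one.

```fortran
module conv_mod
   implicit none
   type :: kernel_t
      integer :: first, last
      real, allocatable :: d(:)   ! allocated as d(first:last)
   end type kernel_t
contains
   subroutine sconv1D_t(k, e, efirst, elast, x, xfirst, xlast)
      type(kernel_t), intent(in) :: k
      integer, intent(in)  :: efirst, elast, xfirst, xlast
      real, intent(in)     :: e(efirst:elast)
      real, intent(inout)  :: x(xfirst:xlast)
      integer :: id, ixmin, ixmax
      do id = k%first, k%last
         ixmin = max( xfirst, efirst+id )
         ixmax = min( xlast , elast +id )
         x(ixmin:ixmax) = x(ixmin:ixmax) + k%d(id)*e(ixmin-id:ixmax-id)
      end do
   end subroutine sconv1D_t
end module conv_mod
```

With `allocate(k%d(k%first:k%last))` done once at setup time, the call site becomes simply `call sconv1D_t(k, e,1,n, x,1,n)`.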
It depends. For instance, if such a 1D convolution routine, but with assumed shapes, is called on each column of a 2D array, the compiler may have to generate an array descriptor on each call:
```fortran
do j = 1, size(e,2)
   call sconv1d( f, e(:,j), x(:,j) )
end do
```
Moreover, passing the lower and upper bounds also makes the routine much more versatile.
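For example (my illustration, assuming it is compiled together with the `sconv1D` above), a symmetric kernel can be indexed around zero, so the output stays aligned with the input without any index shifting at the call site:

```fortran
program centered
   implicit none
   real :: d(-2:2), e(1000), x(1000)
   external :: sconv1D
   d = [0.1, 0.2, 0.4, 0.2, 0.1]   ! symmetric smoothing kernel
   call random_number(e)
   x = 0.0
   ! x(i) accumulates sum over id of d(id)*e(i-id): no shift needed
   call sconv1D(d,-2,2, e,1,1000, x,1,1000)
   print *, x(1), x(500), x(1000)
end program centered
```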
Subroutines with many arguments can still be easy to use if only a few of the arguments are required and those appear before the `optional` arguments. The alternative may be an inflexible subroutine with various parameters hard-coded internally.
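A minimal sketch of that pattern (the routine and argument names are made up): the required arguments come first, and the tuning knobs default sensibly when omitted.

```fortran
subroutine scale_shift(x, n, factor, offset)
   implicit none
   integer, intent(in)        :: n
   real,    intent(inout)     :: x(n)
   real, intent(in), optional :: factor, offset
   real :: f, o
   f = 1.0                          ! defaults when not supplied
   o = 0.0
   if (present(factor)) f = factor
   if (present(offset)) o = offset
   x = f*x + o
end subroutine scale_shift
```

`call scale_shift(x, n)` uses the defaults, while `call scale_shift(x, n, offset=0.5)` overrides just one knob via a keyword argument.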
I created a topic to continue this side discussion: Discussion about performance of OOP in Fortran