Implicit real - complex conversion in fortran

I am aware of my imprecise formulation. You’re absolutely right that it’s the implementations of the standard that have become more restrictive, rather than the standard itself.

I am often being corrected on this point. But somehow I subconsciously continue making this mistake. Perhaps because I feel the standard is a Platonic ideal – and its implementations are not as perfect or true as “Ideas”. And since the ideal can never be reached, one should discuss the real things, i.e., existing fortran implementations.

I do not agree with this characterization. Rather, compilers have gotten better at spotting and diagnosing errors that could lead to wrong results. In the past, programmers often took advantage of the typical lack of checking to take shortcuts that usually worked, and then complained to compiler implementers when their shortcut stopped working - either triggering a diagnostic or getting wrong results. I know that in my many years of doing compiler support, I often had to respond to many users telling them that, no, the compiler did not break their code, their code was always broken, and they just hadn’t noticed.

If you have “legacy” code that doesn’t use explicit interfaces and depends on compilers not detecting argument mismatches (or other errors such as array bounds violations), there are always compiler options to remove the blade guards. It’s important that you acknowledge that what you’re relying on is undefined behavior - not just for yourself but for anyone who uses or maintains your code.

To @jkd2022 's question of “what would it break?”, my response is, “what does it enable that you can’t do in a standard-conforming manner?” If we carved out an exception for real/complex, what about integer/real, for people who write generic copy routines? logical/integer? Fortran is a strongly-typed language and it would be, in my opinion, “un-Fortranic” to start down the road of making exceptions. And then, what if the programmer had not intended to mismatch the types?

As for “platonic ideal”, compilers are getting closer to 1) supporting all of the standard, and 2) noticing deviations from the standard. I consider this welcome progress towards an ideal.

2 Likes

To @jkd2022 's question of “what would it break?”, my response is, “what does it enable that you can’t do in a standard-conforming manner?” If we carved out an exception for real/complex, what about integer/real, for people who write generic copy routines? logical/integer? Fortran is a strongly-typed language and it would be, in my opinion, “un-Fortranic” to start down the road of making exceptions. And then, what if the programmer had not intended to mismatch the types?

The only exception would be for real/complex, which is the subject of this thread.

Here is an speed-critical example lifted directly from our electronic-structure code:

do j=1,n
  do i=1,j-1
    c(i,j)=c(i,j)+zdotc(l,a(1,i),1,b(1,j),1)
  end do
  c(j,j)=c(j,j)+ddot(2*l,a(1,j),1,b(1,j),1)
end do

The matrix is Hermitian so the diagonal elements are real, however the compiler cannot know this. Can you rewrite this so it’s just as fast and conforming? This code has worked on multiple compilers for the last 22 years, which makes me think that it wouldn’t do any harm to the standard to grandfather-in real/complex equivalence.

Also, exceptions seem to be permitted in the Fortran standard. Consider the following:

program test
implicit none
real(8) x
real(8) a(10)

a(:)=2.d0

! conforming
call array_square(10,a)

! conforming
call array_square(1,a(1))

! non-conforming
call array_square(1,x)

! conforming
call scalar_square(a(1))

! non-conforming
call scalar_square(a(:))

! conforming
call scalar_square(x)

end program

subroutine array_square(n,a)
implicit none
integer, intent(in) :: n
real(8), intent(inout) :: a(n)
a(:)=a(:)**2
end subroutine

subroutine scalar_square(x)
implicit none
real(8), intent(inout) :: x
x=x**2
end subroutine

GFortran gives:

test.f90:16:20:

   16 | call array_square(1,x)
      |                    1
Error: Rank mismatch in argument ‘a’ at (1) (rank-1 and scalar)
test.f90:22:19:

   22 | call scalar_square(a(:))
      |                   1
Error: Rank mismatch in argument ‘x’ at (1) (scalar and rank-1)

However, a(1) means to me (and I suspect most people) the scalar value of the array a at index 1, in the same way that f is a function and f(x) is the function evaluated at x. Fortran makes an exception for this in that a(1) is either interpreted a scalar or a sub-array depending on the context. Incidentally, Intel Fortran compiles the above code without error.

D:\Projects>ifx /warn:interface test.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.2 Build 20231213
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

test.f90(15): error #8284: If the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic.   [A]
call array_square(1,x)
-----^
test.f90(21): error #6634: The shape matching rules of actual arguments and dummy arguments have been violated.   [A]
call scalar_square(a(:))
-------------------^
compilation aborted for test.f90 (code 1)

a(1) invokes sequence association. Yes, the standard does have some carve-outs - another is passing a character scalar of any length to a character(1) array dummy argument of an interoperable procedure.

You are proposing to break type matching in a way that is likely to lead to undiagnosed errors. I don’t have the expertise or resources to rewrite that code fragment, but perhaps others can make suggestions.

1 Like

Strange, you managed to disagree with someone who actually agrees with you. :confused:

Maybe, but the benefits would outweigh the drawbacks IMO. I already faced in the past the exact same case as @jkd2022 , and more generally many codes using the Fourier domain try matching the real and complex types at some point.

If one doesn’t want to compromise the type safety at all, there are some solutions that are detailed in the proposal that is linked above.

This is because of history. Before f90, that syntax was the only way that an array slice could be passed as an actual argument. It was, of course, a very useful and much used feature of the language, so now it cannot be removed without introducing backwards compatibility issues.

Regarding the real/complex issue, my feeling is that the language goes to some length to define the storage sequence associations of integer, real, logical, and complex entities. That was originally limited to those varibles within common blocks and/or with equivalence statements. Now that the language has pointers, c_loc(), and c_f_pointer(), it seems like those storage sequence associations should include also these newer features of the language.

When a programmer uses c_loc() and c_f_pointer() to exploit storage sequence associations, he is doing it on purpose, he isn’t going to use this feature unknowingly or by mistake. My preference would be for this usage to be standardized so that the programmer knows exactly the semantics, and he can rely on that usage in a portable way over conforming compilers for an indefinite period of time.

Funny story - I was one of several committee members who wanted to add an example to the standard showing just that, but it got voted down. We know that a LOT of people use these module intrinsics for just this purpose.

I would be interested too to see a convincing example where a non standard conforming subroutine performs better than the respective standard conforming. I admit that your subroutine might perform better if c in a packed matrix form is desired. But in the present form your example is not convincing. You try to beat a single BLAS3 call by double loops (which do not even unroll well) of BLAS1 calls. I cannot test your implementation because of the known problems with ddot, but I strongly suspect that zgemm will win at least in some domain of the n-l parameter space.

subroutine AHB(n, l, A, B, C)
    implicit none
    integer, parameter      :: dp = kind(1.d0)
    integer, intent(in)     :: n, l
    complex(dp), intent(in) :: A(l, n), B(l, n)
    complex(dp), intent(out):: C(n, n)
    complex(dp), parameter  :: z1 = (1.0D0, 0.0D0), z0 = (0.0D0, 0.0D0)

    call zgemm('C', 'N', n, n, l, z1, A, l, B, l, z0, c, n)
  end subroutine AHB

In order to make a fair comparison, one has to sample the whole parameter space.

Yes, and that’s why standardizing this common practice would be welcome.

… But in the present form your example is not convincing. You try to beat a single BLAS3 call by double loops (which do not even unroll well) of BLAS1 calls. I cannot test your implementation because of the known problems with ddot, but I strongly suspect that zgemm will win at least in some domain of the n-l parameter space.

We’ve tried a variety of alternatives but the current implementation is the fastest. However, these things have a tendency to change over the years as compilers and hardware evolve.

Here is the full routine:

subroutine zmctmu(l,n,a,b,ld,c)
use modomp
implicit none
! arguments
integer, intent(in) :: l,n
complex(8), intent(in) :: a(l,n),b(l,n)
integer, intent(in) :: ld
complex(8), intent(inout) :: c(ld,*)
! local variables
integer i,j,nthd
! external functions
real(8), external :: ddot
complex(8), external :: zdotc
call holdthd(n,nthd)
!$OMP PARALLEL DO DEFAULT(SHARED) &
!$OMP PRIVATE(i) &
!$OMP NUM_THREADS(nthd) SCHEDULE(DYNAMIC)
do j=1,n
  do i=1,j-1
    c(i,j)=c(i,j)+zdotc(l,a(1,i),1,b(1,j),1)
  end do
  c(j,j)=c(j,j)+ddot(2*l,a(1,j),1,b(1,j),1)
end do
!$OMP END PARALLEL DO
call freethd(nthd)
end subroutine

… and you can see there is some parallelism involved. The variable l is typically 100-200 and n can be 100s to ~10000, and the entire procedure is replicated hundreds of times. So a lot of CPU power is expended here.

The matrix is then fed into zhegvx which solves the generalised eigenvalue problem Ax = λBx, which turns out to be slightly faster than the packed version zhpgvx, at least with MKL the last time I tested it.

There is also a version of this where the final matrix is real because of inversion symmetry. In this case, the following routine is used instead:

subroutine rzmctmu(l,n,a,b,ld,c)
use modomp
implicit none
! arguments
integer, intent(in) :: l,n
! pass in complex arrays as real
real(8), intent(in) :: a(2*l,n),b(2*l,n)
integer, intent(in) :: ld
complex(8), intent(inout) :: c(ld,*)
! local variables
integer i,j,nthd
call holdthd(n,nthd)
!$OMP PARALLEL DO DEFAULT(SHARED) &
!$OMP PRIVATE(i) &
!$OMP NUM_THREADS(nthd) SCHEDULE(DYNAMIC)
do j=1,n
  do i=1,j
    c(i,j)=c(i,j)+dot_product(a(:,i),b(:,j))
  end do
end do
!$OMP END PARALLEL DO
call freethd(nthd)
end subroutine

Here it is faster to use dot_product instead of ddot. Once again, a complex matrix and vectors are passed in as real and considerable efficiency can be gained.

1 Like

Yes, I can believe that this is a winning approach for MP parallelisation. But even in this case, the total numerical cost is \mathcal{O}(n(n+1)\cdot l), and the win is \mathcal{O}(n\cdot l). According to your numbers the relative speed-up due to the use of standard non-conforming approach is \mathcal{O}(n\cdot l)/\mathcal{O}(n(n+1)\cdot l) \sim \mathcal{O}\big(\frac1n\big)\simeq 0.01\% - 1\%.

I do believe there are many scenarios where the proposal of @PierU would be of advantage. But this should be argued with a strong case. Otherwise, there always will be objections raised by @sblionel, “what does it enable that you can’t do in a standard-conforming manner?”

The win is from complex arithmetic. Given two complex numbers z₁ = a₁ + i b₁ and z₂ = a₂ + i b₂ then the product is

z₁* z₂ = aa₂ +i (ab₂ - ab₁) + bb

If I know a priori that the imaginary part of the result is zero then I can avoid calculating ab₂ - ab₁ , which is what the above routine does by taking real dot products.

In signal processing, when working in the Fourier domain these winning cases are quite common.

Standard conforming version:

    real, allocatable :: r(:,:)
    complex :: c(:,:)
    ! ...
    allocate( r(n,m) )
    ! ... read data to r(:,:)
    ! ... some computations on r(:,:) as real
    allocate( c(n/2+1,m) )
    call rfft(r,c)   ! real-to-complex FFT
    ! ... some computations on c(:,:) as complex
    call irfft(r,c)   ! complex-to-real inverse FFT
    ! ... some computations on r(:,:) as real
    end program

You have to have a real array and a complex, which basically represent the same data. When these arrays have a size of several GB, duplicating them is generally the last thing you want

Non standard conforming version, half the RAM needed:

    real, allocatable, target :: r(:,:)
    complex, pointer :: c(:,:)
    ! ...
    allocate( r(n,m) )
    ! ... read data to r(:,:)
    ! ... some computations on r(:,:) as real
    allocate( c(n/2+1,m) )
    call rfft(r)   ! in place real-to-complex FFT
    call c_f_pointer(c_loc(r),c,shape=[n/2+1,m])
    ! ... some computations on c(:,:) as complex
    call irfft(r)   ! in-place complex-to-real inverse FFT
    ! ... some computations on r(:,:) as real
    end program

But the case shown by @jkd2022 can be strong if the ddot vs zdotc represents the main part of a computation (it’s basically computing the energy of a complex array, and once again if this array is very large you are quite happy if you can save half the number of operations)

2 Likes

Sorry for not going back and re-reading the previous discussions on this topic, but why does a solution like the following not suffice?

$ cat real-complex-assoc.f90     
complex, target :: c(2)
real, pointer :: r(:)

c = [(1., 2.), (3., 4.)]
r => c%re
print *, r
r => c%im
print *, r
end
$ nagfor real-complex-assoc.f90 
$ ./a.out 
   1.0000000   3.0000000
   2.0000000   4.0000000

There are plenty of things that start to go wonky in the standard if we start breaking type-safety in procedure calls.

Nice example. It works also with gfortran (>= v10) and flang-new. With ifx it only works at level -O0; with ifort it crashed with an internal compiler error.

Niklaus Wirth on this topic in A Plea for Lean Software:

Without type checking, the notion of abstraction in programming remains hollow and academic. Abstraction can work only with languages that postulate strict, static typing of every variable and function. In this respect, C fails – it resembles assembler code, where “everything goes”.
[…]
To be worthy of the description, an object-oriented language must embody strict, static typing that cannot be breached, whereby programmers can rely on the compiler to identify inconsistencies.
[…]
The exclusive use of a strongly typed language was the most influential factor in designing this complex system in such a short time. […] Static typing (a) lets the compiler pinpoing inconsistencies before program execution; (b) lets the designer change definitions of structures with less danger of negatice consequences; and (c) speeds up the improvement process, which could include changes that might not otherwise be considered feasible.


Niklaus Wirth in Good Ideas, Through the Looking Glass:

One of the worst features ever is the loophole. This author makes this statement with a certain amount of tongue in cheek and uneasiness, because he infected his languages Pascal, Modula, and even Oberon with this deadly virus.

The loophole lets the programmer breach the type-checking by the compiler. It is a way to say: “Don’t interfere, as I am smarter than the rules”. Loopholes take many forms. The most common are explicit type transfer function, such as

x := LOOPHOLE[i,REAL]      Mesa
x := REAL(i)               Modula
x := SYSTEM.VAL(REAL,i)    Oberon

But they can also be disguised as absolute address specifications, or by variant records in Pascal. In the examples above, the internal representation of integer i is to be interpreted as a floating-point (real) number. This can only be done with knowledge about number representation, which should not be necessary when dealing with the abstraction level provided by the language. […], but the loophole is nevertheless a bad idea.

In Fortran the loopholes would be equivalence or the transfer function.

Whether the complex to real conversion or association presents a loophole is up to debate. I think the way @everythingfunctional has shown is quite clean. Unfortunately, I don’t see how it would help in @PierU’s second case, when the buffer is real.

Your wish has been granted! Fortran 2018 relaxed C_LOC to allow a non-interoperable argument, so the standard lets you do this “casting” trick as long as you’re not doing it for something polymorphic.

So now we know the loophole is legal. :see_no_evil: (C_LOC is an absolute address specification in Wirth’s words.)

I don’t think, that the c_loc() trick is a loophole because of the absolute address. You just create a pointer to an existing object with the target attribute. This is not different, from letting a normal Fortran pointer point to it without the c_loc() workaround. If it is a loophole, it is more for the implicit assumption, that the bit representation of a complex number and of the data of a two element real array are identical, without giving the compiler the chance to let you know, when they are not.