Practice with elemental function

This goes back to array syntax not being very performant in Fortran. Some time ago I posted an example here comparing a loop-based and an array-syntax version of a simple value function iteration, and the loop-based version was much faster.

That has been my experience almost universally. The array syntax is a convenience feature, not a performance feature.
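
For anyone who wants to check this on their own compiler, here is a minimal timing sketch. It is not the a.f90 used elsewhere in this thread; the function `f`, the array size `n`, and the program name are all made up for illustration, and the timings will depend heavily on compiler and flags.

```fortran
program array_vs_loop
   ! Hypothetical micro-benchmark: f, n and all variable names are
   ! illustrative and are not taken from any code posted in this thread.
   use, intrinsic :: iso_fortran_env, only: dp => real64, int64
   implicit none
   integer, parameter :: n = 2000000
   real(dp), allocatable :: x(:), y_loop(:), y_array(:)
   integer(int64) :: t0, t1, t2, rate
   integer :: i

   allocate(x(n), y_loop(n), y_array(n))
   call random_number(x)

   call system_clock(t0, rate)
   ! Explicit loop: one scalar reference to f per element.
   do i = 1, n
      y_loop(i) = f(x(i))
   end do
   call system_clock(t1)
   ! Array syntax: a single elemental reference over the whole array.
   y_array = f(x)
   call system_clock(t2)

   write(*,'(a,es12.4)') 'loop  time (s): ', real(t1 - t0, dp) / real(rate, dp)
   write(*,'(a,es12.4)') 'array time (s): ', real(t2 - t1, dp) / real(rate, dp)
   write(*,'(a,es12.4)') 'max difference: ', maxval(abs(y_loop - y_array))

contains

   elemental function f(v) result(r)
      real(dp), intent(in) :: v
      real(dp) :: r
      r = exp(-v) * sin(v) + sqrt(v)
   end function f

end program array_vs_loop
```

A single run like this is noisy, so repeating the measurement a few times and comparing optimization levels is probably a good idea before drawing conclusions.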


Sure, on Apple M4 I get:

$ gfortran --version
GNU Fortran (GCC) 13.2.0
$ gfortran a.f90 -march=native -ffast-math -funroll-loops && ./a.out
 Maximum difference between scalar and vectorized evaluations:    0.0000000000000000
 Maximum difference between scalar and elemental evaluations:    0.0000000000000000
 Time taken for scalar evaluations:        7.3098999999999997E-002
 Time taken for vectorized evaluations:    5.1891999999999994E-002
 Time taken for elemental evaluations:     5.3823000000000010E-002
$ lfortran --version
LFortran version: 0.59.0-196-gcc4738b4e
Platform: macOS ARM
LLVM: 11.1.0
Default target: arm64-apple-darwin24.6.0
$ lfortran a.f90 --fast
Maximum difference between scalar and vectorized evaluations:     0.00000000000000000e+00
Maximum difference between scalar and elemental evaluations:     0.00000000000000000e+00
Time taken for scalar evaluations:         3.53489999999999985e-02
Time taken for vectorized evaluations:     4.23799999999999941e-02
Time taken for elemental evaluations:      3.59570000000000028e-02
$ flang --version
flang version 22.0.0git (https://github.com/llvm/llvm-project.git cb6ee6cb49e4e6b4969f98ba9129b64094387279)
Target: arm64-apple-darwin24.6.0
Thread model: posix
Build config: +unoptimized, +assertions
$ flang -march=native -ffast-math -funroll-loops a.f90 && ./a.out
 Maximum difference between scalar and vectorized evaluations:  0.
 Maximum difference between scalar and elemental evaluations:  0.
 Time taken for scalar evaluations:      .145756
 Time taken for vectorized evaluations:  .195476
 Time taken for elemental evaluations:   8.4893E-02

That is great!
Anyway, looks like the current Fortran already has a similar ternary operator, i.e., the MERGE function.


I think the ternary operator was introduced because it short-circuits, but the merge function is not required to do so. So you cannot safely write a = merge(sqrt(b), 0.0, b >= 0) with b real, but you can write a = (b >= 0 ? sqrt(b) : 0.0).


Exactly, same as my example with log of a negative number.


There has been a lot of discussion here on whether the `merge` function may short-circuit. The result was not “the merge function is not required to do so” but “the merge function is required not to do so.”


Could you please elaborate a little bit more, about why “cannot safely write a = merge(sqrt(b), 0.0, b >= 0) with b real”?
and why it is safe to write a = (b >= 0 ? sqrt(b) : 0.0)?
Thanks!

PS.
I know a topic about merge() is below for example,


Well, if a compiler does short-circuit, that is a bug, or at least an extension, starting with Fortran 2023 according to my reading of some of the following sections. That said, I relied on such extensions early on, particularly when passing an optional argument to MERGE.

At least in the 2023 Standard all the arguments are required to be evaluated
per 15.5.3 (Function reference) and 15.5.4 (Subroutine reference).

There were some compilers that treated a MERGE() just like IF/ELSE/ENDIF
and so only evaluated one of the first two arguments of MERGE() but so far
everyone I tested now evaluates both expressions, at least with a simple
test using contained procedures with side effects.

program main
implicit none
! counters record how many times each function is actually invoked
integer,save :: countone=0, counttwo=0
integer :: result=0

   ! one() and two() both appear as actual arguments in each reference
   result=result+merge(one(),two(),.true.)
   result=result+merge(one(),two(),.false.)
   write(*,*)result
   write(*,*)countone,counttwo

contains 

function one()
integer :: one
   one=1
   countone=countone+1
end function one

function two()
integer :: two
   two=2
   counttwo=counttwo+1
end function two

end program main

Output

That is, the standard requires the counts to be 2:

           3
           2           2

The pertinent part of the standard is

 15.5.3    Function reference
 A function is invoked during expression evaluation by a
 function-reference or by a defined operation (10.1.6). When it
 is invoked, all actual argument expressions are evaluated then the
 arguments are associated, and then the function is executed. When
 execution of the function is complete, the value of the function
 result is available for use in the expression that caused the
 function to be invoked. The characteristics of the function result
 (15.3.3) are determined by the interface of the function. If a
 reference to an elemental function (15.9) is an elemental reference,
 all array arguments shall have the same shape.

Notice the “all actual argument expressions are evaluated”.

Note C.6.1 says the order is up to the compiler, but whether arguments are
evaluated is not (now?) per 15.5.3.

So there is a lot of related information on why you should not pass
optional parameters to MERGE, what happens with unallocated arguments, arrays,
masked arrays, pointers …

Much of it relates to the history of choosing the syntax for test?expr1:expr2,
borrowed from C which is my least favorite C syntax after the overuse of “{}”;
and why MERGE() was not just made a special case that did shortcircuit.

I did not go back to see how far back such a clear statement goes, as I remember
thinking it was not defined by the standard and was up to the processor; and there
are definitely places where the processor is allowed (but not required) to
short-circuit, as in

  10.1.7      Evaluation of operands
   It is not necessary for a processor to evaluate all of the operands
   of an expression, or to evaluate entirely each operand, if the value
   of the expression can be determined otherwise.

  NOTE 1
   This principle is most often applicable to logical expressions,
   zero-sized arrays, and zero-length strings, but it applies to all
   expressions.
   For example, in evaluating the expression
            X > Y .OR. L (Z)
   where X, Y, and Z are real and L is a function of type logical, the
   function reference L (Z) need not be evaluated if X is greater than
   Y. Similarly, in the array expression
        W (Z) + A
   where A is of size zero and W is a function, the function reference W
   (Z) need not be evaluated.

but a good rule to go by is to assume everything is evaluated
unless you use test?expr1:expr2 or flow control (e.g. if-else-elseif-endif, do, …).
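
To make that rule concrete, here is a small sketch of guarding an array reference with flow control instead of relying on `.and.` to short-circuit. The array `a` and the loop bounds are made up for the illustration.

```fortran
program guard_with_nested_if
   ! Illustrative only: a and the loop range are hypothetical.
   implicit none
   integer :: a(3) = [4, 5, 6]
   integer :: i

   do i = 2, 4
      ! Not portable: the processor may evaluate a(i) even when i > size(a):
      !    if (i <= size(a) .and. a(i) > 4) ...
      ! Nesting the tests means a(i) is referenced only when i is in bounds.
      if (i <= size(a)) then
         if (a(i) > 4) then
            write(*,'(a,i0,a)') 'a(', i, ') > 4'
         else
            write(*,'(a,i0,a)') 'a(', i, ') <= 4'
         end if
      else
         write(*,'(a,i0,a)') 'index ', i, ' is out of bounds'
      end if
   end do
end program guard_with_nested_if
```

The nested form is a bit more verbose, but it does not depend on whether a given compiler happens to skip the second operand.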


Enclosing the Fortran code in Discourse like so

    ```fortran
        program example
       ! code
        end program example
     ```

will automatically highlight it and retain
the formatting. The functions are both executed
using the compilers I tried. The interesting one
is one that did them in parallel, and the functions
are not thread-safe so a race condition could cause
an error. Notice that after the first call neither
i nor j are negative, they are values from one of
the two function calls!

The standard says that the functions may be executed
in any order and in parallel!

C.6.1  Evaluation of function references
      If more than one function reference appears in a statement, they
      can be executed in any order (subject to a function result being
      evaluated after the evaluation of its arguments) and their values
      cannot depend on the order of execution. This lack of dependence
      on order of evaluation enables parallel execution of the function
      references.

program test_merge
implicit none
integer :: i, j
   i = (-1)-huge(0)
   j = (-1)-huge(0)
   write(*,*)'J=',j,'I=',i
   i = merge(f1(), f2(), .true.)
   write(*,*)'J=',j,'I=',i
   if (j /= 1) write(*,*)'error stop j/=1'
   if (i /= 10) write(*,*)'error stop i/=10'
   i = merge(f1(), f2(), .false.)
   write(*,*)'J=',j,'I=',i
   if (j /= 2) write(*,*)'error stop j/=2'
   if (i /= 20) write(*,*)'error stop i/=20'
contains

integer function f1() result(r)
   j = 1
   r = 10
end function
integer function f2() result(r)
   j = 2
   r = 20
end function

end program test_merge

 J= -2147483648 I= -2147483648
 J=           2 I=          10
 error stop j/=1
 J=           2 I=          20

So that created the illusion that only one of the functions was executed, because you were testing for the result of F1() or F2() alone, not for the result of both executing simultaneously. That was interesting, as I was expecting that the compiler you used was treating MERGE() like IF-ELSE-ENDIF instead, which I have seen before.

program merge2
  implicit none

  integer :: i

  i = merge (f1 (), f2 (), .true.)
  print *, 'using MERGE, i =', i

  print *
  i = (.true. ? f1 () : f2 ())
  print *, 'using conditional expr, i =', i

contains

  integer function f1 ()
    print *, 'inside f1 ()'
    f1 = 1
  end function

  integer function f2 ()
    print *, 'inside f2 ()'
    f2 = 2
  end function

end program

With a recent build of gfortran:

$ gfortran merge2.f90
$ ./a.out
 inside f1 ()
 inside f2 ()
 using MERGE, i =           1

 inside f1 ()
 using conditional expr, i =           1
$

Note that both functions are called with merge, but only one is called with conditional expressions.

In the case of arithmetic expressions like your sqrt example, the condition may be trying to prevent something like a divide-by-zero or other exception. With merge, on some architectures or IEEE arithmetic settings, this could cause floating point error exceptions to occur.
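
Until conditional expressions are widely available, one way to guard the sqrt case is ordinary flow control for scalars and a WHERE construct for arrays. As far as I understand the standard, an elemental function reference inside a WHERE assignment is evaluated only for elements where the mask is true. The variable names below are made up for the sketch.

```fortran
program guarded_sqrt
   ! Illustrative only: a, b, v and w are hypothetical names.
   implicit none
   real :: a, b
   real :: v(4), w(4)

   ! Scalar case: sqrt is referenced only on the branch where b >= 0.
   b = -4.0
   if (b >= 0.0) then
      a = sqrt(b)
   else
      a = 0.0
   end if
   write(*,'(a,f4.1)') 'scalar: ', a

   ! Array case: inside WHERE, the elemental sqrt is evaluated only for
   ! the elements whose mask value is true.
   v = [-4.0, 0.0, 4.0, 9.0]
   where (v >= 0.0)
      w = sqrt(v)
   elsewhere
      w = 0.0
   end where
   write(*,'(a,4f4.1)') 'array: ', w
end program guarded_sqrt
```

Neither form evaluates sqrt on a negative value, so no floating point exception should be raised regardless of the IEEE settings.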

Another case is with optional arguments. The condition might be testing presence or not. With merge, if the argument is not present, an out of range pointer could be dereferenced causing a SEGV.
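
A minimal sketch of the optional argument case, assuming a made-up function `scaled` with an optional `factor`: the default is assigned first and overwritten only when the argument is present, so the absent dummy is never referenced.

```fortran
module scale_mod
   ! Illustrative only: scale_mod, scaled and factor are hypothetical names.
   implicit none
contains
   function scaled(x, factor) result(r)
      real, intent(in) :: x
      real, intent(in), optional :: factor
      real :: r, f
      ! Not safe: merge(factor, 1.0, present(factor)) passes factor as an
      ! actual argument even when it is absent.
      f = 1.0
      if (present(factor)) f = factor
      r = f * x
   end function scaled
end module scale_mod

program optional_default
   use scale_mod, only: scaled
   implicit none
   write(*,'(a,f5.1)') 'default factor: ', scaled(2.0)
   write(*,'(a,f5.1)') 'factor = 3:     ', scaled(2.0, 3.0)
end program optional_default
```

With a compiler that supports conditional expressions, the two lines setting `f` could presumably be collapsed into one, since only the selected branch is evaluated.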
