Branchless conditional arithmetic oddities

Has anybody else stumbled upon the following oddity? :

I was trying out a kind of branchless arithmetic operation such as
A = B*( d >= e ) + C*( f<=g ) !> this is something you can do in python and wanted to try out

With intel and nvfortran this is actually allowed BUT with gfortran such statement would produce a compile error such as ‘Error: Operands of binary numeric operator ‘*’ at (1) are INTEGER(4)/LOGICAL(4)’

Ok, be it, lets find a work around! —> transfer !
A = B* transfer( d >= e ,1 ) + C * transfer( f<=g , 1)
And then, one finds that
ifort : transfer(.true.,1) = -1
nvfortran : transfer(.true.,1) = -1
gfortran : transfer(.true.,1) = 1

My oh my …

Does any body knows if on the gfortran side there is any plans to supporting this kind of statements : A = B*( d >= e ) + C*( f<=g ) ?

And, well, does someone know, how to justify that “-1” ?? this was really shocking.

Well, if you seek to attempt something like this (a case of rather "clever’ programming toward branchless arithmetic in your code but let the compiler jump through hoops under the hood) in a standard manner, then use a Fortran 2023 processor:

   .
   integer, parameter :: ITRUE = xx ! -1 or 1, whatever is suitable
   integer, parameter :: IFALSE = -ITRUE
   . 
   A = B*(( d >= e ? ITRUE : IFALSE )) + C*(( f <= g ? ITRUE : IFALSE ))
   .
2 Likes

It’s not often I get to cite the same article two posts in a row. Doctor Fortran in “To .EQV. or to .NEQV., that is the question”, or “It’s only LOGICAL” - Doctor Fortran (stevelionel.com)

2 Likes

I can tell you where ifort’s -1 comes from. Set your wayback machine to 1977, when the DEC VAX was introduced (shipped in 1978). The VAX instruction set had Branch-on-low-bit-set (or clear) instructions, and VAX compilers used this to do true/false tests. Low bit set was true, so .TRUE. was all bits set, or -1. .FALSE. was all bits clear, zero. The VMS operating system also defined a system-wide convention for status values, reserving the low three bits for severity. Success was 1, informational 3. Warning was 0, error 2 and severe 4. (other severities were not defined.) So, when you called a system service or library routine that returned a status, if all you cared about was success/failure you could do a low-bit test on the result.

In addition, as I mention in my ancient blog post, VAX FORTRAN allowed free conversion between LOGICAL and INTEGER, so you were allowed to test an integer return status as if it were logical. This extension persists today (ifort’s heritage is from DEC), though some of the weirder aspects of it (allowing mixing in list-directed input for example) are no longer allowed by default.

C and its zero/non-zero convention came later. You can get ifort to use that with an option. But the bottom line is to never depend on the numeric representation of a logical value, as the standard doesn’t define that. (If C_BOOL in ISO_C_BINDING is not -1, it is an integer kind that interoperates with C bool. That’s sort of a weak prescription, but ifort will also use the C style if you say -standard-semantics.)

3 Likes

Thanks for the tip, nonetheless for the time being I can’t jump to such recent norms because I have to restrain myself to developments that are at most 2008 compatible in term of norms. ( I’m already looked at weirdly by colleagues because I started testing oneAPI23 when many are just now starting to use Intel2019, for many reasons including engagements with released software + the painful experience of installing new versions of the compiler… won’t keep on this because this is yet another touchy subject )

@sblionel Thank you for the article and the historical note! I will use the occasion to also thank you because your posts here and in the Intel Fortran Forum have been extremally useful and valuable to me over the years!

So, I posted this because it was once more, one of those occasions in which I got frustrated with the language I like the most and that I have being trying to defend for years… I actually stumble upon this oddity like 5 or 6 years back but just rolled with my few-liner IFs… then, just last week I was looking at some code that wanted to refactor and for pure fun ended up trying out an approach for character conversion between upper and lower cases (for the curious Faster to_lower/to_upper implementations · Issue #703 · fortran-lang/stdlib · GitHub ). So together with @urbanjost we found one method that seems quite sound, robust and fast (in the issue you will see it tagged as “upper7”). Then a colleague of mine who is a C/C++ wizard got into the challenge and told me “I found a way of improving your method but in C with LTO using a branchless approach” … when he showed me the code I told him, “hold your horses, I can do that in Fortran” because I remembered doing that, but forgotten the pain of nonstandard semantics and incompatibility between compilers …

So yes, it is a bit frustrating to not only having to remember that there are .and. iand, .eq., .eqv., etc etc, but that they might just not behave as expected… or they might… I mean, the one reason I like Fortran is because for doing science and number crunching it is just extremally elegant, yet every now and then I come into this kind of things that just get stuck for historical reasons … just like the sign convention of the electron charge because B. Franklin felt like the flow of the “electric-fluid” would be in such direction, by the time science advanced and found out that the flow of mass was actually in the opposite direction and that “e” might just be positive, well it was too late, it had already paved the whole of physics world wide, but hey, maxwell equations are invariant to it… so just roll with it … but here, there is an actual standard, most people are taught that (0=Off=False) and (1=On=True) … for this kind of things I often get negative comments on Fortran.

Coming back to the expression,

“( d >= e )” here I’m comparing two integer(1) values, so in and ideal world I would have just expected to obtain a 1-byte 0 or 1, to multiply with B which is also integer(1). Looking at the intrinsics I found no way of actually achieving this in a standard/cross-compiler compatible way :frowning: . Well, yes, transfer + ‘-standard-semantics’ would give the same result with gfortran/ifort/ifx, but ifort’s transfer implementation is rather slow, ifx behaves pretty well. With nvfortran I haven’t found an equivalent of ‘-standard-semantics’

Hello @hkvzjal,

I’ve seen the approach of replacing logical conditions with integer or real flags a number of times, I remember a colleague in HPC telling me that numbered constants are much faster, especially as long as fma instructions are available.

Since you’re using integers, one way you could achieve what you’re trying to is:

A = B*max(0_1,sign(1_1,d-e)) + C*max(0_1,sign(1_1,g-f))

Now of course this looks ugly, but you could wrap it with an elemental procedure:

! Return 1 if a>=b or 0 otherwise
elemental integer(1) function where_ge(a,b)
   integer(1), intent(in) :: a,b
   where_ge = max(0_1,sign(1_1,a-b))
end function where_ge

and do

A = B*where_ge(d,e) + C*where_ge(g,f)

Im not sure it would be faster than a simple merge, though, it’s even easier to read and works with arrays:

A = merge(B,0_1,d>=e) + merge(C,0_1,f<=g)
3 Likes

Thank you @FedericoPerini this might just work! I will try that out :slight_smile:

Yes, this is the whole reason I’m using this approach :slight_smile: … If done properly it is possible to squeez-out some extra performance (not revolutionary but good enough)

1 Like

merge(0,1,LOGICAL_EXPRESSION) and merge(1,0,LOGICAL_EXPRESSION) would be standard; I have used that a few times.

   A=B*merge(0,1,d>=e)+C*merge(0,1,f<=g)

and (if I remember correctly) MERGE() is f95. A bit verbose.It has the advantage given the current state of affairs of relatively clearly indicating what value is being used for TRUE and FALSE and allowing easily for a particular kind to be used if you wanted like “zero” and “one”

    integer(kind=int8), parameter :: zero=0_int8, one=1_int8

I have used that several times as a Q&D way to make code work across compilers that originally was specific to a particular compiler that used 0 and 1 or 0 and -1 or whatever.

I have seen some unusual things using those extensions but never jotted them down; but it took a while to decipher a few lines that were using logicals to determine if values were odd or even and one one compiler there were expressions that used the sign of TRUE and FALSE being different and so on. I found MERGE the most useful for replacing that with a standard approach, personally.

A little function with a short mnemonic name is not bad

   program testit
   implicit none
   integer :: a, b, c, d, e, f,g
   integer, parameter :: one = 1, zero = 0

   b = 3; c = -2; d = -1; e = 1; f = 40; g = 50
   A = B*merge(0, 1, d >= e) + C*merge(0, 1, f <= g)
   write (*, *) A
   A = B*oz(d >= e) + C*oz(f <= g)
   write (*, *) A
contains
   integer function oz(expression) 
   ! logical to integer expression
   ! return one or zero
      logical, intent(in) :: expression
      oz = merge(0, 1, expression)
   end function oz

end program testit

xxx

2 Likes

That was brilliant @urbanjost & @FedericoPerini !!! I could not resist to tested it out:

pure integer(1) function oz(expression) 
      logical, intent(in) :: expression
      oz = merge(1_1, 0_1, expression) ! had to change the order here
end function oz
pure function upper14(str) result(string)
      character(*), intent(in) :: str
      character(len(str))      :: string

      integer(kind=int8), parameter :: ade_a = ichar('a'), ade_z = ichar('z')
      integer(kind=int8), parameter :: case_diff = ichar('a') - ichar('A')
      integer(kind=int8)            :: ade_char
      integer                       :: i
      do i = 1,len(str)
         ade_char = ichar(str(i:i))
         ade_char = ade_char - case_diff * oz( ade_char >= ade_a .and. ade_char <= ade_z )
         string(i:i) = char(ade_char)
      enddo
   end function upper14

And got the following 3 runs using the same benchmark from the github

|              |         | time[s] 10 million tests                                        |
|--------------|---------|---------------------|---------------------|---------------------|
| ifx23        | upper7  | .1250000000000000   | .1250000000000000   | .1250000000000000   |
|              | upper14 | .1093750000000000   | .1093750000000000   | .1093750000000000   |
| gfortran12.2 | upper7  | 0.10441499999999948 | 0.10538500000000539 | 0.10942299999999960 |
|              | upper14 | 0.10244900000000001 | 0.10497800000000268 | 0.10555499999999540 |
| ifort23      | upper7  | .1406250000000000   | .1562500000000000   | .1406250000000000   |
|              | upper14 | .1562500000000000   | .1718750000000000   | .1562500000000000   |
| nvfortran23  | upper7  | 0.7453429698944092  | 0.7583339214324951  | 0.7654871940612793  |
|              | upper14 | 1.220628023147583   | 1.221688032150269   | 1.228378057479858   |

Interesting enough it did manage to squeez some cpu time for ifx and gfortran, ifort gets like 6%slower … with elemental it was just slightly slower for all.

Another interesting thing, if instead of ‘case_diff * oz( ade_char >= ade_a .and. ade_char <= ade_z )’ one writes ‘case_diff * transfer( ade_char >= ade_a .and. ade_char <= ade_z , 1 )’ it is also just slightly slower than the pure “if”, with ‘case_diff * transfer( ade_char >= ade_a .and. ade_char <= ade_z , 1_1 )’ it becomes like 4x slower

Does it make any timing difference if the oz() function is internal, or if it is manually inlined?

Did try that out, saw no difference. I think that in O3 the compiler is smart enough to inline such simple functions, but in O0 I would indeed see a difference.

@urbanjost remembered incorrectly. MERGE was f90. It was labelled as an array construction function in the f90 standard 13.8.6 and C.13.1.6.4. It was and is elemental, and so it is OK with three scalar arguments, as in @urbanjost’s example.

Thanks @sblionel for the historical note, I hit that with Intel (and other) compilers in the past when doing C interoperability, so I created this FAQ entry long time ago: Gotchas — Fortran90 1.0 documentation. I was wondering what the reason is, your note explains it.

1 Like

Does adding those switches to the defaults for fpm seem worth discussing?

You may want to check with @greenrongreen and team re: Intel Fortran. My hunch is fpm users will benefit if the default with Intel Fortran compilers includes the -standard-semantics switch, as listed in the post by @certik.

1 Like

I have some problems with the above statement:

  • c_bool is not an integer kind, it is logical.
  • “ifort will also use C style …” - I understand this so that logical(c_bool) will always be interoperable with C _Bool and if the option is used, all logical kinds will use C style. This is not true with ifort, unfortunately.
    print *, transfer(.true._c_bool, 1_1) outputs -1 w/o -standard-semantics option

This is sort of strange. The compiler behavior w/o the option explicitly violates the standard requirement.

Yes, I misspoke - c_bool is a logical kind. However, ifort’s LOGICAL(c_bool) doesn’t interoperate with C unless you use -standard-semantics (or -fpscomp logicals)

If you don’t use -standard-semantics, then some defaults deviate from the standard - that’s the whole point of that option in that omitting it continues default behavior from the compiler’s history so as to not break programs. Over the years, some of the individual choices under this umbrella have become new defaults, such as reallocating the LHS on assignment.

-standard-semantics is great, if you assume the code should comply with Fortran 2018 semantics. It may be a problem for legacy applications. the full writeup is here, but the synopsis is:
Description

This option determines whether the current Fortran Standard behavior of the compiler is fully implemented.

If you specify option standard-semantics, it enables all of the options that implement the current Fortran Standard behavior of the compiler, which is Fortran 2018 features.

I think
-fp-model=precise (linux)
/fp-model:precise (windows)

is a good starting point. I am not sure about enabling F2018 semantics by default as it may cause issues for legacy application builds.

@greenrongreen , thanks but please note this is in the context of fpm and the fpm users are far more likely to seek standard semantics “out of the box” than anything legacy. A question: at what point does Intel begin to realize “legacy application” is not the present and definitely not the future?!!

But c_bool is sort of well-defined entity. IMHO if Intel chooses to not interoperate with C w/o giving that option, the c_bool should read -1 in this environment. And it does not.