Weird behaviour with Cray 11 and Test-Drive 0.4.0. Something I'm doing wrong?

Just a quick note that the only Cray compiler I have access to is the one on ARCHER2 where we have to use their compiler wrapper ftn, that’s not something I can control (but might be the cause of the issue)…

I’ve only been able to reproduce this with Cray, so I’m leaning towards it being some kind of compiler bug, but at the same time I’m very wary that undefined behaviour can do weird things and you’re very at the mercy of whether a compiler exploits it or not.

I built test-drive itself as shown in ARCHER2-Cray-build_test_drive.txt (4.5 KB), and all building and running was done as

username@ln03:~/test> ftn -Itest-drive-inst/include/test-drive/Cray-11.0.4/ -Ltest-drive-inst/lib64/ -g -O0 -c test_weird.f90 -ltest-drive
username@ln03:~/test> ftn -Itest-drive-inst/include/test-drive/Cray-11.0.4/ -Ltest-drive-inst/lib64/ -g -O0 main.f90 test_weird.o -ltest-drive
username@ln03:~/test> valgrind ./a.out

using

username@ln03:~/test> ftn --version
Cray Fortran : Version 11.0.4
username@ln03:~/test> cc --version
Cray clang version 11.0.4  (bc9473a12d1f2f43cde01f962a11240263bd8908)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/cray/pe/cce/11.0.4/cce-clang/x86_64/share/../bin
username@ln03:~/test> valgrind --version
valgrind-3.16.0.RC2

Here is test_weird.f90 (3.8 KB) and main.f90 (882 Bytes).

The exact behaviour is very dependent on which tests are enabled. If the first test (that explicitly sets should_fail to .false. and is currently commented out) is uncommented and its equivalent without .false. commented out everything works as expected but valgrind reports issues in test-drive:

==240373== Memcheck, a memory error detector
==240373== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==240373== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==240373== Command: ./a.out
==240373==
# Testing: weird
  Starting int-to-char::zero ... (1/7)
==240373== Conditional jump or move depends on uninitialised value(s)
==240373==    at 0x4078F0: make_output$testdrive_ (testdrive.F90:438)
==240373==    by 0x4075D0: run_unittest$testdrive_ (testdrive.F90:396)
==240373==    by 0x40706A: run_testsuite$testdrive_ (testdrive.F90:336)
==240373==    by 0x401E22: main (main.f90:20)
==240373==
==240373== Conditional jump or move depends on uninitialised value(s)
==240373==    at 0x407D80: make_output$testdrive_ (testdrive.F90:444)
==240373==    by 0x4075D0: run_unittest$testdrive_ (testdrive.F90:396)
==240373==    by 0x40706A: run_testsuite$testdrive_ (testdrive.F90:336)
==240373==    by 0x401E22: main (main.f90:20)
==240373==
==240373== Conditional jump or move depends on uninitialised value(s)
==240373==    at 0x4082D3: make_output$testdrive_ (testdrive.F90:458)
==240373==    by 0x4075D0: run_unittest$testdrive_ (testdrive.F90:396)
==240373==    by 0x40706A: run_testsuite$testdrive_ (testdrive.F90:336)
==240373==    by 0x401E22: main (main.f90:20)
==240373==
       ... int-to-char::zero [PASSED]
  Starting int-to-char::one-digit ... (2/7)
       ... int-to-char::one-digit [PASSED]
  Starting int-to-char::one-digit-negative ... (3/7)
       ... int-to-char::one-digit-negative [PASSED]
  Starting int-to-char::two-digits ... (4/7)
       ... int-to-char::two-digits [PASSED]
  Starting int-to-char::two-digits-negative ... (5/7)
       ... int-to-char::two-digits-negative [PASSED]
  Starting int-to-char::ten-digits ... (6/7)
       ... int-to-char::ten-digits [PASSED]
  Starting int-to-char::ten-digits-negative ... (7/7)
       ... int-to-char::ten-digits-negative [PASSED]
==240373==
==240373== HEAP SUMMARY:
==240373==     in use at exit: 31,117 bytes in 5 blocks
==240373==   total heap usage: 110 allocs, 105 frees, 386,284 bytes allocated
==240373==
==240373== LEAK SUMMARY:
==240373==    definitely lost: 0 bytes in 0 blocks
==240373==    indirectly lost: 0 bytes in 0 blocks
==240373==      possibly lost: 0 bytes in 0 blocks
==240373==    still reachable: 31,117 bytes in 5 blocks
==240373==         suppressed: 0 bytes in 0 blocks
==240373== Rerun with --leak-check=full to see details of leaked memory
==240373==
==240373== Use --track-origins=yes to see where uninitialised values come from
==240373== For lists of detected and suppressed errors, rerun with: -s
==240373== ERROR SUMMARY: 21 errors from 3 contexts (suppressed: 0 from 0)

If only two tests are uncommented everything (ignoring valgrind) works fine as well:

==111181== Memcheck, a memory error detector
==111181== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==111181== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==111181== Command: ./a.out
==111181==
# Testing: weird
  Starting int-to-char::ten-digits ... (1/2)
==111181== Conditional jump or move depends on uninitialised value(s)
==111181==    at 0x406760: make_output$testdrive_ (testdrive.F90:438)
==111181==    by 0x406440: run_unittest$testdrive_ (testdrive.F90:396)
==111181==    by 0x405EDA: run_testsuite$testdrive_ (testdrive.F90:336)
==111181==    by 0x401E22: main (main.f90:20)
==111181==
==111181== Conditional jump or move depends on uninitialised value(s)
==111181==    at 0x406BF0: make_output$testdrive_ (testdrive.F90:444)
==111181==    by 0x406440: run_unittest$testdrive_ (testdrive.F90:396)
==111181==    by 0x405EDA: run_testsuite$testdrive_ (testdrive.F90:336)
==111181==    by 0x401E22: main (main.f90:20)
==111181==
==111181== Conditional jump or move depends on uninitialised value(s)
==111181==    at 0x407143: make_output$testdrive_ (testdrive.F90:458)
==111181==    by 0x406440: run_unittest$testdrive_ (testdrive.F90:396)
==111181==    by 0x405EDA: run_testsuite$testdrive_ (testdrive.F90:336)
==111181==    by 0x401E22: main (main.f90:20)
==111181==
       ... int-to-char::ten-digits [PASSED]
  Starting int-to-char::ten-digits-negative ... (2/2)
       ... int-to-char::ten-digits-negative [PASSED]
==111181==
==111181== HEAP SUMMARY:
==111181==     in use at exit: 31,117 bytes in 5 blocks
==111181==   total heap usage: 60 allocs, 55 frees, 382,849 bytes allocated
==111181==
==111181== LEAK SUMMARY:
==111181==    definitely lost: 0 bytes in 0 blocks
==111181==    indirectly lost: 0 bytes in 0 blocks
==111181==      possibly lost: 0 bytes in 0 blocks
==111181==    still reachable: 31,117 bytes in 5 blocks
==111181==         suppressed: 0 bytes in 0 blocks
==111181== Rerun with --leak-check=full to see details of leaked memory
==111181==
==111181== Use --track-origins=yes to see where uninitialised values come from
==111181== For lists of detected and suppressed errors, rerun with: -s
==111181== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)

With exactly 5 tests (ARCHER2-Cray-no_explicit-5_tests.txt (32.1 KB)) there are no segfaults, but all the tests fail. Again though re-enabling the test that sets should_fail fixes things (except valgrind issues):

==135853== Memcheck, a memory error detector
==135853== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==135853== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==135853== Command: ./a.out
==135853==
# Testing: weird
  Starting int-to-char::zero ... (1/6)
==135853== Conditional jump or move depends on uninitialised value(s)
==135853==    at 0x407550: make_output$testdrive_ (testdrive.F90:438)
==135853==    by 0x407230: run_unittest$testdrive_ (testdrive.F90:396)
==135853==    by 0x406CCA: run_testsuite$testdrive_ (testdrive.F90:336)
==135853==    by 0x401E22: main (main.f90:20)
==135853==
==135853== Conditional jump or move depends on uninitialised value(s)
==135853==    at 0x4079E0: make_output$testdrive_ (testdrive.F90:444)
==135853==    by 0x407230: run_unittest$testdrive_ (testdrive.F90:396)
==135853==    by 0x406CCA: run_testsuite$testdrive_ (testdrive.F90:336)
==135853==    by 0x401E22: main (main.f90:20)
==135853==
==135853== Conditional jump or move depends on uninitialised value(s)
==135853==    at 0x407F33: make_output$testdrive_ (testdrive.F90:458)
==135853==    by 0x407230: run_unittest$testdrive_ (testdrive.F90:396)
==135853==    by 0x406CCA: run_testsuite$testdrive_ (testdrive.F90:336)
==135853==    by 0x401E22: main (main.f90:20)
==135853==
       ... int-to-char::zero [PASSED]
  Starting int-to-char::one-digit-negative ... (2/6)
       ... int-to-char::one-digit-negative [PASSED]
  Starting int-to-char::two-digits ... (3/6)
       ... int-to-char::two-digits [PASSED]
  Starting int-to-char::two-digits-negative ... (4/6)
       ... int-to-char::two-digits-negative [PASSED]
  Starting int-to-char::ten-digits ... (5/6)
       ... int-to-char::ten-digits [PASSED]
  Starting int-to-char::ten-digits-negative ... (6/6)
       ... int-to-char::ten-digits-negative [PASSED]
==135853==
==135853== HEAP SUMMARY:
==135853==     in use at exit: 31,117 bytes in 5 blocks
==135853==   total heap usage: 100 allocs, 95 frees, 385,617 bytes allocated
==135853==
==135853== LEAK SUMMARY:
==135853==    definitely lost: 0 bytes in 0 blocks
==135853==    indirectly lost: 0 bytes in 0 blocks
==135853==      possibly lost: 0 bytes in 0 blocks
==135853==    still reachable: 31,117 bytes in 5 blocks
==135853==         suppressed: 0 bytes in 0 blocks
==135853== Rerun with --leak-check=full to see details of leaked memory
==135853==
==135853== Use --track-origins=yes to see where uninitialised values come from
==135853== For lists of detected and suppressed errors, rerun with: -s
==135853== ERROR SUMMARY: 18 errors from 3 contexts (suppressed: 0 from 0)

Just 3 tests spews mojibake and parts of my environment variables to the terminal (ARCHER2-Cray-no_explicit-3_tests.txt (94.7 KB)), which makes me think Cray is doing something weird with C-style null termination. But if that were the case then I would expect allocating an extra character to fix things but it doesn’t.

Anything more than 5 tests segfaults (ARCHER2-Cray-no_explicit-6_tests.txt (56.3 KB)) as does including both of the zero tests (ARCHER2-Cray-no_explicit-both_zero_tests.txt (3.6 KB)).

The valgrind errors point to this line, this line, and this line, which makes me think that this is to do with the way Cray compiles optional dummy arguments, but I’m not sure how to refactor test-drive to avoid that so I can’t check it.

Ugh, of course the answer hits me like 5 minutes after I’ve posted the question!

make_output is only called here where error is always present, but may or may not be allocated. I guess there’s something in the way that other compilers implement optional dummy arguments that means that this just happens to work, but Cray seems to expect to be able to dereference the argument to check whether it’s present which probably invokes UB if error isn’t allocated.

Seems to be a bug in test-drive then.

It’s probably because writing the problem here helped your brain to work more efficiently on it! :brain:

1 Like

This is actually a feature of the language. An unallocated actual argument to an optional dummy is considered not present. If this isn’t working properly you may want to submit a bug report to cray. If you do, try and create a simple, standalone program as small as possible to demonstrate the issue.

@seamsay,

I was looking at your code, including FUNCTION int_to_char(n)
I do not understand why you are using “width = FLOOR(LOG10(ABS(REAL(n)))) + 1”

The following approach using len_trim appears more direct.
Are there “edge” conditions that require the more complex formula approach ?
(The use of ss*12 is valid as “INT32” has been selected and all the hard work has been done by ‘(I0)’.)

   FUNCTION int_to_char(n) RESULT (s)
        INTEGER(INT32), INTENT(IN) :: n
        CHARACTER(len=:), ALLOCATABLE :: s

        CHARACTER(len=12) :: ss
        INTEGER :: width

        WRITE (ss, '(I0)') n
        width = len_trim(ss)

        ALLOCATE (CHARACTER(len=width) :: s)
        s = ss

   END FUNCTION int_to_char

“s = trim(ss)” is too modern for my tastes !

I don’t think that pair of statements does what you think they do.

If that is what you want, then you should use that assignment instead. Another possible expression is

s = ss(1:len_trim(ss))

That is still pretty clear to a human reader, and it avoids any redundant temporary allocation on the right hand side of the assignment.

program talloc
   character(:), allocatable :: s
   allocate(character(len=1) :: s)
   s = 'Hello World'  ! allocation on assignment.
   write(*,*) len(s), s
end program talloc

I agree with the warning about FLOOR(LOG10(ABS(REAL(n)))). Besides being an inefficient way of counting digits, if the LOG10() evaluation is off even in the last bits, then FLOOR() can be one character off. Consider FLOOR(2.0001)+1 for n=99 or FLOOR(1.9999)+1 for n=100.

X86/X64 processors since Haswell have the LZCNT instruction, which puts the count of leading zeros in one R/M operand into the destination register. Some Fortran compilers provide a built-in function that allows this instruction to be used in Fortran code. Using this instruction, the expression LOG10(ABS(REAL(n))) could be replaced by (32-LZCNT(iabs(n))) * 0.301 if n is a 32-bit integer, and with obvious changes for n being a 64-bit integer.

The fortran intrinsic function for leading zero count is LEADZ(). The number of bits in an integer is given by LEADZ(0_ikind), STORAGE_SIZE(), and BIT_SIZE(). One wonders why there are three functions that give the same value? In any case, expressions like STORAGE_SIZE(n)-LEADZ(n) can be used to compute the number of bits required to hold n, which in exact arithmetic is related to LOG2(n). One would of course hope that LEADZ() is implemented efficiently on the underlying hardware, and that STORAGE_SIZE(n) (or its equivalent) would be evaluated at compile time.

However, as @JohnCampbell already noted, the heavy lifting in this case has already been done by the ‘(I0)’ format, so none of that is necessary in this case, numerical accuracy issues of the logarithm aside.

Here is some code that shows that all three functions return the same values.

program bits
   use, intrinsic :: iso_fortran_env, only: int8, int16, int32, int64
   character(*), parameter :: cfmt='(a,3i3)'
   write(*,cfmt) 'int8: ', leadz(0_int8), bit_size(0_int8), storage_size(0_int8)
   write(*,cfmt) 'int16:', leadz(0_int16), bit_size(0_int16), storage_size(0_int16)
   write(*,cfmt) 'int32:', leadz(0_int32), bit_size(0_int32), storage_size(0_int32)
   write(*,cfmt) 'int64:', leadz(0_int64), bit_size(0_int64), storage_size(0_int64)
end program bits

$ gfortran bits.F90 && a.out
int8:   8  8  8
int16: 16 16 16
int32: 32 32 32
int64: 64 64 64

Even after reading your reply, I still do not know the difference, or when a programmer should use one rather than another.

I think I’ve cut it down to as small a MWE as I can (potentially there’s more that I could cut out but my brain is frazzled): unallocated-optional-bug.f90 (2.5 KB)

Would somebody mind just giving it a quick once over, making sure that it is actually standards compliant please?

No particularly reason, that was just the first solution that came to mind. I agree your solution is much cleaner though.

@seamsay,

For whatever it’s worth, I reckon your code ls conformant with current standard.

What is the program response with this MWE on the (Cray 11?) processor in question?

I believe what you’ve got is conformant. I will say though, it may have been possible to demonstrate the problem with something as simple as

integer, allocatable :: i
call foo(i)
contains
subroutine foo(j)
  integer, intent(in), optional :: j
  if (present(j)) then
    print *, "How?"
  else
    print *, "Good"
  end if
end subroutine
end

@seamsay ,

Can you conform the processor you’re using (Cray 11) supports Fortran 2008?

You may know it was with the Fortran 2008 revision published late in 2010 (circa Nov) the standard started supporting the facility of interest to you, “A null pointer or unallocated allocatable can be used to denote an absent nonallocatable nonpointer optional argument.”

Now Cray Fortran was supposedly the first off the block claiming full Fortran 2008 support. Also, most other compilers introduced support for this specific extension quite early on. If I’m not mistaken, both gfortran and Intel Fortran have supported this feature at least since the 2011-2012 timeframe. So it will be most surprising if Cray compilers generally don’t support this, or if they have any bugs with it.

Nonetheless, few of the Fortran practitioners who provide community service on online Fortran forums tend to have access to supercomputers enough to keep track of details, especially compilers versions and any outstanding bugs with particular features of 2008 (and 2018) revisions.

Thus you may want to preview the feature support list with the compiler on your processor ARCHER2. Also, if the concern is also with additional arguments with other attributes, perhaps you will consider the following to add to the MWE from @everythingfunctional.

  type t
  end type
  character(len=:), allocatable :: present_x 
  type(t), allocatable :: x
  call sub( present_x, x )
  print *, present_x
contains
  subroutine sub( c, a )
     character(len=:), allocatable, intent(out) :: c 
     type(t), intent(in), optional :: a
     if ( present(a) ) then ; c = "T" ; else ; c = "F" ; end if 
  end subroutine
end 

Program response with another processor:

C:\temp>gfortran -ffree-form p.f -o p.exe

C:\temp>p.exe
 F

Ok, so I think I see now that STORAGE_SIZE(n) need not be the same as BIT_SIZE(n) when there is padding or other extra bits that are not part of the model unsigned integer value. I was only thinking of things like big- and little-endian addressing and ones- and twos-complement or signed-magnitude integer representations.

However, I do not think that LEADZ(0) is a special case. It appears to do exactly what one would expect it to do. 16.9.111 says “5. If all of the bits of I are zero, the result has the value BIT_SIZE (I).” So that implies that LEADZ(n) might differ from STORAGE_SIZE(n) when there is padding of some kind. I don’t think I would want LEADZ() to do anything differently than it does.

It is still not clear what numeric_storage_unit should be? It seems like it should be the same as STORAGE_SIZE(n) when n is a default integer, which might differ from LEADZ() and BIT_SIZE(). Is that correct?

It is a little unsettling that there is so much that is processor dependent about the bit operators. The output I showed before is what I would consider the typical case. Namely, all the bits within the storage unit are used, and no other hidden bits are accessed for the sign of the numerical value. But, as you say, there is also the possibility that only 31 bits are used to define the unsigned model integers, or that 32 bits are used in the model integers with another bit from somewhere used to determine the sign of the numerical model. That seems like it could be a big deal when it comes to writing portable code. Of course, there are also practical issues such as big- and little-endian addressing that comes into play that are outside the scope of the two types of model integers defined in the standard.

On Perlmutter we have Cray 15 installed. I.e.

[login35:~/reproducers/nonpresent_allocatable] ml PrgEnv-cray
[login35:~/reproducers/nonpresent_allocatable] ftn --version
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
Cray Fortran : Version 15.0.1

trying my example I get:

[login35:~/reproducers/nonpresent_allocatable] ftn example.f90 -o example     
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
[login35:~/reproducers/nonpresent_allocatable] ./example 
 Good

and with your, more complicated example I get

[login35:~/reproducers/nonpresent_allocatable] ftn complicated.f90 -o complicated     
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
[login35:~/reproducers/nonpresent_allocatable] ./complicated 
incorrect
correct

which suggests the bug is still there, but that it’s not so simple to reproduce. Following up with @FortranFan’s example produces

[login35:~/reproducers/nonpresent_allocatable] ftn medium.f90 -o medium
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
[login35:~/reproducers/nonpresent_allocatable] ./medium 
 F

so still not enough. I will submit a bug report to HPE, but it doesn’t hurt for you to submit one as well.

WRITE (ss, ‘(I0)’) n can be very expensive regarding performance, so if it is used, you should maximise the information.

LEADZ is very useful for quickly approximating LOG_2 (n), which is a way to count the number of characters in an integer, although there are edge conditions where this can fail. (hence +1?)

Considering BIT_SIZE or STORAGE_SIZE, there is another option to use for determining the length of a variable, or importantly group of variables, which is “INQUIRE ( iolength=length) olist”. ( unfortunately there is no BYTE_SIZE intrinsic)
This appears to provide a very flexible approach for determining the “byte” size of the olist.
The length returned can be in “file storage units”, ie bytes !!.

The big advantage of this approach is that “olist” might be more generally interpreted, possibly including derived types, unlike:
“LEADZ” that is limited to integers,
“BIT_SIZE” that is limited to integers,
“SIZE” that is limited to an array or array section (excluding derived types),
“STORAGE_SIZE” that is limited to any type or rank
They have lots of similarity but are very restricted.

Sorry it’s been a few months, I’ve not had the bandwidth for this recently.

Can you conform the processor you’re using (Cray 11) supports Fortran 2008?

Yes, Cray claims to support Fortran 2018 since at least version 11. I’ve just checked again with version 15 and the same issue.

Thanks for that! I ran out of steam trying to minimise my working example and then lost the bandwidth. Now that I’ve got a bit more time I’ll have another go and see if I can submit a proper MWE.