What are the most common run-time errors in your programs? For me they are
1. Using allocate for a variable that is already allocated.
2. Using an optional variable that is not present.
3. Mismatched format string and output. (I should get in the habit of using the G edit descriptor.)
4. Out-of-bounds array access.
I want to see if a tool can catch most common errors through static analysis. For (1) you could add if (allocated(x)) deallocate (x) before allocate(x(…) unless the static analyzer can prove that x has not been allocated. For (2) you could similarly add an if (present(x)) guard. For (3), when the variables to be printed are scalars or arrays with fixed sizes, a tool could check for mismatched formats.
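A minimal sketch of the two guards described for (1) and (2), written by hand rather than produced by any tool. The names guard_demo and report are made up for illustration.

```fortran
module guard_demo
  implicit none
contains
  ! (2) Guarded optional: label is referenced only when it is present.
  subroutine report(n, label)
    integer, intent(in) :: n
    character(len=*), intent(in), optional :: label
    if (present(label)) then
      print "(a,': ',i0)", label, n
    else
      print "(i0)", n
    end if
  end subroutine report
end module guard_demo

program main
  use guard_demo
  implicit none
  real, allocatable :: x(:)

  allocate (x(3))
  ! (1) Guarded allocation: without the next line, the second
  ! allocate would be a run-time error because x is already allocated.
  if (allocated(x)) deallocate (x)
  allocate (x(5))

  call report(size(x))
  call report(size(x), "size")
end program main
```

An analyzer that can prove x is unallocated at a given allocate could omit the guard there and leave the code unchanged.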
In addition to what you listed already, here is what I usually encounter:
using unassociated pointers
size mismatches, especially with strings passed to procedures whose dummy argument has an explicit length (that may not cause run-time errors, depending on what garbage is in memory, but the results can be surprising)
using size with an unallocated array (ifort returns 0, gfortran returns 1). The safe version would be merge(size(array), 0, allocated(array)).
forgetting that Fortran has no short-circuiting logic (corrected example):
if (s <= len(string) .and. string(s:s) /= '!') then
s = s + 1
end if
Since both sides of the .and. will be evaluated, this will eventually cause an out-of-bounds array access. That applies to if, while and merge constructs.
It seems there’s an extra ) in there, but that’s exactly how I envision short-circuit being introduced in the language eventually (Maybe in Fortran 2038 Epochalypse Edition?), that is:
if (s <= len(string)) .and. (string(s:s) /= '!') then
...
endif
if (x < 0) .or. (use_default) then
...
endif
Even though borrowing syntax from bash seems frowned upon, that avoids introducing the awkward-looking, and also possibly colliding, .andthen. and .orelse..
That is not safe either. You should assume that merge will evaluate both tsource and fsource. You need to use an if block. Because I have had the same problem with size, I proposed making the size of an unallocated array -1. Peter Klausler suggested instead defining a safe wrapper for size:
module mySizeModule
contains
pure integer function mySize(x)
class(*), dimension(..), intent(in), optional :: x
mySize = -1
if (present(x)) mySize = size(x)
end function
end module
use mySizeModule
integer, allocatable :: a(:), b(:)
allocate(a(10))
print *, mySize(a), mySize(b)
end
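For comparison, the if-block alternative mentioned above can be sketched as follows. This is only an illustration: safe_size is a made-up name, it handles rank-1 integer arrays only, and returning 0 for an unallocated array is a choice made here (the proposal above would return -1).

```fortran
module safe_size_mod
  implicit none
contains
  ! size is referenced only inside the branch where a is known
  ! to be allocated, so nothing depends on how merge evaluates
  ! its arguments.
  pure integer function safe_size(a) result(n)
    integer, allocatable, intent(in) :: a(:)
    if (allocated(a)) then
      n = size(a)
    else
      n = 0
    end if
  end function safe_size
end module safe_size_mod

program safe_size_demo
  use safe_size_mod
  implicit none
  integer, allocatable :: a(:), b(:)
  allocate (a(10))
  print *, safe_size(a), safe_size(b)   ! a is allocated, b is not
end program safe_size_demo
```

Unlike the assumed-rank wrapper, this version needs one specific function per type and rank, which is the price of declaring the dummy argument allocatable.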
I must not understand the OP, because the proposed solutions to 1 and 2 are both runtime checks, and since formats may be constructed at runtime, detecting a format mismatch must be a runtime check as well. But I approve of runtime checks. Maybe they can be disabled in a well understood inner loop, but in my work they are a trivial cost compared to debugging without them.
The most common error I see that is not caught by gfortran is an uninitialized variable. Out-of-bounds array references are frequent, but caught by gfortran.
Thanks for the tip.
Not sure I understand why it’s not safe, though. In that case, both sides can be evaluated, without any problem. But I would get the expected 0.
I totally agree with you if you try to use merge with optional parameters, like merge(1.0/x, 0.0, present(x)). BTW, this could also be part of your checklist.
You can construct a format string at run time, but in my codes and in most codes I see, format strings are usually literals or named constants known at compile time (or there are format statements also known at compile time). I have a Python script to find format mismatches that for
program main
implicit none
integer, parameter :: n = 3
real :: x = 2.1
integer :: m, i = 3, a(0), j(n) = 0
integer, allocatable :: k(:)
real :: y(n) = 3.1
allocate (k(2), source=0)
print "(*(i0),f0.4)", j, y
print "(2i0, f0.4)", j, x
print "(2i0, f0.4)", k, x
print "(2i0)", i, x
print "(i0, f0.4)", (j(m), y(m), m=1,n)
print "(*(f0.4))", a
end
reports the following when run as python xformat_mismatch.py xformat.f90 --verbose:
3 format/type mismatch finding(s) in 1 file(s) (definite 3, possible 0).
xformat.f90:8 program main [definite] - format descriptor 'i' likely mismatches item 2 type real
xformat.f90:9 program main [definite] - format descriptor 'i' likely mismatches item 2 type real
xformat.f90:11 program main [definite] - format descriptor 'i' likely mismatches item 2 type real
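The G edit descriptor mentioned at the start of the thread avoids this class of mismatch without any tool. A minimal sketch, assuming a Fortran 2008 compiler (needed for g0 and the unlimited repeat count); the variable names are borrowed from the example above.

```fortran
program g0_demo
  implicit none
  integer :: j(3) = 0
  real :: x = 2.1
  ! g0 adapts to the type of each output item, so a list that mixes
  ! integers and reals cannot mismatch the descriptor. The 1x inserts
  ! a blank between items.
  print "(*(g0,1x))", j, x
end program g0_demo
```

The trade-off is that the number of digits printed for reals is left to the processor, so this suits diagnostics better than fixed-layout output.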
The thread so far discusses standard Fortran features causing errors, but I’ll bring up something non-standard that I wish was standard. I have my own pure assertion subroutine using error stop, which is pretty simple to create and common. Assertion violations are one of the most common categories of run-time errors for me. Conditions that I think (hope) are impossible happen sometimes, whether due to one or more bugs in my code or the condition itself being wrong.
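A minimal sketch of what such a pure assertion subroutine might look like. This is not the poster's actual code: assert_mod, assert, and the message text are made up, and it assumes a Fortran 2018 compiler, since error stop inside a pure procedure and a non-constant stop code both come from that revision.

```fortran
module assert_mod
  implicit none
contains
  ! Stops the program with the given message when condition is false.
  pure subroutine assert(condition, message)
    logical, intent(in) :: condition
    character(len=*), intent(in) :: message
    if (.not. condition) error stop message
  end subroutine assert
end module assert_mod

program assert_demo
  use assert_mod
  implicit none
  real :: x
  x = 4.0
  ! A condition the author believes always holds at this point.
  call assert(x >= 0.0, "sqrt argument must be non-negative")
  print *, sqrt(x)
end program assert_demo
```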
Most assertion violations can’t be found via static analysis. To find more assertion violations, people often propose fuzz testing, which I haven’t seen used on Fortran code yet.
I’ve also complained in the past about the danger of using size(x) on unallocated x (this undefined behavior rule makes no sense to me - size(x) should just return -1 on a non-allocated, allocatable argument, for the same reason that we can query the allocated status on it).
Anyway, if your array descriptor contains garbage (pointing to a random address because the array is unallocated), size may end up trying to read an inaccessible memory region, which can trigger a segfault on some systems/compilers.
The promised conditional-expression/conditional-argument also fixes that.
Before, an unallocated or non-associated actual argument implied absence. Now (by which I mean, when widely implemented) you can use .nil. without the “has to be allocatable/pointer” restriction:
I proposed a few years ago to add an optional argument to ALLOCATE that tells the compiler that, if any of the arrays in an ALLOCATE list are already allocated, it is free to deallocate them prior to reallocating to the new requested size. Something like
ALLOCATE(a(newsize), DEALLOCATE=.TRUE.)
Since most compilers already throw a hard error (absent the STAT option) when the array is already allocated, they must already be checking for an existing allocation. Implementing this should not be a major task. Since this is a new option to an existing statement, there are no backwards compatibility problems. This would save the additional logic required to check allocation first.
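For reference, here is a sketch of how the STAT option behaves in current standard Fortran when the array is already allocated, and the extra logic the proposed keyword would remove. The program and variable names are made up, and DEALLOCATE= itself is only the proposal above, not an existing keyword.

```fortran
program stat_demo
  implicit none
  real, allocatable :: a(:)
  integer :: ierr

  allocate (a(3))

  ! a is already allocated: with stat= the error is reported in ierr
  ! instead of terminating the program, and a keeps its old allocation.
  allocate (a(5), stat=ierr)

  if (ierr /= 0) then
    ! The extra logic the proposed DEALLOCATE=.TRUE. would make unnecessary.
    deallocate (a)
    allocate (a(5))
  end if

  print "(i0)", size(a)
end program stat_demo
```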
I don’t usually have issues with any of your other problems since I make an extra effort (particularly the out of bounds access) during code development to make sure they don’t occur.
I also think this would be a good idea, but there are lots of details that must be considered. The reason this is important is that allocation is an expensive operation, particularly when done on the heap where it can trigger an expensive search or garbage collection operation, but even a local array allocated on the stack is expensive compared to, say, a few arithmetic operations.
Those details involve what happens when 1) an array is already allocated with the correct size and bounds, and 2) the array is allocated with the correct size but with different bounds. That second situation involves simply changing the upper and lower bounds with given extents of an existing array, and also changing the extents and the bounds together for multidimensional arrays. For example, changing an existing (1:3,1:4) array of size 12 to a new (0:5,-2:-1) array using the same underlying allocation could be a very fast metadata operation rather than a slower heap/stack operation. This kind of operation, changing bounds and/or extents of allocated arrays, is not currently supported within the language, so there is a question of whether this should be a separate operation entirely, or whether it should be done within the more general ALLOCATE statement. So there are many possibilities that need to be considered to do this the right way within the language.
If this is implemented in the standard, then all of the situations where an array is previously allocated need to be fully specified, in particular the situations where the array is a target of one or more pointers and whether those pointers retain their previous association status. We would want to avoid something like the current allocation-on-assignment situation where the pointer association status after an assignment statement is ambiguous or incomplete within the standard.
Both sides of the .and. MAY be evaluated, and the order is unspecified. This can be a bigger problem if one of the operands is a function call - the program is not guaranteed to call the function if the processor can determine the value of the expression without doing so.
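Given that, the portable way to write the bounds-then-access test from earlier in the thread is to split it into separate statements. A minimal sketch with made-up names and test string:

```fortran
program split_test_demo
  implicit none
  character(len=*), parameter :: string = "abc"
  integer :: s

  s = 1
  do
    ! The bounds test stands alone, so string(s:s) is never
    ! referenced with s out of range.
    if (s > len(string)) exit
    if (string(s:s) == '!') exit
    s = s + 1
  end do

  ! s is now the position of the first '!', or len(string)+1 if none.
  print "(i0)", s
end program split_test_demo
```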
I think that works only for dummy arguments. In general, present() is not equivalent to allocated(). Its use in this way is limited to dummy arguments with the optional attribute but not the allocatable attribute that are associated with an unallocated actual argument. The standard considers this to be one of the cases where an optional dummy argument is not associated with an actual argument.