Fortran error handling including stacktrace generation

Ever since I started learning Fortran many years back I’ve experimented with different ways of handling error scenarios in Fortran code. I’d like to share my latest iteration with you which is a method that I’ve found very useful and flexible. It is based on similar ideas to what you’ll see in many Rust error handling libraries like Eyre.

The code is put in a library which is easy to use both with fpm and CMake: GitHub - SINTEF/fortran-error-handling: A library for flexible and easy to use error handling in Fortran projects

The library fulfills what I believe is four essential requirements for any error handling method in Fortran:

  1. It should be possible to report errors from both pure and impure procedures (pure functions will always be a bit of a hassle until we have proper generics though).
  2. Errors should be difficult to overlook. It should be obvious for the developer that they need to check if something went wrong after calling a procedure.
  3. It should be easy to provide accurate information about what failed and when it occurred.
  4. It should be possible to make certain error situations programmatically identifiable. That is, the caller of a procedure should be able to programmatically detect and act if a certain error occurred.

Usage

General (non-identifiable) errors are easily created with a message explaining what happened:

use error_handling, only: error_t, fail

pure subroutine check_and_accumulate(i, res, error)
    integer, intent(in) :: i
    integer, intent(inout) :: res
    class(error_t), allocatable, intent(inout) :: error

    if (res > 50) then
        error = fail('Magic limit reached')
        return
    end if
    res = res + i
end subroutine

The wrap_error subroutine makes it easy to add contextual information from other layers of the code (complete example here):

use error_handling, only: error_t, wrap_error

pure subroutine process_array(arr, res, error)
    integer, intent(inout) :: arr(:)
    integer, intent(out) :: res
    class(error_t), allocatable, intent(inout) :: error

    integer :: i
    character(len=20) :: i_value

    ! Here we use a labelled block to separate multiple fallible procedure calls
    ! from the code that handles any errors
    res = 0
    fallible: block
        do i = 1, size(arr)
            call accumulate_and_check(arr(i), res, error)
            if (allocated(error)) exit fallible
        end do
        ! Return for subroutine on success, code below is only for
        ! error handling so no allocated(error) check is needed there.
        return
    end block fallible
    ! Provide some context with error
    write(i_value, *) i
    call wrap_error(error, 'Processing of array failed at element '    &
        // trim(adjustl(i_value)))
end subroutine

The result is a quite human readable error message:

Error: Example failed (but that was the intent...)

Caused by:
  - Processing of array failed at element 9
  - Magic limit reached

It is even extensible so that if you have a C++ compiler at hand you can use my other library to generate stacktraces for errors (code snippets are only available on the machine with the source code):

Error: x is negative

Stack trace (most recent call last):
#6    Object "", at 0xffffffffffffffff, in
#5    Object "build/linux-intel/debug/example/error-handling-integration", at 0x408ded, in _start
#4    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f92b3c7e082, in __libc_start_main
#3    Object "build/linux-intel/debug/example/error-handling-integration", at 0x408ee1, in main
#2    Source "example/error-handling-integration.f90", line 76, in MAIN__
         73: program error_handling_integration
         74:     use error_handling_integration_example, only: run
         75:
      >  76:     call run
         77: end program
         78:
#1    Source "example/error-handling-integration.f90", line 60, in error_handling_integration_example::run
         57:             write(*,*) ' - sqrt = ', x
         58:             write(*,*) 'computing square root...'
         59:             x = - 20.0
      >  60:             call sqrt_inplace(x, error)
         61:             if (allocated(error)) exit fallible
         62:             write(*,*) ' - sqrt = ', x
         63:             ! Return from subroutine on success, code below is only for
#0    Source "example/error-handling-integration.f90", line 16, in sqrt_inplace_mod::sqrt_inplace
         13:         class(error_t), allocatable, intent(inout) :: error
         14:
         15:         if (x <= 0.0) then
      >  16:             error = fail('x is negative')
         17:             return
         18:         end if
         19:         x = sqrt(x)

Details

At it’s core, the library uses a very simple definition of what an error is:

type, abstract :: error_t
contains
    procedure(to_chars), deferred :: to_chars
end type

abstract interface
    !> Returns a human readable description of the error
    pure function to_chars(this) result(chars)
        import error_t
        class(error_t), intent(in) :: this
        character(len=:), allocatable :: chars
    end function
end interface

The base error type can be extended in order to make specific failures identifiable:

use error_handling, only: error_t

type, extends(error_t) :: no_such_folder_error_t
contains
    procedure :: to_chars => no_such_folder_to_chars
end type


type, extends(error_t) :: permission_denied_error_t
contains
    procedure :: to_chars => permission_denied_to_chars
end type

!...

And then use select type to check for the returned type:

use error_handling, only: error_t

class(error_t), allocatable :: error
call create_file('path/to/file.txt', error)
if (allocated(error)) then
    select type(error)
        type is (no_such_folder_error_t)
            ! Try to create directory and then create the file again
            ! ...
    end select
end if

Further Work

An abstract error_t would be much more useful if libraries based their own error types off it. I think the Fortran community would benefit from having a small common library which defined types like this, say github.com/fortran-lang/std-types or something similar (that link does of course not exist…). When everyone derive from the same error type one could for example write code like this without having to think about type conversions for each library’s error type:

subroutine read_input(filename, error)
    character(len=*), intent(in) :: filename
    class(error_t), intent(out) :: error

    if (ends_width('.json', filename)) then
        ! Imaginary call to json-fortran, jsonff or other json library
        block
            type(json_value_t) :: json

            ! This subroutine could create a json_error_t for ´error´, but since
            ! it extends error_t we can seamlessly pass it upwards
            call read_json(filename, json, error)
            if (allocated(error)) return
            ! Continue parsing...
        end block
    else if (ends_with('.toml', filename)) then
        ! Imaginary call to toml-f
        block
            type(toml_value_t) :: toml

            ! This subroutine could create a toml_error_t for ´error´, but since
            ! it extends error_t we can seamlessly pass it upwards
            call read_toml(filename, toml, error)
            if (allocated(error)) return
            ! Continue parsing...
        end block
    else
        error = fail('Unsupported file suffix')
    end if
end subroutine
14 Likes

Thanks for sharing your work. I remember there were a few interesting discussions on error handling before (threads 1114 and 3050 are just two).

There are a few more error handling libraries that define their own derived types, including (in no particular order)

Can you comment on how does your library differ or what does it do better? What other factors hinder the adoption of a common error handling module (and derived type)?

1 Like

@ivanpribec proper stacktrace generation is probably the most prominent feature. I haven’t seen anyone else do that in fortran.

Pure procedures is also often overlooked by other solutions.

The fact that the error type is abstract makes it very flexible. This is similar to Exception in Java, Python etc, but for some reason nobody seems to have adopted it in Fortran. If the Fortran standard ever defines an error type (or, as mentioned, Fortran-lang) it would be easy to adapt this library to it.

Some of the examples you list provide modern solutions to error handling, but still for some reason sneak in integer error codes. There should be no need for error codes when you can express the same more robustly with the type system so I believe that is a better solution.

1 Like

Nice!

How do you run the examples with fpm? fpm run basic fails. I did mv example app, then it worked:

$ mv example/ app
$ fpm run basic  
 + mkdir -p build/dependencies
error.f90                              done.
error_handling.f90                     done.
custom-error-type.f90                  done.
adding-context.f90                     done.
basic.f90                              done.
result.f90                             done.
error_handling_impl.f90                done.
error_stop_impl.f90                    done.
liberror-handling.a                    done.
custom-error-type                      done.
adding-context                         done.
basic                                  done.
[100%] Project compiled successfully.
 computing square root...
  - sqrt =    4.47213602    
 computing square root...
Error: x is negative

But it doesn’t print the stacktrace. How do you get it to do that? I couldn’t find simple instructions in the README.

I like this. Maybe it is already there, just starting to look but in my error processing procedures I like to easily incorporate intrinsic numbers into many of my messages using something like this, which I changed to be pure …

M_msg
 module M_msg
use,intrinsic :: iso_fortran_env, only : int8, int16, int32, int64, real32, real64, real128
implicit none
private
public msg
contains
! str(3f) - [M_msg] converts up to nine standard scalar type values to a string
pure function msg(g0, g1, g2, g3, g4, g5, g6, g7, g8, g9)
implicit none
class(*),intent(in),optional  :: g0, g1, g2, g3, g4, g5, g6, g7, g8, g9
character(len=:),allocatable  :: msg
character(len=4096)           :: line
integer                       :: istart
   istart=1
   line=''
   if(present(g0))call print_g(g0,line,istart)
   if(present(g1))call print_g(g1,line,istart)
   if(present(g2))call print_g(g2,line,istart)
   if(present(g3))call print_g(g3,line,istart)
   if(present(g4))call print_g(g4,line,istart)
   if(present(g5))call print_g(g5,line,istart)
   if(present(g6))call print_g(g6,line,istart)
   if(present(g7))call print_g(g7,line,istart)
   if(present(g8))call print_g(g8,line,istart)
   if(present(g9))call print_g(g9,line,istart)
   msg=trim(line)
contains
pure subroutine print_g(g,line,istart)
use,intrinsic :: iso_fortran_env, only : int8, int16, int32, int64, real32, real64, real128
class(*),intent(in) :: g
character(len=4096),intent(inout) :: line
integer,intent(inout) :: istart
integer,parameter :: increment=2
character(len=*),parameter :: sep=' '
   select type(g)
      type is (integer(kind=int8));     write(line(istart:),'(i0)') g
      type is (integer(kind=int16));    write(line(istart:),'(i0)') g
      type is (integer(kind=int32));    write(line(istart:),'(i0)') g
      type is (integer(kind=int64));    write(line(istart:),'(i0)') g
      type is (real(kind=real32));      write(line(istart:),'(1pg0)') g
      type is (real(kind=real64));      write(line(istart:),'(1pg0)') g
      !type is (real(kind=real128));     write(line(istart:),'(1pg0)') g
      type is (logical);                write(line(istart:),'(l1)') g
      type is (character(len=*));       write(line(istart:),'(a)') trim(g)
      type is (complex);                write(line(istart:),'("(",1pg0,",",1pg0,")")') g
   end select
   istart=len_trim(line)+increment
   line=trim(line)//sep
end subroutine print_g
end function msg
end module M_msg
program testit
use M_msg, only : msg
character(len=:),allocatable :: text
text=msg('it is',.true.,'I like to add some numbers like',10,'and',30.4,'and',(30.0,40.0),'to the message')
print *, text
end program testit

Would that be possible to incorporate? Are there major cons to doing so? (sorry if I missed it and it does that already!)

Thanks, @certik! Maybe you just forgot the --example flag to fpm? fpm run --example basic works for me at least.

For stacktraces you need this extra library. It wraps some C++ code so unfortunately it’s not available with fpm yet. In theory fpm 0.7.0 should be capable of building C++ sources, but I get errors due to including C++ standard library headers. Will submit a bug report when I’ve figured out what’s going on.

With the stacktrace library added as a dependency in CMake however, you just add one line near the beginning of your Fortran application to generate stacktraces from the fail and wrap_error procedures:

program failing
    use error_handling, only: error_t, fail, set_error_hook
    use stacktrace_mod, only: stacktrace_error_hook_t
    implicit none

    class(error_t), allocatable :: error

    ! Do this once near the start of your application to generate
    ! a stacktrace from fail and wrap_error.
    call set_error_hook(stacktrace_error_hook_t())
   
    error = fail('This just fails')
    write(*, '(a,a)') error%to_chars()
end program
3 Likes

@urbanjost you mean including numbers in the error message you’re going to show to the user? Yes, I think that’s pretty essential!

The “standard” Fortran way would be to generate the message this way:

integer :: i
character(len=20) :: i_value
character(len=:), allocatable :: error_message

i = 123
write(i_value, *) i
error_message = 'Processing failed for i = ' // trim(adjustl(i_value))

I do however find that way verbose and clumsy when the number of numbers to be included grows :frowning: Using a function like then one from your M_msg is a lot better. I’ve made similar functions myself as well (though no library atm) and if I remember correctly then stdlib has a to_string function that does something similar as well.

1 Like

Ah, that’s it! It works now. Can you please document it in the README?

Hi Ivan,

the libraries I referred to in my post on “Modernising old libraries” do not use a particularly fancy error handling mechanism. As they are to be part of a larger program, I want to make clear that some error has occurred by returning a NaN - the functions in these libraries return a (numerical) value, so that seems appropriate. The orignal ones stopped the program altogether, with an explanation on the screen but I do not think that is useful behaviour for a supporting library.

Regards,

Arjen