Getting a backtrace

vsnyder · November 24, 2021, 10:25pm

When a code has painted itself into a corner, or discovered an unexpected catastrophic error, it is useful to the developer if the code provides a back trace. Although it’s probably a source of confusion for the user, at least the user can send it to the developer. One can get a back trace in a debugger such as gdb, but not after the code has stopped . Otherwise, one must know where the problem was detected or reported, to plant a break in the debugger. Most of us have some kind of CRASH_BURN routine that causes a floating-point exception or rewinds an invalid unit or… . Univac FORTRAN V used “RETURN 0” to cause a back trace.

It would be helpful if there were a standard method for the code to ask for a back trace. The obvious place would be on the ERROR STOP statement. If there’s reluctance to provide a standard method, at least add a note that processors ought to provide a compiler option to cause a backtrace when an ERROR STOP statement is executed.

zaikunzhang · November 25, 2021, 2:11am

Agree. I have been using the backtrace under gfortran and tracebackqq under ifort, but they are not standard-conforming, and not every compiler provides similar intrinsic.

plevold · November 25, 2021, 10:42am

Totally agree with you that this is a major pain. I actually have some code for this that I think should work very well for you. I’d like to make it openly available, but I just need to get the right permissions at work first… Stay tuned!

I think it would be a mistake to tie this together with error stop. As @zaikunzhang said both Intel and gfortran has some language extensions, but if I remember correctly both also terminates execution.

The problem with terminating execution at the same time as you generate the stacktrace is that you lose the ability to gracefully unwind the call stack and provide contextual information along the way. Here’s an example:

module other_mod
    implicit none
contains
    integer function other_thing(i) result(k)
        integer, intent(in) :: i

        if (i > 25) then
            write(*,*) 'Error: i is too large: ', i
            error stop
        end if
        k = i + 2
    end function
end module

module some_mod
    use other_mod, only: other_thing
    implicit none
contains
    subroutine do_something(i)
        integer, intent(inout) :: i
        integer :: j

        do j = 1, 5
            i = other_thing(i + j)
        end do
    end subroutine
end module

program main
    use some_mod, only: do_something
    implicit none
    integer :: i

    i = 10
    call do_something(i)
    write(*,*)  'Got back: ', i
end program

A stacktrace from the location of error stop wouldn’t be enough to determine the cause of this problem. The call stack down to the failing other_thing function would look exactly the same for a successful invocation and a failing one.

At which time other_thing will fail depends on the value of j in do_something and i in program main so we should be able to unwind back to the program unit and provide this contextual information on the way back.

certik · November 25, 2021, 4:06pm

I also agree that the stacktrace that compilers currently give you should be improved a lot. As an example, here is how a stacktrace looks like with gfortran -g -fbacktrace for “error stop”:

ERROR STOP 

Error termination. Backtrace:
#0  0x104f4f6c3 in ???
#1  0x104f505df in ???
#2  0x104f51afb in ???
#3  0x104f2be6b in expr2
	at examples/expr2.f90:8
#4  0x104f2be9b in main
	at examples/expr2.f90:10

And here is how LFortran currently prints stacktraces of itself (i.e., the C++ code), in fact it remembers a stacktrace for every error or warning, so that you can see where in the LFortran compiler the error was generated together with all the calls that lead to it.

Here is how an error looks like:

And if you add the --show-stacktrace option, you get a stacktrace from LFortran itself for this particular error:

This is on macOS. You can see sometimes the line information is one line off (it should point to BinOp not BoolOp), I think it has to do with how the debugging information is stored in the executable. Here is how the full stacktrace looks like:

Full Stacktrace

$ lfortran examples/expr2.f90 --show-stacktrace
Traceback (most recent call last):
  File "/Users/certik/repos/lfortran/lfortran/src/bin/lfortran.cpp", line 1488
    err = compile_to_object_file(arg_file, tmp_o, false,
  File "/Users/certik/repos/lfortran/lfortran/src/bin/lfortran.cpp", line 710
    result = fe.get_asr2(input, lm, diagnostics);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/codegen/fortran_evaluator.cpp", line 218
    Result<ASR::TranslationUnit_t*> res2 = get_asr3(*ast, diagnostics);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/codegen/fortran_evaluator.cpp", line 240
    compiler_options.symtab_only);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_to_asr.cpp", line 46
    auto res = body_visitor(al, ast, diagnostics, unit);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_body_visitor.cpp", line 1209
    BodyVisitor b(al, unit, diagnostics);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_body_visitor.cpp", line 75
    visit_ast(*x.m_items[i]);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4595
    void visit_ast(const ast_t &b) { visit_ast_t(b, self()); }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4553
    case astType::unit: { v.visit_unit((const unit_t &)x); return; }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4598
    void visit_mod(const mod_t &b) { visit_mod_t(b, self()); }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4209
    case modType::BlockData: { v.visit_BlockData((const BlockData_t &)x); return; }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_body_visitor.cpp", line 650
    transform_stmts(body, x.n_body, x.m_body);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_body_visitor.cpp", line 52
    this->visit_stmt(*m_body[i]);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4636
    void visit_stmt(const stmt_t &b) { visit_stmt_t(b, self()); }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4306
    case stmtType::Assign: { v.visit_Assign((const Assign_t &)x); return; }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_body_visitor.cpp", line 754
    ASR::expr_t *target = LFortran::ASRUtils::EXPR(tmp);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4685
    void visit_expr(const expr_t &b) { visit_expr_t(b, self()); }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/ast.h", line 4360
    case exprType::BoolOp: { v.visit_BoolOp((const BoolOp_t &)x); return; }
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_common_visitor.h", line 1281
    visit_BinOp2(al, x, left, right, tmp, binop2str[x.m_op], current_scope);
  File "/Users/certik/repos/lfortran/lfortran/src/lfortran/semantics/ast_common_visitor.h", line 1157
    diag.add(Diagnostic(
semantic error: Type mismatch in binary operator, the types must be compatible
 --> examples/expr2.f90:6:5
  |
6 | x = (2+3)*5 + "x"
  |     ^^^^^^^   ^^^ type mismatch (integer and character)


Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that needs to be fixed).

The stacktrace code is implemented here: src/lfortran/stacktrace.cpp · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab. On Linux it can use the BFD library (from binutils), on macOS it can use either BFD, or the native “dwarfdump”, which is called at built time of LFortran here: src/bin/dwarf_convert.py · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab, it saves the debugging information to a text file to make it easy and quick to process to produce the stacktrace later.

I plan to use the same technology to produce stacktraces from the compiled Fortran codes themselves, but haven’t implemented it yet.

If anyone wants to collaborate on this aspect and help improve this, that would be awesome. It works on Linux and macOS, but I haven’t figured out a solution for Windows. That needs to still be added too.

Topic		Replies	Views
How to print a human-readable backtrace with `nvfortran`?	6	1149	December 3, 2022
Fortran error handling including stacktrace generation Announcements	8	1645	November 7, 2022
ErrorFx: Fortran exception library Announcements	30	1451	October 19, 2021
Only my Windows compilation fails? Help	18	1234	August 15, 2023
No debug in conda install lfortran on m1 mac? LFortran	2	346	January 25, 2024

Getting a backtrace

Related topics