How to print a human-readable backtrace with `nvfortran`?

How to print a human-readable backtrace with nvfortran when an exception occurs?

First of all, let me acknowledge that this question would be better asked on the NVIDIA Developer Forum. However, I cannot log in to the forum because the verification email is taking an eternity to arrive, so I am trying here in case someone knows the answer.

Here is a minimal example.

! test.f90
program test
implicit none

real :: a

a = tiny(a)

write (*, *) 1.0 / (a**2)

end program test

According to the nvfortran manual, I do the following.

$ uname -a && nvfortran --version 

Linux 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

nvfortran 22.11-0 64-bit target on x86-64 Linux -tp zen2 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

$ export NVCOMPILER_TERM=trace && nvfortran -g -Kieee -Ktrap=divz test.f90 && ./a.out 

Error: floating point exception, floating point divide by zero
   rax 0x0000000000000000, rbx 0x00007fff3100e178, rcx 0x0000000000000000
   rdx 0x0000000000000000, rsp 0x00007fff3100e030, rbp 0x00007fff3100e040
   rsi 0x00007fb93c4fbf50, rdi 0x0000000000000006, r8  0x0000000000004240
   r9  0x0000000000004100, r10 0x00007fb93be156d0, r11 0x00007fb93bee3640
   r12 0x00007fff3100e178, r13 0x0000000000401200, r14 0x0000000000403d98
   r15 0x00007fb93c573040
  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fb93a41a520]
  ./a.out() [0x401233]
  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb93a401d90]
  /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb93a401e40]
  ./a.out() [0x401125]

Am I supposed to read and understand the above-printed backtrace or did I overlook something?

Thanks.

@zaikunzhang , have you made an inquiry at the Discord site for Nvidia’s HPC compilers with suitable tag(s) for nvfortran?

Hello @FortranFan , I wanted to try, but initially did not succeed in logging in. Update: I have now posted there, but have not received a response yet.

Unless you have a typo in your example, nvfortran uses -traceback, not -trace. Also, try -O0 -g. Your best bet, though, is to let Ubuntu generate a core file and use gdb.

Thanks @rwmsu .

Sorry, I do not think there is a typo. I searched through the nvfortran manual but found no mention of -traceback. Nevertheless, I tried it and got the following.

$ export NVCOMPILER_TERM=trace && nvfortran -g -O0 -traceback -Kieee -Ktrap=divz nv.f90 && ./a.out 
Error: floating point exception, floating point divide by zero
   rax 0x0000000000000000, rbx 0x00007ffc58a2d398, rcx 0x0000000000000000
   rdx 0x0000000000000000, rsp 0x00007ffc58a2d250, rbp 0x00007ffc58a2d260
   rsi 0x00007fc0194fbf50, rdi 0x0000000000000006, r8  0x0000000000004240
   r9  0x0000000000004100, r10 0x00007fc018e156d0, r11 0x00007fc018ee3640
   r12 0x00007ffc58a2d398, r13 0x0000000000401200, r14 0x0000000000403d98
   r15 0x00007fc019662040
  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fc01741a520]
  ./a.out(MAIN_: MAIN_ at /home/zaikunzhang/tmp/nv.f90:9) [0x401308]
  ./a.out(main+0x33) [0x401233]
  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc017401d90]
  /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc017401e40]
  ./a.out(_start+0x25) [0x401125]

In case you were referring to the NVCOMPILER_TERM=trace setting, I was simply following the relevant section of the manual.

I agree that gdb + core would be a good solution. However, since my tests are conducted remotely via GitHub Actions in a non-interactive way, gdb is not really usable. I can only access the log files of the tests. It would be ideal if the compiler could print a human-readable backtrace (which can be recorded in the log), as most compilers do when invoked with -g.

I guess nvfortran also intends to do so, but either I have not found the correct way of invoking it, or the backtrace is not really human-readable (or I am not considered a human …).

Note that with the -traceback option one of the lines now identifies the file and line that the error occurred on, which is typically all you get from a traceback.

I found nvfortran documentation indicating that -traceback is the default and is turned off only when certain compiler optimizations are invoked, but your results suggest that this is not correct.

My own scripts and fpm --profile debug both add it:

-Minform=inform -Mbackslash -Mbounds -Mchkptr -Mchkstk -traceback

Note that gdb(1) has many options for running in batch mode, and you can launch the application with gdb. I have not tried this, so there may be typos, but something like

gdb -batch -ex run -ex where -ex list --args COMMAND ARGUMENTS

may be enough. The -x option allows you to set up much more elaborate scripts. For years I used a script in batch jobs that did something along those lines; there was a trick to short-circuit it so that the subsequent commands were not run when the run succeeded, but I do not remember it at the moment. The gdb(1) documentation almost certainly has examples of running in batch mode. Assuming the program is rerunnable and has a short run time, on Linux/Unix you can also simply run the command and then run it again under gdb only if the initial run fails, using something like

CMD ARGS || gdb -batch -ex run -ex where --args CMD ARGS

as an alternative to enabling and post-processing a core dump. The details can vary, but gdb(1) is definitely usable in batch mode.
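As a rough sketch of the rerun-on-failure idea above, the one-liner can be wrapped in a small POSIX shell function for use in CI scripts. The function name `run_with_backtrace` and the `GDB_CMD` override variable are hypothetical names introduced here for illustration; the gdb options used (`-batch`, `-ex`, `--args`) are standard ones. It assumes the program is safe to run twice and was compiled with -g.

```shell
# Hypothetical helper: run a command; if it fails, re-run it under gdb in
# batch mode so that a backtrace ends up in the (CI) log.
# GDB_CMD is an assumed override hook; it defaults to plain "gdb".
run_with_backtrace() {
    # Success: nothing more to do, gdb is never started.
    "$@" && return 0
    rc=$?   # exit status of the failed command
    echo "command failed with status $rc; re-running under gdb" >&2
    # -batch exits gdb once the -ex commands are done.
    # --args must come last: everything after it is the program and
    # its arguments.
    "${GDB_CMD:-gdb}" -batch -ex run -ex where --args "$@"
    # Report the original failure so the calling script still fails.
    return "$rc"
}
```

With this in place, `run_with_backtrace ./a.out` prints nothing extra when `./a.out` succeeds. When it fails, the gdb output should appear in the log and the original exit status is returned.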

The new line in your output

./a.out(MAIN_: MAIN_ at /home/zaikunzhang/tmp/nv.f90:9) [0x401308]

now indicates that you should look at line 9 of nv.f90.
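Since the logs are only read after the fact, it may help to pull the file and line number out of the traceback automatically. The following is a minimal sketch. The function name `extract_source_locations` is made up for illustration, and the pattern assumes every annotated frame has the same `(... at FILE:LINE) [0xADDR]` shape as the single line shown in this thread, which other nvfortran versions may not follow.

```shell
# Hypothetical filter: read a traceback on stdin and print only the
# "file:line" part of frames that carry source information, e.g.
#   ./a.out(MAIN_: MAIN_ at /path/to/nv.f90:9) [0x401308]
# Frames without an " at FILE:LINE" annotation are dropped.
extract_source_locations() {
    sed -n 's/.* at \(.*:[0-9][0-9]*\)) \[0x[0-9a-fA-F]*\]$/\1/p'
}
```

A possible use in a test script is `./a.out 2>&1 | extract_source_locations`, which for the output above would print `/home/zaikunzhang/tmp/nv.f90:9`.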

PS:

It was a long time ago that I first heard “To err is human. To really foul things up requires a computer.” That has held up really well over the years.


Sorry, I overlooked this particular line. It is sufficient for my use.

Thank you very much, @urbanjost , for taking the time to introduce the batch mode of gdb. This is very useful, and I will try it in my tests later.