Thank you for your replies.
@gnikit, thanks for those links, they were very informative.
However (as I should have specified in my original post), I am building serial executables and running pw.x directly, without mpirun, so I don’t need to do any MPI debugging.
Here is the relevant configuration from my launch.json file:
{
    "name": "(gdb) Debug pw.x with input file",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/bin/pw.x",
    "args": ["-inp", "${workspaceFolder}/telzrow_test_files_for_pycdft/espresso.pwi"], // Possible input args for "program"
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false,
    "MIMode": "gdb",
    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": true
        }
    ],
    "preLaunchTask": "Build pw.x"
}
The Build pw.x task is defined in my tasks.json file as follows:
{
    "label": "Build pw.x",
    "type": "shell",
    "command": "make pw",
    "presentation": {
        "reveal": "always",
        "panel": "new"
    }
}
@FedericoPerini, thanks for your reply.
I apologize for not specifying this in my original post, but Quantum ESPRESSO uses wrappers around MPI functions, so that serial executables can be built.
So, even though I’m launching the program directly without using the mpiexec command, I don’t believe this should be a problem.
Again, thank you both for your feedback.
Since my original post, I’ve spent quite a bit of time investigating this issue and I believe I’m closer to a solution:
The issue actually appears as soon as the open_input_file function is called, which occurs right before the mb_bcast call in the screenshot I posted originally.
I can set a breakpoint at any point in execution up to and including that open_input_file line.
But if I set a breakpoint at any line executed afterwards, even at the very first line of the open_input_file function, the issue appears.
I then tried debugging the program with gdb directly, rather than through VSCode.
I set a breakpoint at line 48 of the read_input.f90 file, which is the line shown in the screenshot of my original post.
I also set a breakpoint at line 93 of open_close_input_file.f90, which is the very first line of the open_input_file function.
As you can see in the output below, the program stops at the first breakpoint, and I’m able to run the bt gdb command.
The program then continues to the next breakpoint, and I’m able to run the bt gdb command again.
(When debugging with VSCode, this wouldn’t be possible: gdb would have already hung/crashed at this point.)
However, as you can see, when I run the info variables command, gdb crashes with a segfault:
gdb bin/pw.x
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./bin/pw.x...
(gdb) break Modules/read_input.f90:48
Breakpoint 1 at 0x46f514: file read_input.f90, line 48.
(gdb) break Modules/open_close_input_file.f90:93
Breakpoint 2 at 0x5357b7: file open_close_input_file.f90, line 93.
(gdb) run
Starting program: /home/jamestelzrow/q-e/bin/pw.x
BFD: error: /usr/lib/debug/.build-id/b5/94dc721d75112eb9f2aa7a2c0ae957f373d962.debug(.debug_info) is too large (0x15ef54 bytes)
warning: Can't read data for section '.debug_info' in file '/usr/lib/debug/.build-id/b5/94dc721d75112eb9f2aa7a2c0ae957f373d962.debug'
warning: Section .debug_aranges in /usr/lib/debug/.build-id/b5/94dc721d75112eb9f2aa7a2c0ae957f373d962.debug entry at offset 0 debug_info_offset 0 does not exists, ignoring .debug_aranges.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program PWSCF v.7.2 starts on 3Nov2023 at 22:12:29
This program is part of the open-source Quantum ESPRESSO suite
for quantum simulation of materials; please cite
"P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
"P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
"P. Giannozzi et al., J. Chem. Phys. 152 154105 (2020);
URL http://www.quantum-espresso.org",
in publications or presentations arising from this work. More details at
http://www.quantum-espresso.org/quote
Serial version
29762 MiB available memory on the printing compute node when the environment starts
Breakpoint 1, read_input::read_input_file (prog=..., input_file_=..., _prog=_prog@entry=2, _input_file_=_input_file_@entry=256) at read_input.f90:48
48 IF ( ionode ) ierr = open_input_file( input_file_, xmlinput )
(gdb) bt
#0 read_input::read_input_file (prog=..., input_file_=..., _prog=_prog@entry=2, _input_file_=_input_file_@entry=256) at read_input.f90:48
#1 0x0000555555565c4b in pwscf () at pwscf.f90:84
(gdb) c
Continuing.
Breakpoint 2, open_close_input_file::open_input_file (input_file_=..., is_xml=.TRUE., _input_file_=_input_file_@entry=256) at open_close_input_file.f90:93
93 IF ( PRESENT(input_file_) ) THEN
(gdb) bt
#0 open_close_input_file::open_input_file (input_file_=..., is_xml=.TRUE., _input_file_=_input_file_@entry=256) at open_close_input_file.f90:93
#1 0x00005555559c361e in read_input::read_input_file (prog=..., input_file_=..., _prog=_prog@entry=2, _input_file_=_input_file_@entry=256) at read_input.f90:48
#2 0x0000555555565c4b in pwscf () at pwscf.f90:84
(gdb) info variables
All defined variables:
File ../csu/abi-note.c:
71: static const struct {
Elf64_Nhdr nhdr;
char name[4];
int32_t desc[4];
} __abi_tag;
File ../dlfcn/dlerror.h:
83: static struct dl_action_result * const dl_action_result_malloc_failed;
File ../login/utmp_file.c:
37: static int file_fd;
39: static off64_t file_offset;
38: static _Bool file_writable;
42: static struct utmp last_entry;
File ../nptl_db/db_info.c:
111: const uint32_t _thread_db_const_thread_area;
File ../nptl_db/structs.def:
82: const uint32_t _thread_db___nptl_initial_report_events[3];
80: const uint32_t _thread_db___nptl_nthreads[3];
84: const uint32_t _thread_db___pthread_keys[3];
98: const uint32_t _thread_db_dtv_dtv[3];
116: const uint32_t _thread_db_dtv_slotinfo_list_slotinfo[3];
95: const uint32_t _thread_db_link_map_l_tls_modid[3];
96: const uint32_t _thread_db_link_map_l_tls_offset[3];
66: const uint32_t _thread_db_list_t_next[3];
67: const uint32_t _thread_db_list_t_prev[3];
56: const uint32_t _thread_db_pthread_cancelhandling[3];
60: const uint32_t _thread_db_pthread_eventbuf[3];
61: const uint32_t _thread_db_pthread_eventbuf_eventmask[3];
62: const uint32_t _thread_db_pthread_eventbuf_eventmask_event_bits[3];
93: const uint32_t _thread_db_pthread_key_data_level2_data[3];
52: const uint32_t _thread_db_pthread_list[3];
63: const uint32_t _thread_db_pthread_nextevent[3];
53: const uint32_t _thread_db_pthread_report_events[3];
58: const uint32_t _thread_db_pthread_schedparam_sched_priority[3];
57: const uint32_t _thread_db_pthread_schedpolicy[3];
--Type <RET> for more, q to quit, c to continue without paging--c
(I'm omitting this section of the output because it is a very long list of variables that I don't believe is relevant)
File qexsd.f90:
integer(kind=8) _F.qexsd_module_MOD_clock_list;
Fatal signal: Segmentation fault
----- Backtrace -----
0x5645c7b4e40e ???
0x5645c7c57601 ???
0x5645c7c57776 ???
0x7feccb45afcf ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7feccb586618 __strlen_evex
../sysdeps/x86_64/multiarch/strlen-evex.S:79
0x5645c7ba7083 ???
0x5645c7c6c62e ???
0x5645c7c6ca74 ???
0x5645c7ec2341 ???
0x5645c7e54df2 ???
0x5645c7e57795 ???
0x5645c7e5c2b3 ???
0x5645c7e5c440 ???
0x5645c7b80c94 ???
0x5645c7e8e287 ???
0x5645c7c57e1c ???
0x5645c7c593cf ???
0x5645c7c586d1 ???
0x7feccc62246c ???
0x5645c7c587fd ???
0x5645c7c5898f ???
0x5645c7c57d0c ???
0x5645c803f1d5 ???
0x5645c803fcb2 ???
0x5645c7d212f9 ???
0x5645c7d22f74 ???
0x5645c7ab1ca9 ???
0x7feccb4461c9 __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7feccb446284 __libc_start_main_impl
../csu/libc-start.c:360
0x5645c7ab8e30 ???
0xffffffffffffffff ???
---------------------
A fatal error internal to GDB has been detected, further
debugging is not possible. GDB will now terminate.
This is a bug, please report it. For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.
Segmentation fault
I’m going to report this crash to the gdb developers.
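For the bug report, it might help to have a way to replay the session above without typing anything. Here is a rough sketch in Python that only assembles the gdb command line; it assumes gdb is on the PATH and that it is run from the q-e directory, and build_gdb_command is a name I made up for illustration. As far as I understand, -batch makes gdb exit after the last command and turns off pagination, so the "Type <RET> for more" prompt should not get in the way.

```python
import shlex


def build_gdb_command(program, breakpoints, crash_command="info variables"):
    """Build an argv list that replays the interactive session non-interactively.

    build_gdb_command is a hypothetical helper, not part of gdb or QE.
    """
    argv = ["gdb", "-batch"]
    # Set every breakpoint before the program starts.
    for location in breakpoints:
        argv += ["-ex", f"break {location}"]
    argv += ["-ex", "run"]
    # After "run" stops at the first breakpoint, one "continue" per
    # remaining breakpoint leaves the program stopped at the last one.
    argv += ["-ex", "continue"] * (len(breakpoints) - 1)
    # Finally issue the command that crashed gdb in the session above.
    argv += ["-ex", crash_command, program]
    return argv


cmd = build_gdb_command(
    "bin/pw.x",
    ["Modules/read_input.f90:48", "Modules/open_close_input_file.f90:93"],
)
# Print a copy-pasteable shell command line.
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run to capture the output for the report.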
But am I correct in assuming that either the C/C++ or Modern Fortran plugin probably runs some similar crash-causing gdb command, and then silently crashes without displaying a warning to the user?
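One way I might be able to check this myself: as far as I know, the cppdbg debugger accepts a "logging" block in launch.json, and setting "engineLogging" to true should make it print the commands it exchanges with gdb to the Debug Console. Below is a sketch of that change in Python. It works on a plain dict rather than reading launch.json, since my file contains a // comment that json.loads would reject, and with_engine_logging is a hypothetical helper name.

```python
import copy
import json


def with_engine_logging(config):
    """Return a copy of a cppdbg launch configuration with gdb traffic logging on.

    with_engine_logging is a hypothetical helper; the "logging" and
    "engineLogging" keys are assumed to be the ones cppdbg understands.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    logging_block = updated.setdefault("logging", {})
    logging_block["engineLogging"] = True
    return updated


# Abbreviated version of the configuration shown earlier in this post.
launch_config = {
    "name": "(gdb) Debug pw.x with input file",
    "type": "cppdbg",
    "request": "launch",
    "MIMode": "gdb",
}

debug_config = with_engine_logging(launch_config)
# Show only the block that would need to be added to launch.json.
print(json.dumps(debug_config["logging"], indent=4))
```

If the last command in that log before the hang turns out to be something that enumerates symbols, that would support the guess that the extension is hitting the same gdb crash.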