Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f85ba9f2960 in ???
#1 0x7f85ba9f1ac5 in ???
#2 0x7f85ba6df51f in ???
at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3 0x7f85ba740ac1 in _int_malloc
at ./malloc/malloc.c:3937
#4 0x7f85ba742138 in __GI___libc_malloc
at ./malloc/malloc.c:3329
#5 0x55d084f35ac1 in __grisbolt_hamiltonian_MOD_build_h_sector_fulldiag
at ../../GRISBOLT/src/GRISBOLT_AIM/GRISBOLT_HAMILTONIAN.f90:60
#6 0x55d084f23c84 in __grisbolt_aim_MOD_diagonalize_aim
at ../../GRISBOLT/src/GRISBOLT_AIM/GRISBOLT_AIM.f90:117
#7 0x55d084f24e53 in __grisbolt_aim_MOD_solve_aim_problem
at ../../GRISBOLT/src/GRISBOLT_AIM/GRISBOLT_AIM.f90:48
#8 0x55d084ebca15 in sc_cycle
at ././src/BETHE_X_GRISBOLT.f90:153
#9 0x55d084eb76cd in __bethe_x_grisbolt_MOD_solve_bethe_x
at ././src/BETHE_X_GRISBOLT.f90:104
#10 0x55d084ea4a83 in MAIN__
at app/bethe_x.f90:103
#11 0x55d084ea4c44 in main
at app/bethe_x.f90:3
Segmentation fault (core dumped)
<ERROR> Execution for object " bethe_x " returned exit code 139
<ERROR> *cmd_run*:stopping due to failed executions
STOP 139
The error appears when allocating a not-yet-allocated array in the following subroutine:
Here Nbath, Nimp and Nspin are integer module variables.
I traced the error back to the last allocation, that is, `allocate(ii_dw(dim_ii))`.
As a precaution I deallocate it two lines earlier, and I checked that dim_ii has the correct integer value.
Thank you in advance for any help!
I don’t see any issue at first sight. I would recommend creating a minimal reproducible example (MRE) of this. Either it becomes obvious what the problem is, or it’s a gfortran bug, in which case you can report the MRE to them.
If you add stat=istat, errmsg=msgstr clauses to the allocate statement, with proper declarations of these two variables, does that prevent the segfault? And if so, what error message appears in msgstr?
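A minimal sketch of that suggestion, reusing the names from the original post (ii_dw, dim_ii) in an otherwise hypothetical program. With stat= present, a failed ALLOCATE sets a nonzero status and fills errmsg instead of aborting the program. Note that this only catches errors the Fortran runtime reports; if the heap has already been corrupted elsewhere, the crash inside malloc (as in the backtrace) most likely happens before any status can be returned.

```fortran
program alloc_stat_demo
  implicit none
  integer, allocatable :: ii_dw(:)
  integer :: dim_ii, istat
  character(len=256) :: msgstr

  dim_ii = 10
  msgstr = ''

  ! With stat= present, a failed allocation sets istat /= 0 and fills msgstr
  ! instead of terminating the program.
  allocate(ii_dw(dim_ii), stat=istat, errmsg=msgstr)
  if (istat /= 0) then
     print '(A,I0,2A)', 'allocate failed, stat=', istat, ', errmsg=', trim(msgstr)
     error stop 1
  end if
  print '(A,I0)', 'allocated ii_dw with size ', size(ii_dw)
  if (size(ii_dw) /= dim_ii) error stop 2

  ! Allocating an array that is already allocated is an error condition;
  ! with stat= it is reported in istat/msgstr rather than being fatal.
  allocate(ii_dw(dim_ii), stat=istat, errmsg=msgstr)
  print '(A,I0,2A)', 'second allocate: stat=', istat, ', errmsg=', trim(msgstr)
  if (istat == 0) error stop 3
end program alloc_stat_demo
```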
Does the calling program have an explicit interface for this subroutine? I think that is necessary for allocatable dummy arguments. If there is a mismatch, then it can corrupt the allocation status of other arrays, even local ones.
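For reference, a sketch of the usual way to get that explicit interface: put the procedure in a module and use the module. The module, subroutine, and argument names below are hypothetical.

```fortran
module alloc_dummy_demo
  implicit none
contains
  ! Because this subroutine lives in a module, every program unit that uses
  ! the module automatically has its explicit interface, which is required
  ! for a procedure with an allocatable dummy argument.
  subroutine make_index(n, idx)
    integer, intent(in) :: n
    integer, allocatable, intent(out) :: idx(:)  ! intent(out): deallocated on entry
    integer :: i
    allocate(idx(n))
    idx = [(i, i = 1, n)]
  end subroutine make_index
end module alloc_dummy_demo

program use_alloc_dummy
  use alloc_dummy_demo, only: make_index
  implicit none
  integer, allocatable :: idx(:)

  call make_index(5, idx)
  print *, idx
  if (size(idx) /= 5 .or. idx(5) /= 5) error stop 1

  ! Calling again is safe: intent(out) deallocates the actual argument first.
  call make_index(3, idx)
  if (size(idx) /= 3) error stop 2
end program use_alloc_dummy
```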
If everything is working correctly, all of the local allocatable arrays in your subroutine should be unallocated upon entry and automatically deallocated upon exit. Since you are testing the allocation status of some of those arrays and apparently finding some of them allocated upon entry, that is evidence that the allocation tables are being corrupted somehow.
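A small, hypothetical check of that behaviour: the function below reports whether its unsaved local allocatable is unallocated on entry, and it never deallocates it explicitly. Seeing such an array already allocated on entry in a real code would point to memory corruption, not to a missing DEALLOCATE.

```fortran
module local_alloc_demo
  implicit none
contains
  ! Returns .true. if the local allocatable was unallocated on entry.
  logical function fresh_on_entry(n)
    integer, intent(in) :: n
    real, allocatable :: work(:)   ! no SAVE attribute: a fresh local on each call
    fresh_on_entry = .not. allocated(work)
    allocate(work(n))
    work = 0.0
    ! No DEALLOCATE here: work is deallocated automatically on return.
  end function fresh_on_entry
end module local_alloc_demo

program check_local_alloc
  use local_alloc_demo, only: fresh_on_entry
  implicit none
  integer :: k
  do k = 1, 3
     if (.not. fresh_on_entry(10*k)) error stop 'local allocatable found allocated on entry'
  end do
  print *, 'local allocatable was unallocated on every entry'
end program check_local_alloc
```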
I am checking the allocation status, and none of them is allocated. I did check by printing the values, but I didn’t include that in the snippet of code I posted.
I added those lines as a safety measure, to make sure the problem was not a pre-allocated variable.
I tried to provide an explicit interface for “build_H_sector_fulldiag” in the subroutine that calls it, as follows:
```fortran
subroutine diagonalize_aim(aim_problem,state_list,verb_,ifrag_)
  interface
     subroutine build_H_sector_fulldiag(aim_problem,SectorI,Hmat,ifrag_)
       USE GRISBOLT_COMMON, only: AIM
       USE GRISBOLT_FOCKSPACE, only: sector
       type(AIM),allocatable,intent(in) :: aim_problem
       type(sector) :: SectorI
       complex(8),dimension(:,:) :: Hmat
       integer,optional :: ifrag_
     end subroutine build_H_sector_fulldiag
  end interface
  !> routine to find the GroundState(s) of the AIM
  type(AIM),allocatable,intent(inout) :: aim_problem
  type(sparse_espace), intent(inout) :: state_list
  !
  [rest of the code]
end subroutine
```
and now the build fails with the following error:
bethe_x.f90 done.
bethe_x failed.
[100%] Compiling...
/usr/bin/ld: build/gfortran_D60D5F445CDA73CD/BETHE_2ORB_GRISOLT/libBETHE_2ORB_GRISOLT.a(.._.._GRISBOLT_src_GRISBOLT_AIM_GRISBOLT_AIM.f90.o): in function `__grisbolt_aim_MOD_diagonalize_aim':
/home/samuele/GRISB/TESTS_GRISBOLT/BETHE_2ORB_GRISBOLT/../../GRISBOLT/src/GRISBOLT_AIM/GRISBOLT_AIM.f90:128: undefined reference to `build_h_sector_fulldiag_'
/usr/bin/ld: /home/samuele/GRISB/TESTS_GRISBOLT/BETHE_2ORB_GRISBOLT/../../GRISBOLT/src/GRISBOLT_AIM/GRISBOLT_AIM.f90:147: undefined reference to `build_h_sector_fulldiag_'
collect2: error: ld returned 1 exit status
<ERROR> Compilation failed for object " bethe_x "
<ERROR> stopping due to failed compilation
STOP 1
So the linker does not find it; maybe I am doing something wrong in declaring the interface.
In any case, before I added the interface, the code did enter the subroutine.
From the …MOD… names referenced in the backtrace message, we know that build_H_sector_fulldiag is a module procedure. This means that posting “snippets” of code will not help us help you, because too much context is missing. The wheels may be coming off at the last ALLOCATE, but the damage is probably being done a lot earlier, in a different subprogram. I suggest using “-fcheck=all” to recompile the entire program. Also, does valgrind report anything suspicious? If that doesn’t help, come back.
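Since build_H_sector_fulldiag is a module procedure (the symbol __grisbolt_hamiltonian_MOD_build_h_sector_fulldiag suggests a module named GRISBOLT_HAMILTONIAN), the caller would normally get its interface from a USE statement rather than from an INTERFACE block. An interface block in a scope that does not use the module describes an external procedure, so gfortran emits a reference to the plain symbol build_h_sector_fulldiag_, which is exactly what the linker could not find. A toy version of the use-association pattern, with hypothetical names:

```fortran
module hamiltonian_demo
  implicit none
contains
  ! Hypothetical stand-in for a module procedure such as build_H_sector_fulldiag.
  subroutine build_h_demo(n, hmat)
    integer, intent(in) :: n
    complex(8), intent(out) :: hmat(:,:)
    integer :: i
    hmat = (0d0, 0d0)
    do i = 1, min(n, size(hmat,1), size(hmat,2))
       hmat(i,i) = cmplx(i, 0, kind=8)
    end do
  end subroutine build_h_demo
end module hamiltonian_demo

program caller_demo
  ! Use association supplies the explicit interface and makes the call
  ! resolve to the module procedure's symbol; no INTERFACE block is needed.
  use hamiltonian_demo, only: build_h_demo
  implicit none
  complex(8) :: h(3,3)
  call build_h_demo(3, h)
  if (real(h(3,3)) /= 3d0) error stop 1
  print *, 'trace =', real(h(1,1) + h(2,2) + h(3,3))
end program caller_demo
```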
Yes, I will try to create a MRE as soon as possible.
In the meantime, I am already using -fcheck=all, and I ran valgrind with the following options:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes -s fpm run bethe_x
Here are the leak summary and the first two errors:
==774496== LEAK SUMMARY:
==774496== definitely lost: 91,911 bytes in 3,877 blocks
==774496== indirectly lost: 71,794 bytes in 2,099 blocks
==774496== possibly lost: 454 bytes in 8 blocks
==774496== still reachable: 495,266 bytes in 5,980 blocks
==774496== suppressed: 0 bytes in 0 blocks
==774496==
==774496== ERROR SUMMARY: 596 errors from 538 contexts (suppressed: 0 from 0)
==774496==
==774496== 1 errors in context 1 of 538:
==774496== Conditional jump or move depends on uninitialised value(s)
==774496== at 0x4B0FCF2: _gfortran_execute_command_line_i4 (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==774496== by 0x13FE03: __fpm_filesystem_MOD_run (fpm_filesystem.F90:995)
==774496== by 0x1374F9: __fpm_MOD_cmd_run (fpm.f90:621)
==774496== by 0x115F36: MAIN__ (main.f90:78)
==774496== by 0x11549E: main (main.f90:13)
==774496== Uninitialised value was created by a stack allocation
==774496== at 0x13FCED: __fpm_filesystem_MOD_run (fpm_filesystem.F90:949)
==774496==
==774496==
==774496== 1 errors in context 2 of 538:
==774496== realloc() with size 0
==774496== at 0x48502F0: realloc (vg_replace_malloc.c:1801)
==774496== by 0x144D58: __fpm_filesystem_MOD_list_files (fpm_filesystem.F90:442)
==774496== by 0x16B16B: __fpm_sources_MOD_add_sources_from_dir (fpm_sources.f90:108)
==774496== by 0x134DBD: __fpm_MOD_build_model (fpm.f90:198)
==774496== by 0x1366DB: __fpm_MOD_cmd_run (fpm.f90:495)
==774496== by 0x115F36: MAIN__ (main.f90:78)
==774496== by 0x11549E: main (main.f90:13)
==774496== Address 0x5c0b720 is 0 bytes after a block of size 0 alloc'd
==774496== at 0x484880F: malloc (vg_replace_malloc.c:446)
==774496== by 0x144CC4: __fpm_filesystem_MOD_list_files (fpm_filesystem.F90:442)
==774496== by 0x16B16B: __fpm_sources_MOD_add_sources_from_dir (fpm_sources.f90:108)
==774496== by 0x134DBD: __fpm_MOD_build_model (fpm.f90:198)
==774496== by 0x1366DB: __fpm_MOD_cmd_run (fpm.f90:495)
==774496== by 0x115F36: MAIN__ (main.f90:78)
==774496== by 0x11549E: main (main.f90:13)
==774496==
I don’t understand whether this has something to do with fpm. I should mention that I am using fpm 0.10.0 alpha.
You are right! But something tricky is happening…
I was able to find the executable and I moved it to the main folder.
If I just run the executable I get the same SegFault error as the original post.
If I run it with valgrind the executable doesn’t stop at the same point and runs until the end without errors (and “does the job it is meant to do”).
Valgrind nevertheless reports 23 errors from 9 contexts:
==784700== LEAK SUMMARY:
==784700== definitely lost: 408 bytes in 8 blocks
==784700== indirectly lost: 4,150 bytes in 90 blocks
==784700== possibly lost: 0 bytes in 0 blocks
==784700== still reachable: 34,264 bytes in 48 blocks
==784700== suppressed: 0 bytes in 0 blocks
==784700==
==784700== ERROR SUMMARY: 23 errors from 9 contexts (suppressed: 0 from 0)
==784700==
==784700== 1 errors in context 1 of 9:
==784700== Conditional jump or move depends on uninitialised value(s)
==784700== at 0x521A9FA: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==784700== by 0x10E838: MAIN__ (bethe_x.f90:98)
==784700== by 0x10EC1E: main (bethe_x.f90:3)
==784700== Uninitialised value was created by a stack allocation
==784700== at 0x521A8FE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==784700==
==784700==
==784700== 8 errors in context 2 of 9:
==784700== Invalid write of size 8
==784700== at 0x121280: __bethe_x_grisbolt_MOD_solve_bethe_x (BETHE_X_GRISBOLT.f90:82)
==784700== by 0x10EA6F: MAIN__ (bethe_x.f90:103)
==784700== by 0x10EC1E: main (bethe_x.f90:3)
==784700== Address 0x7b01098 is 8 bytes after a block of size 64 alloc'd
==784700== at 0x484880F: malloc (vg_replace_malloc.c:446)
==784700== by 0x11EA6D: __bethe_x_grisbolt_MOD_solve_bethe_x (BETHE_X_GRISBOLT.f90:59)
==784700== by 0x10EA6F: MAIN__ (bethe_x.f90:103)
==784700== by 0x10EC1E: main (bethe_x.f90:3)
==784700==
==784700==
==784700== 8 errors in context 3 of 9:
==784700== Invalid write of size 8
==784700== at 0x121271: __bethe_x_grisbolt_MOD_solve_bethe_x (BETHE_X_GRISBOLT.f90:82)
==784700== by 0x10EA6F: MAIN__ (bethe_x.f90:103)
==784700== by 0x10EC1E: main (bethe_x.f90:3)
==784700== Address 0x7b01090 is 0 bytes after a block of size 64 alloc'd
==784700== at 0x484880F: malloc (vg_replace_malloc.c:446)
==784700== by 0x11EA6D: __bethe_x_grisbolt_MOD_solve_bethe_x (BETHE_X_GRISBOLT.f90:59)
==784700== by 0x10EA6F: MAIN__ (bethe_x.f90:103)
==784700== by 0x10EC1E: main (bethe_x.f90:3)
==784700==
==784700== ERROR SUMMARY: 23 errors from 9 contexts (suppressed: 0 from 0)
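The two “Invalid write” contexts can be decoded with a little arithmetic. Assuming the 64-byte block allocated at BETHE_X_GRISBOLT.f90:59 holds 8-byte elements (for example real(8) or integer(8)), it has 8 elements, and writes 0 and 8 bytes past its end land on elements 9 and 10; in other words, line 82 writes beyond the end of that array. A sketch of the calculation under that assumption:

```fortran
program decode_invalid_write
  ! Translate valgrind's byte offsets into array indices, assuming the
  ! 64-byte block holds 8-byte elements (e.g. real(8) or integer(8)).
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: block_bytes = 64          ! "a block of size 64 alloc'd"
  integer, parameter :: offsets(2) = [0, 8]       ! "0 bytes after", "8 bytes after"
  real(real64) :: probe                           ! only used to query the element size
  integer :: elem_bytes, n_elems, k, idx

  elem_bytes = storage_size(probe) / 8            ! bits -> bytes
  n_elems = block_bytes / elem_bytes
  print '(A,I0)', 'elements in the block: ', n_elems

  do k = 1, size(offsets)
     ! 1-based index of the element starting offsets(k) bytes past the end
     idx = n_elems + 1 + offsets(k) / elem_bytes
     print '(A,I0,A,I0,A)', 'write at ', offsets(k), ' bytes after -> element ', idx, ' of the array'
  end do

  if (n_elems /= 8) error stop 1
  if (n_elems + 1 + offsets(2) / elem_bytes /= 10) error stop 2
end program decode_invalid_write
```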
I feel you, thank you for your help anyway!
Unfortunately bethe_x.f90:103 is just a call to the subroutine that “solves the problem my project is meant to solve”; I was hoping for more fine-grained error output from valgrind.
I found the bug! It was, of course, a silly mistake: I was writing a large matrix into a smaller one because I had hard-coded a dimension.
Thank you for the support!
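A minimal sketch of a guard against this class of bug, using a hypothetical helper copy_checked: it refuses to copy a matrix into a destination of a different shape, and the destination is sized from the source rather than from a hard-coded dimension.

```fortran
module shape_guard
  implicit none
contains
  ! Hypothetical helper: copy src into dst only if the shapes agree, instead of
  ! relying on a hard-coded dimension being right.
  subroutine copy_checked(dst, src, label)
    complex(8), intent(inout) :: dst(:,:)
    complex(8), intent(in)    :: src(:,:)
    character(*), intent(in)  :: label
    if (any(shape(dst) /= shape(src))) then
       print '(3A,2(I0,1X),A,2(I0,1X))', 'shape mismatch in ', label, ': dst ', shape(dst), 'src ', shape(src)
       error stop 'copy_checked: shape mismatch'
    end if
    dst = src
  end subroutine copy_checked
end module shape_guard

program use_shape_guard
  use shape_guard, only: copy_checked
  implicit none
  complex(8), allocatable :: big(:,:), small(:,:)
  integer :: n

  n = 6
  allocate(big(n,n))
  big = (1d0, 0d0)
  ! Size the destination from the source instead of hard-coding it.
  allocate(small(size(big,1), size(big,2)))
  call copy_checked(small, big, 'small <- big')
  print *, 'copied', size(small), 'elements'
end program use_shape_guard
```

When the destination is a whole allocatable array, plain intrinsic assignment (small = big) also reallocates it to the right shape under Fortran 2003 semantics; that does not apply to an array section such as matrix_a(1,:,:).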
The Intel compiler ifx, the NAG compiler nagfor, and the LLVM compiler flang produce the clearer message you were hoping for on the reproducer below, for example:

fatal Fortran runtime error(arrbug.f90:11): Assign: mismatching element counts in array assignment (to 16, from 36)
```fortran
program arr_bug
  use iso_fortran_env, only : compiler_version, compiler_options
  implicit none
  real, allocatable :: matrix_a(:,:,:)
  real :: matrix_b(2,2), matrix_c(3,3)
  print '(A,/,A)', compiler_version(), compiler_options()
  allocate (matrix_a(2,4,4))
  matrix_a = -1
  matrix_b = 1
  matrix_c = 1
  matrix_a(1,:,:) = krone(matrix_b,matrix_c)  ! 4x4 section (16 elements) <- 6x6 result (36 elements)
  print *,matrix_a(1,:,:)
contains
  function krone(x,y) result(z)
    real :: x(:,:), y(:,:), z(size(x,dim=1)*size(y,dim=1),size(x,dim=2)*size(y,dim=2))
    z = 42  ! stub: only the shape of the result matters for this reproducer
  end function krone
end program arr_bug
```
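krone in the reproducer is a stub that only fixes the shape of its result. Assuming it stands for a Kronecker product, a corrected sketch computes the actual product and sizes matrix_a from the operands, so that the 6x6 result fits the target slice:

```fortran
program kron_fixed
  implicit none
  real, allocatable :: matrix_a(:,:,:)
  real :: matrix_b(2,2), matrix_c(3,3)
  integer :: m, n

  matrix_b = reshape([1., 2., 3., 4.], [2,2])   ! column-major: b(2,1)=2, b(1,2)=3
  matrix_c = 1
  ! Derive the slice shape from the operands instead of hard-coding 4x4.
  m = size(matrix_b,1)*size(matrix_c,1)
  n = size(matrix_b,2)*size(matrix_c,2)
  allocate(matrix_a(2,m,n))
  matrix_a = -1
  matrix_a(1,:,:) = kron(matrix_b, matrix_c)

  print '(6F5.1)', transpose(matrix_a(1,:,:))   ! print the 6x6 slice row by row
  if (any(shape(matrix_a(1,:,:)) /= [6,6])) error stop 1
  if (matrix_a(1,4,1) /= 2.0) error stop 2       ! block (2,1) is b(2,1)*c = 2
  if (matrix_a(1,1,4) /= 3.0) error stop 3       ! block (1,2) is b(1,2)*c = 3
contains
  ! Kronecker product: block (i,j) of z is x(i,j)*y
  function kron(x, y) result(z)
    real, intent(in) :: x(:,:), y(:,:)
    real :: z(size(x,1)*size(y,1), size(x,2)*size(y,2))
    integer :: i, j, p, q
    p = size(y,1)
    q = size(y,2)
    do j = 1, size(x,2)
       do i = 1, size(x,1)
          z((i-1)*p+1:i*p, (j-1)*q+1:j*q) = x(i,j)*y
       end do
    end do
  end function kron
end program kron_fixed
```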