On Linux moving 32 bit program to 64 bit: 3 segmentation faults for invalid memory references (malloc?)

The program runs perfectly on 32 bit Linux. It compiles on 64 bit but gives the following errors.

I have four files. One of these (named STUFF) is included into the Fortran file named A.f. A.f calls routines in file B.c.

The third file (named B.c) is a C file that has three functions. If I don’t comment out most of the first two functions in this file and where I just leave the first “then clause” in each of the if-then-else statements I get a segmentation fault when I run the program.

If I only do this to the step2_ function where I only keep the “then clause” I get a segmentation fault in the step3_ function.

How can I keep the entire if-then-else clause of these two functions (step2_ and step3_)?

The first segmentation fault occurs on the fourth call to STEP2. The second segmentation fault occurs on the second call to STEP3.

I ask this because if I just keep the “then clause” body (throw away the if condition) and delete the “else clause” in each of these functions I get the same segmentation fault at step 5 which is called from the fourth file called C.f.

In the C code for step2_, the integer_adress varial print out as nil on the first three calls but some how is not nil on the fourth call.

I do not know if this code will compile. This pseudo code is to demonstrate the problem.

I tried changing floats to doubles in the C code but the segmentation fault persisted.

If I remove the first three function calls to STEP2 in file A.f the first segmentation fault does not occur. Nothing else in the code appears to be directly causing these faults. I.E.: Nothing else is changing these variables.

If I print out errno after calling malloc it prints out as 0.

The C file is used by the Fortran code via an archive file. The command outputs to create that archive file are:

cc -g -Wall -Werror -fmax-errors=1 -c -o B.o B.c
r - B.o
ar rvu ../lib.a
ranlib ../lib.a

The Fortran files are compiled with the following options:

-g -fno-second-underscore -Wall -Werror -fmax-errors=1 -fcheck=all

----- file A.f ----

      SUBROUTINE STEP1
      IMPLICIT NONE

      INCLUDE 'STUFF'

      INTEGER STEP2, STEP3

      C = 0
      if (STEP2(C,1).NE.0) print *,'not good'
      E =0
      if (STEP2(E,1).NE.0) print *,'not good'
      F = 0
      if (STEP2(F,1).NE.0) print *,'not good'
      B = 0
      if (STEP2(B,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      G = 0
      if (STEP3(G,1).NE.0) print *,'not good'
      D = 0
      if (STEP3(D,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      RETURN
      END

----- file B.c ----

struct my_data {
   int value;
};

struct my_data2 {
   int value;
};

  int step2_(integer_address,the_one)
  float **integer_address;
  register struct my_data *the_one;
{
    int return_value=0;

    if(!*integer_address)
      *integer_address=(float *)malloc(the_one->num*sizeof(float));
    else
      *integer_address=(float *)realloc(*integer_address,the_one->num*sizeof(float));

    if(!*integer_address)return_value=1;

  return(return_value);
}

  int step3_(integer_address,the_one)
  int **integer_address;
  register struct my_data *the_one;
{
    int return_value=0;

    if(!*integer_address)
      *integer_address=(int *)malloc(MAX(1, the_one->num)*sizeof(int));
    else
      *integer_address=(int *)realloc(*integer_address,MAX(1, the_one->num) *sizeof(int));

    if(!*integer_address)return_value=1;

  return(return_value);
}

  int step5_(dest,src)
  register struct my_data2 **dest,*src;
{
    (*dest)->value=src->value;
    /* Program received signal SIGSEGV: Segmentation fault - invalid memory reference.*/
  return;
}

----- file C.f ----

      SUBROUTINE STEP4(THEADDR)
      INTEGER THEADDR,Z
C ... ... ...
C
C With THEADDR being set to some large value 
C such as 1000000000 (this number is made up)

      DO 10 Z=1,100
       CALL STEP5(THEADDR+Z*4,0)
10    CONTINUE

      RETURN
      END

----- file STUFF ----

      INTEGER A
      
      INTEGER B,
     &        C,
     &        D,
     &        E,
     &        F,
     &        G
      
      INTEGER H
      INTEGER I

      COMMON/INTEGER_ADDR/H,I,A,D,G
      COMMON/REAL_ADDR/B,C,E,F

Your code does not compile, A.f and B.c as well. None of your 4 files contain main (PROGRAM) segment. So how can you tell The program runs perfectly on 32 bit Linux?

It can be a bit difficult to understand what is going on if we don’t have access to the full source code. Might I suggest you create a GitHub repo and add the link in this thread.

Assuming you can share the full source code in the first place.

It won’t compile even if put into bigger code. In B.c, you use non-existing my_data struct’s member new (should be value). In A.f you use unclassifiable statements like STEP2(C,1). If STEP2 is a subroutine, must be invoked as CALL STEP2(C,1). If it is a function, it cannot stay alone in the statement (that is possible in C but not in Fortran)

This statement is not valid in Fortran unless STEP2 is a function of type LOGICAL, and ‘STUFF’ contains a declaration to that effect. Since STEP2 is implemented as a C function of type integer, your code is invalid and all kinds of things can happen if you manage to build and run an EXE with such faulty code.

This is still wrong, since STEP2 is considered to be a function of type default REAL, and interpreting an integer value as an IEEE-32 real is probably not meaningful in your application.

Oh, fun with pointers.

First off, I would highly recommend to use iso_c_binding kind parameters and bind(c) attributes, as this would allow you to write routines compatible for both 32bit and 64bit Linux. If you use the c_intptr_t from the iso_c_binding module to declare your common block integers you will have enough space to store a 64bit wide pointer address.

The segfault you are observing is most likely a result from attempting to store a pointer address in a default integer (usually 32bit wide), accidentally overwriting adjacent entries in the common block and than later trying to free a completely different location in memory.

You are relying on a lot more platform specifics in the posted example snippets. Getting those sorted provides plenty of learning opportunities, both in C and Fortran ;).

Lot’s of questions. Given that the project is proprietary, have you considered hiring a consultant to fix your code?

The wisdom I can share is the same as above, have a look into C/Fortran interop and declare your interoperable procedures properly.

I’m afraid nobody can help you without knowing more about the code being fixed. From the snippets you published, it contains lot’s of errors, including such that are guaranteed to crash in 64-bit. E.g. your step5_ C-function above is invoked by following Fortran code:

      INTEGER THEADDR,Z
C ... ... ...
C
C With THEADDR being set to some large value 
C such as 1000000000 (this number is made up)

      DO 10 Z=1,100
       CALL STEP5(THEADDR+Z*4,0)
10    CONTINUE

The first argument of the call is INTEGER (usually 4-byte) and the offset (at least it looks like an offset) is apparently calculated as Z*4, again suggesting 4-bytes/32-bits entities. If you compile that in 64-bits, it just cannot work, as the step5_ function in C interprets its first argument as double pointer, so the values of THEADDR+Z*4 should be actually pointers (addresses), not integer values. It is unlikely that this can be properly done in Fortran, unless THEADDR itself is a pointer somehow imported from another C function. Even so, it is 32-bit value, unable to represent a 64-bit pointer required by step5_

I did explain, I guess. The integer THEADDR is interpreted (by C routine) as an address (pointer). In 32-bit environment, this may work. In 64-bit, you need a 64-bit entity to represent an address/pointer. And THEADDR is in typical implementation, 32-bit value both in 32 and 64-bit environment. So in 64-bit env, spec5_ functions takes 32 bits of the address from THEADDR+offset value and other 32 bits from a place in memory with random value.

1 Like

valgrind is pretty good at diagnosing segmentation faults.

1 Like

The code is not standard-conforming and uses implementation-specific features, so do not blame the compiler. It is probably fixable by someone knowing the implementation, Fortran and the whole code.

BTW, why would you force switching to 64-bit if 32-bit version works. You can run 32-bit applications in 64-bit Linux

1 Like

Indeed, Fortran compilers will probably never stop supporting standard compliant code and it will be portable across platforms and architectures. However, the shown code snippets are not standard compliant and you are relying on architecture specifics (e.g. width of pointer addresses).

The way forward for you would be to make your code standard compliant and replace architecture specific constants and variables. I think all apparent issues in the posted snippets were already spotted and discussed in this thread, including suggestions how to fix them.

Sure, someone can probably fix it, might be a bit challenging but certainly not impossible. Not sure if someone will volunteer their time to fix it for you.

2 Likes

“standard compliant” means that all constructs and operations are performed in accordance with the standard. Not all violations of the standard can be identified at compile time. Not to mention if your interfaces “lie” to the compiler.

One of the key violations identified is that you are passing a Fortran integer (default integer, which is typically 32 bits), to a C function which expects a pointer (in Fortran type(c_ptr)). You either haven’t supplied an explicit interface at the call site, in which case the compiler can’t even check that you are calling it correctly, or have “lied” about what the C interface actually looks like.

This “works” on a 32 bit machine, because in that environment C pointers are actually 32 bit entities. When you move to a 64 bit machine, the pointers are now 64 bit, but you haven’t switched to that on the Fortran side.

The proper fix for this situation is to go carefully define the interfaces to the C procedures on the Fortran side, making sure to use things like type(c_ptr), integer(kind=c_int), etc.

You may also find my YouTube video on this topic helpful.

2 Likes

Concerning this part, I’d also be worried about integer overflow (wraparound) in case THEADDR is even larger than you say. Assuming your Fortran compiler defaults to 32-bit integers, the largest value representable is 2147483647.

It would also be interesting to know how that THEADDR is created/initialized up in the calling sequence (AFAIR it is a dummy argument at this very place) so that it can serve as a pointer in the C function that it is passed (with the offset) as an argument.

C pointers and Fortran integers are not the same thing. Using them interchangeably is not standards compliant. You have to go change all the declarations where a variable is used as a C pointer to type(c_ptr).

FYI, I am available for consulting work if you do want actual help working on the code.

1 Like

My recommendations are

  • Metcalf, Reid, Cohen, Modern Fortran Explained, Oxford University Press, 2018, or
  • Milan Curcic, Modern Fortran, Manning, 2020. (you may be able to get a free preview of chapter 11 on interoperability with C)
1 Like

Discussing interoperability is on-topic here, but it is difficult to give specific advice without seeing code.

The iso_c_binding intrinsic module was introduced in the Fortran 2003 standard. C interoperability is important for Fortran, and all currently maintained compilers should have the module. The definitions of intrinsic procedures and named constants in the module are unlikely to change. When compilers break backward compatibility because the Fortran standard has removed a feature, they have an option to enable the old behavior. For example, gfortran has the -std=legacy option.

This will not work. You do not have an explicit interface for call_it. The implicit none will probably catch you in this case and the compiler will complain, since it won’t be able to assume the return type of the function. If you add an explicit interface it might work as expected, but the call_it function is completely unnecessary in this case. Just call func directly.

SUBROUTINE foobar ()
         interface
           function func(arg) result(res) bind(C, name='func')
             import :: c_int
             integer(c_int), intent(in), value :: arg
             integer(c_int) :: res
           end function
         end interface
         INTEGER(KIND=C_INT) :: i
     
         ! Use it.
         DO i = 1_C_INT, 10_C_INT
           PRINT *, func (i)
         END DO
END SUBROUTINE foobar