On Linux moving 32 bit program to 64 bit: 3 segmentation faults for invalid memory references (malloc?)

The program runs perfectly on 32 bit Linux. It compiles on 64 bit but gives the following errors.

I have four files. One of these (named STUFF) is included into the Fortran file named A.f. A.f calls routines in file B.c.

The third file (B.c) is a C file with three functions. Unless I comment out most of the first two functions, leaving only the first “then clause” of each if-then-else statement, I get a segmentation fault when I run the program.

If I do this only to the step2_ function, keeping just its “then clause”, I get a segmentation fault in the step3_ function.

How can I keep the entire if-then-else clause of these two functions (step2_ and step3_)?

The first segmentation fault occurs on the fourth call to STEP2. The second segmentation fault occurs on the second call to STEP3.

I ask this because if I keep just the “then clause” body (throwing away the if condition) and delete the “else clause” in each of these functions, I get the same kind of segmentation fault in step5_, which is called from the fourth file, C.f.

In the C code for step2_, the integer_address variable prints out as nil on the first three calls but somehow is not nil on the fourth call.

I do not know if this code will compile. This pseudo code is to demonstrate the problem.

I tried changing floats to doubles in the C code but the segmentation fault persisted.

If I remove the first three function calls to STEP2 in file A.f, the first segmentation fault does not occur. Nothing else in the code appears to be directly causing these faults, i.e., nothing else is changing these variables.

If I print out errno after calling malloc it prints out as 0.

The C file is used by the Fortran code via an archive file. The command outputs to create that archive file are:

cc -g -Wall -Werror -fmax-errors=1 -c -o B.o B.c
r - B.o
ar rvu ../lib.a
ranlib ../lib.a

The Fortran files are compiled with the following options:

-g -fno-second-underscore -Wall -Werror -fmax-errors=1 -fcheck=all

----- file A.f ----

      SUBROUTINE STEP1
      IMPLICIT NONE

      INCLUDE 'STUFF'

      INTEGER STEP2, STEP3

      C = 0
      if (STEP2(C,1).NE.0) print *,'not good'
      E =0
      if (STEP2(E,1).NE.0) print *,'not good'
      F = 0
      if (STEP2(F,1).NE.0) print *,'not good'
      B = 0
      if (STEP2(B,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      G = 0
      if (STEP3(G,1).NE.0) print *,'not good'
      D = 0
      if (STEP3(D,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      RETURN
      END

----- file B.c ----

struct my_data {
   int value;
};

struct my_data2 {
   int value;
};

  int step2_(integer_address,the_one)
  float **integer_address;
  register struct my_data *the_one;
{
    int return_value=0;

    if(!*integer_address)
      *integer_address=(float *)malloc(the_one->num*sizeof(float));
    else
      *integer_address=(float *)realloc(*integer_address,the_one->num*sizeof(float));

    if(!*integer_address)return_value=1;

  return(return_value);
}

  int step3_(integer_address,the_one)
  int **integer_address;
  register struct my_data *the_one;
{
    int return_value=0;

    if(!*integer_address)
      *integer_address=(int *)malloc(MAX(1, the_one->num)*sizeof(int));
    else
      *integer_address=(int *)realloc(*integer_address,MAX(1, the_one->num) *sizeof(int));

    if(!*integer_address)return_value=1;

  return(return_value);
}

  int step5_(dest,src)
  register struct my_data2 **dest,*src;
{
    (*dest)->value=src->value;
    /* Program received signal SIGSEGV: Segmentation fault - invalid memory reference.*/
  return;
}

----- file C.f ----

      SUBROUTINE STEP4(THEADDR)
      INTEGER THEADDR,Z
C ... ... ...
C
C With THEADDR being set to some large value 
C such as 1000000000 (this number is made up)

      DO 10 Z=1,100
       CALL STEP5(THEADDR+Z*4,0)
10    CONTINUE

      RETURN
      END

----- file STUFF ----

      INTEGER A
      
      INTEGER B,
     &        C,
     &        D,
     &        E,
     &        F,
     &        G
      
      INTEGER H
      INTEGER I

      COMMON/INTEGER_ADDR/H,I,A,D,G
      COMMON/REAL_ADDR/B,C,E,F

Your code does not compile, neither A.f nor B.c. None of your 4 files contains a main (PROGRAM) segment. So how can you tell that the program runs perfectly on 32 bit Linux?

The program is too large to post to the Internet. I’ve tried to dissect out a small portion of the logic that I wanted to get across. I did not try to compile the code that I posted.

The program runs beautifully on a 32 bit Centos 7 Virtual Machine.

(1) I don’t know why the first segmentation fault occurs. Somehow the address is nil on the first three function calls to STEP2 and is not nil on the 4th call.

It can be a bit difficult to understand what is going on if we don’t have access to the full source code. Might I suggest you create a GitHub repo and add the link in this thread.

Assuming you can share the full source code in the first place.

It won’t compile even if put into bigger code. In B.c, you use a non-existent member of the my_data struct, new (it should be value). In A.f you use unclassifiable statements like STEP2(C,1). If STEP2 is a subroutine, it must be invoked as CALL STEP2(C,1). If it is a function, it cannot stand alone in a statement (that is possible in C but not in Fortran).

I’ve addressed some of your concerns in the original post. Here is the corrected file A.f:

      SUBROUTINE STEP1
      IMPLICIT NONE

      INCLUDE 'STUFF'

      INTEGER STEP2, STEP3

      C = 0
      if (STEP2(C,1).NE.0) print *,'not good'
      E =0
      if (STEP2(E,1).NE.0) print *,'not good'
      F = 0
      if (STEP2(F,1).NE.0) print *,'not good'
      B = 0
      if (STEP2(B,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      G = 0
      if (STEP3(G,1).NE.0) print *,'not good'
      D = 0
      if (STEP3(D,1).NE.0) print *,'not good'
C     Program received signal SIGSEGV: Segmentation 
C     fault - invalid memory reference.

      RETURN
      END

This statement is not valid in Fortran unless STEP2 is a function of type LOGICAL, and ‘STUFF’ contains a declaration to that effect. Since STEP2 is implemented as a C function of type integer, your code is invalid and all kinds of things can happen if you manage to build and run an EXE with such faulty code.

This is still wrong, since STEP2 is considered to be a function of type default REAL, and interpreting an integer value as an IEEE-32 real is probably not meaningful in your application.

In all honesty people, the code was originally posted as pseudo code. Do you know what pseudo code is? I’ll update the pseudo code again.

I just don’t think the problem has anything to do with a syntactical problem. I may be wrong and would like to know. There are no warnings at all from the compiler.

Sorry for any confusion

I don’t know if this is my problem or not, but is someone here capable of talking about the differences between coding Fortran and C on an old 32 bit machine versus a relatively new 64 bit machine?

Oh, fun with pointers.

First off, I would highly recommend to use iso_c_binding kind parameters and bind(c) attributes, as this would allow you to write routines compatible for both 32bit and 64bit Linux. If you use the c_intptr_t from the iso_c_binding module to declare your common block integers you will have enough space to store a 64bit wide pointer address.
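On the C side, the matching change might look like the sketch below. It is only an illustration under assumptions: the names step2_handle and step2_release are hypothetical, the second argument is assumed to be a plain element count passed by reference (the posted code passes the literal 1), and the Fortran variables are assumed to be declared as INTEGER(C_INTPTR_T) so that each one is as wide as the intptr_t used here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical replacement for step2_. The handle is an intptr_t, which is
 * wide enough to hold a pointer on both 32-bit and 64-bit Linux. Fortran
 * passes both arguments by reference, hence the pointers. */
int step2_handle(intptr_t *handle, const int *count)
{
    assert(handle != NULL && count != NULL);

    /* Never request zero bytes; mirrors the MAX(1, ...) guard in step3_. */
    size_t n = (*count > 0) ? (size_t)*count : 1;

    /* A handle of 0 means "nothing allocated yet". */
    void *old = (*handle != 0) ? (void *)*handle : NULL;

    /* realloc(NULL, size) behaves like malloc(size), so a single call
     * covers both the first allocation and later resizes. */
    float *p = realloc(old, n * sizeof *p);
    if (p == NULL)
        return 1;               /* the old block, if any, is still valid */

    *handle = (intptr_t)p;
    return 0;
}

/* Hypothetical companion: frees the block and resets the handle to 0. */
void step2_release(intptr_t *handle)
{
    assert(handle != NULL);
    if (*handle != 0)
        free((void *)*handle);
    *handle = 0;
}
```

The point of the design is that the width of the handle is chosen by the C headers of whatever platform you build on, instead of being hard-coded as a 4-byte integer in the Fortran source.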

The segfault you are observing is most likely the result of attempting to store a pointer address in a default integer (usually 32 bits wide), accidentally overwriting adjacent entries in the common block, and then later trying to free or reallocate a completely different location in memory.
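To make that concrete, here is a small simulation of the REAL_ADDR common block from the posted STUFF file. It is a sketch under assumptions: default INTEGER is taken to be 4 bytes, pointers 8 bytes, and the common block is taken to be laid out contiguously in the declared order B, C, E, F. The addresses are made up, and one padding slot is added so the simulation itself never writes out of bounds. Under those assumptions the call for C spills into E, and when B is finally inspected, the 8 bytes read there are B (zero) plus half of the address left in C, so the test for nil fails and realloc is handed garbage. The same pattern would apply to D and G in INTEGER_ADDR, which would match a fault on the second STEP3 call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Slots in declaration order of COMMON /REAL_ADDR/, plus one padding slot. */
enum { SLOT_B, SLOT_C, SLOT_E, SLOT_F, SLOT_PAD, NSLOTS };

/* What step2_ effectively does in a 64-bit build: write an 8-byte
 * address starting at the 4-byte slot it was handed. */
static void store_address(int32_t *block, int slot, uint64_t address)
{
    assert(slot >= 0 && slot + 1 < NSLOTS);
    memcpy(&block[slot], &address, sizeof address);
}

/* What the `if(!*integer_address)` test reads: 8 bytes from that slot. */
static uint64_t load_address(const int32_t *block, int slot)
{
    uint64_t address;
    assert(slot >= 0 && slot + 1 < NSLOTS);
    memcpy(&address, &block[slot], sizeof address);
    return address;
}

/* Replays the four STEP2 calls from A.f with made-up addresses and
 * returns the value the fourth call sees when it looks at B. */
uint64_t replay_step1(void)
{
    int32_t block[NSLOTS] = {0};
    const uint64_t fake = UINT64_C(0x00007f1234567890); /* invented address */

    block[SLOT_C] = 0; store_address(block, SLOT_C, fake);          /* spills into E */
    block[SLOT_E] = 0; store_address(block, SLOT_E, fake + 0x100);  /* spills into F */
    block[SLOT_F] = 0; store_address(block, SLOT_F, fake + 0x200);  /* spills past F */
    block[SLOT_B] = 0;

    /* B itself is 0, but the 8-byte read also picks up what is left in C. */
    return load_address(block, SLOT_B);
}
```

In this model replay_step1() returns a nonzero value, which is consistent with the address printing as nil on the first three calls and as non-nil on the fourth.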

You are relying on a lot more platform specifics in the posted example snippets. Getting those sorted provides plenty of learning opportunities, both in C and Fortran ;).

I’ll look into your suggestions.

Besides the above C code, how do I fix this in code such as below? What can I try? What are my options? What can you teach me? What will be my indirect problems?

How can I change my entire program?

Do you have a good, practical, and intuitive reference on the Internet?

struct my_data2 {
   int value;
};

  int step5_(dest,src)
  register struct my_data2 **dest,*src;
{
    (*dest)->value=src->value;
    /* Program received signal SIGSEGV: Segmentation fault - invalid memory reference.*/
  return;
}

Lots of questions. Given that the project is proprietary, have you considered hiring a consultant to fix your code?

The wisdom I can share is the same as above, have a look into C/Fortran interop and declare your interoperable procedures properly.

I’m afraid nobody can help you without knowing more about the code being fixed. The snippets you published contain lots of errors, including some that are guaranteed to crash in 64-bit. E.g. your step5_ C function above is invoked by the following Fortran code:

      INTEGER THEADDR,Z
C ... ... ...
C
C With THEADDR being set to some large value 
C such as 1000000000 (this number is made up)

      DO 10 Z=1,100
       CALL STEP5(THEADDR+Z*4,0)
10    CONTINUE

The first argument of the call is an INTEGER (usually 4 bytes) and the offset (at least it looks like an offset) is apparently calculated as Z*4, again suggesting 4-byte/32-bit entities. If you compile that in 64-bit, it just cannot work, as the step5_ function in C interprets its first argument as a double pointer, so the values of THEADDR+Z*4 should actually be pointers (addresses), not integer values. It is unlikely that this can be properly done in Fortran, unless THEADDR itself is a pointer somehow imported from another C function. Even so, it is a 32-bit value, unable to represent a 64-bit pointer required by step5_.
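The size limit can be checked with a few lines of C. This is a minimal sketch, not code from the program in question: the function names are hypothetical, the addresses are invented, and an unsigned 32-bit type stands in for the 4-byte Fortran INTEGER so that the truncation is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Mimics THEADDR+Z*4: the address of the Z-th 4-byte element. */
uint64_t element_address(uint64_t base, int z)
{
    assert(z >= 0);
    return base + (uint64_t)z * 4u;
}

/* Keeps only the low 32 bits, which is all a 4-byte integer can hold. */
uint32_t squeeze_into_32_bits(uint64_t address)
{
    return (uint32_t)(address & UINT32_C(0xFFFFFFFF));
}

/* Returns 1 if the address is unchanged after being stored in 32 bits,
 * and 0 if the upper half was lost. */
int survives_round_trip(uint64_t address)
{
    return (uint64_t)squeeze_into_32_bits(address) == address;
}
```

An address such as 0x08049000, which fits in 32 bits, survives the round trip; one such as 0x00007f1234567890, which needs more than 32 bits, does not. That is one way to see why the same source can behave on a 32-bit system and fail on a 64-bit one.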

Yes, “THEADDR+Z*4” is an offset.

THEADDR is not a pointer. It’s an integer. It gets set to an address.

Can you explain to me why the exact same code (with no modifications) runs very well on a Centos 7 32 Bit Linux Virtual Machine? In the above code posting, I merely reduced the code to its most critical instructions. I admit. I’ve made errors because I reduced the code to a pseudo test case to demonstrate what I’m seeing on a 64 bit machine. I’ve tried to correct my errors when I could.

I’ve traced through the 32 bit code using print statements before and after each instruction. At every stage, the 32 bit code runs perfectly. It produces values that you would expect. It produces a final result as expected.

Why is there a discrepancy between the 32 bit and 64 bit (RHEL 7 Virtual Machine) run-time behavior? I’d like to know.

I did explain, I guess. The integer THEADDR is interpreted (by the C routine) as an address (pointer). In a 32-bit environment, this may work. In 64-bit, you need a 64-bit entity to represent an address/pointer. And THEADDR is, in a typical implementation, a 32-bit value in both 32-bit and 64-bit environments. So in a 64-bit environment, the step5_ function takes 32 bits of the address from the THEADDR+offset value and the other 32 bits from a place in memory with a random value.

valgrind is pretty good at diagnosing segmentation faults.

The code is not standard-conforming and uses implementation-specific features, so do not blame the compiler. It is probably fixable by someone knowing the implementation, Fortran and the whole code.

BTW, why would you force a switch to 64-bit if the 32-bit version works? You can run 32-bit applications on 64-bit Linux.

Indeed, Fortran compilers will probably never stop supporting standard compliant code and it will be portable across platforms and architectures. However, the shown code snippets are not standard compliant and you are relying on architecture specifics (e.g. width of pointer addresses).

The way forward for you would be to make your code standard compliant and replace architecture specific constants and variables. I think all apparent issues in the posted snippets were already spotted and discussed in this thread, including suggestions how to fix them.

Sure, someone can probably fix it, might be a bit challenging but certainly not impossible. Not sure if someone will volunteer their time to fix it for you.

I don’t exactly understand what you mean by “standard compliant.”

I spent a week using the compiler options to remove every single warning message. There are no longer warnings produced in the compilation of the program on the 64bit machine.

How do I make the code changes you talk of? Do you have code to show me that runs? I’ve never done this before and would like to learn.