# Performance, C vs. Fortran

I recently found some old code I had used to solve the Poisson equation with a somewhat naive double loop relaxation method on a 2d grid. The C code is as follows:

``````#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

void swap(double a, double b, int n)
{
double swp;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
swp[i][j] = a[i][j];
a[i][j] = b[i][j];
b[i][j] = swp[i][j];
}
}
}
double mmax(double phi, double phip, int n)
{
double max = 0.0;
double diff = 0.0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
diff = abs(phi[i][j]-phip[i][j]);
if (diff > max)
max = diff;
}
}
return max;
}
double rho(double x, double y)
{
double s1 = 0.6;
double e1 = 0.8;
double s2 = 0.2;
double e2 = 0.4;

if (x > s1 && x < e1 && y > s1 && y < e1) {
return 1.0;
} else if (x > s2 && x < e2 && y > s2 && y < e2 ) {
return -1.0;
} else {
return 0.0;
}
}
void run(double toler, double a)
{
double epsilon0 = 8.85e-12;
double phi;
double phip;
memset(phip, 0, sizeof(phip)*101*101);
memset(phi, 0, sizeof(phi)*101*101);

double delta = 1.0;

while (delta > toler) {
for (int i=1; i < 100; i++) {
for (int j=1; j < 100; j++) {
phip[i][j] = (phi[i+1][j] + phi[i-1][j] +
phi[i][j+1] + phi[i][j-1])/4 +
pow(a,2.0)/(4.0 * epsilon0)*rho(i * a, j * a);
}
}
delta = mmax(phi, phip, 101);
swap(phi, phip, 101);

}

}

int main(int argc, char *argv[])
{
clock_t start = clock();
run(1e-6, 0.01);
clock_t end = clock();
double total = ((double)(end - start)) / CLOCKS_PER_SEC;
printf("Execution time: %f\n",total);
}
``````

I decided to write it in Fortran to see how it does performance wise. And the Fortran code is as follows:

``````module rhofunc
implicit none
public
integer, parameter :: dp=kind(0.d0)
contains
pure real(dp) function rho(x,y)
real(dp), intent(in) :: x,y
if (x > 0.6_dp .and. x < 0.8_dp .and. y > 0.6_dp .and. y<0.8) then
rho = 1.0_dp
else if (x> 0.2_dp .and. x<0.4_dp .and. y>0.2_dp .and. y<0.4_dp) then
rho = -1.0_dp
else
rho = 0.0_dp
end if
end function

end module

program poisson
use rhofunc, only: rho
implicit none
integer, parameter :: dp=kind(0.d0), M=100
integer            :: i,j
real(dp),parameter :: epsilon0=8.85e-12_dp, target=1e-6, a=0.01
real(dp)           :: delta, start, end, phiprime(M,M), phi(M,M), temp(M,M), a2

delta = 1.0_dp

call cpu_time(start)

do while (delta > target )
a2 = a**2.0_dp
do i=2, M-1
do j=2, M-1
phiprime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4.0_dp &
+ a2/4.0_dp/epsilon0*rho(i*a,j*a)
end do
end do
delta = maxval(abs(phiprime - phi))
temp = phi
phi = phiprime
phiprime = temp

end do
call cpu_time(end)
print *, end-start
end program
``````

On my machine, the C code runs in 0.12 seconds, but the Fortran code takes 0.37 seconds. Is there a particular reason why the C code is faster here? (I’m able to somewhat improve C performance by doing some unholy pointer tricks when swapping to 0.09 seconds)

Both are single-file, so I simply used gcc/gfortran with the -Ofast optimization flag. Edit: I also ran both with WSL2 on Windows.

2 Likes

I got similar results. Maybe Fortran programs have a longer start-up time – could you increase the dimension of your problem to 1000 or 10000?

Something odd is going on, since if I increase the grid size to 250x250, the C version takes 0.6 seconds to run (about expected, since the increase in iterations is about 6x), whereas the fortran version now takes 30 seconds, a nearly 100x increase in time. This seems extremely odd to me. To test 1000x1000, one would have to allocate on heap for the C version.

Aren’t Fortran arrays column major order, but C arrays are row-major order? If so, this should be an inefficient iteration scheme, compared to C, right? Maybe the compiler takes care of this for you though

4 Likes

It seems the compiler took care of that anyway. Actually, swapping the loop variables around seems to make the code even slower, somehow. I’m pretty sure there must be something trivial I’m missing, because that also makes little sense.

Also, I noted that the Fortran code goes through something like 15x more iterations than the C code, and I don’t see why that happens either. Even accounting for that, though, C is considerably faster per iteration.

I wonder what I’m missing here.

Isn’t `phi(:,:)` in the Fortran code not initialized to zero before entering the main loop?
Also, it may be useful to check the number of iterations used in the C and Fortran codes (which should be the same or similar if the two calculations are essentially “equivalent”…)

Yeah, I fixed the initialization, but it didn’t change anything. I also ninja edited just as you were about to reply that the Fortran code does many more loops than the C code, but I can’t figure out why. I imagine that it will be something rather stupid in the end. Also, even taking in to account the number of iterations, the C code seems to be faster per iteration.

A difference I see is that abs() in C is returning an int. Would it be possible to rewrite the code using array slicing? That is once you verify it is getting correct results.

1 Like

i can’t check the results, but some code like this.

``````real(dp) :: rho_mat(M,M)
``````
``````do i=2, M-1
do j=2, M-1
rho_mat(i,j) = rho(i*a,j*a)
enddo
enddo
``````
``````do while (delta > target )
a2 = a**2.0_dp
phiprime(2:M-1,2:M-1) = (phi(3:M,2:M-1) + phi(1:M-2,2:M-1) + phi(2:M-1,3:M) + phi(2:M-1,1:M-2))/4.0_dp &
+ a2/4.0_dp/epsilon0*rho_mat(2:M-1,2:M-1)
delta = maxval(abs(phiprime - phi))
print *, delta
temp = phi
phi = phiprime
phiprime = temp
enddo
``````

See lower post for an example that is easier to read.

A difference I see is that abs() in C is returning an int.

Oh… After fixing this (plus some more modifications like declaring arrays as `phi(0:M)` with M=100), I got the same number of iterations. Interestingly, gfortran seems to give much slower speed with `-Ofast` than `-O3` for this code, for some reason…

(Also, `epsilon0 = 8.85e-12` and `a = 0.01` may possibly give some numerical issues, because of `a**2 / epsilon0` and `rho()` = a step-like function)? If I change `epsilon0` to, say, 8.85e-8, -O3 and -Ofast gave the same number of iterations.)

1 Like

C short-circuits evaluation of the logical expressions in `rho` but Fortran does not (and there a lot of terms to evaluate). I wonder if that might be a source of some of the performance difference.

Here’s what I eventually ended up with. Maybe you can see what you think. It has close to the same number of iterations by using the new suggested epsilon and putting fabs in the C code.

``````module rhofunc
implicit none
public
integer, parameter :: dp = kind(0.d0)
contains
pure real(dp) function rho(x, y)
real(dp), intent(in) :: x, y
if(x > 0.6_dp .and. x < 0.8_dp .and. y > 0.6_dp .and. y < 0.8) then
rho = 1.0_dp
else if(x > 0.2_dp .and. x < 0.4_dp .and. y > 0.2_dp .and. y < 0.4_dp) then
rho = -1.0_dp
else
rho = 0.0_dp
endif
end function

end module

program poisson
use rhofunc, only: rho
implicit none
integer, parameter :: dp = kind(0.d0), M = 100
integer :: i, j
real(dp), parameter :: epsilon0 = 3.85e-8_dp, target = 1e-6, a = 0.01
real(dp) :: delta, start, end, phiprime(0:M+1, 0:M+1), phi(0:M+1, 0:M+1), a2
integer :: iters
real(dp) :: rho_mat(M, M)

iters = 0
phi = 0.0_dp

delta = 1.0_dp

call cpu_time(start)

do i = 1, M
do j = 1, M
rho_mat(i, j) = rho(i*a, j*a)
enddo
enddo

a2 = a**2.0_dp

do while(.true.)
phiprime(1:M, 1:M) = (phi(2:M+1, 1:M) + phi(0:M-1, 1:M) + phi(1:M, 2:M+1) + phi(1:M, 0:M-1))/4.0_dp &
+ a2/4.0_dp/epsilon0*rho_mat
delta = maxval(abs(phiprime - phi))
iters = iters + 1
if(delta > target) then
phi = phiprime
else
exit
endif
enddo

call cpu_time(end)
print *, iters
print *, end - start
end program
``````

``````phiprime(1:M, 1:M) = (phi(2:M+1, 1:M) + phi(0:M-1, 1:M) + phi(1:M, 2:M+1) + phi(1:M, 0:M-1))/4.0_dp &
+ a2/4.0_dp/epsilon0*rho_mat
``````

you will have exculde the “center part” as well since the original post says

``````(phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4.0_dp + ...
``````

that means `phi(i, j)` is not included.

1 Like

I was thinking that would be

``````phi(1:M, 1:M)
``````

Is that not right?

You’re right, I thought the array starts at 1, but it’s `phi(0:M + 1)`

How does this version compare against C?

Assuming it’s correct, though it doesn’t have the exact same number of iterations, the Fortran version is faster than the C version on my laptop with very simple settings.

gcc -O3 poisson.c -lm && ./a.out
Iterations:, 15515
Execution Time: 0.358088

gfortran -O3 poisson.f90 && ./a.out
Iterations: 15889
Execution Time: 0.22557400000000000

1 Like

Interesting that it’s faster than C on your machine, it’s not on mine. I rewrote the C algorithm to be equivalent to the Fortran one, where the rho array is calculated outside of the loop, and with M=250 it runs in about 16 seconds, whereas @implicitall 's Fortran version takes over 30.

Also, I’m a bit embarassed for using abs instead of fabs in C I only used the original C code with fabs and different epsilon. Feel free to post the program changes and number of iterations.

1 Like

Another candidate for our GitHub - fortran-lang/benchmarks: Fortran benchmarks repository. If anyone is interested in taking the lead on it, please let me know.

1 Like