Errors using certain Fortran compilers in CRAN checks

Hello, I am developing a R package which calls certain fortran subroutines that I wrote myself. I have been experiencing a pesky problem for a while now that my fortran subroutines result in errors in certain CRAN checks that use different Fortran compilers (e.g., flang, intel). On my own computer I do not get any errors however. I work on a MacBook Pro with M1 chip (Ventura 13.5) using gfortran (GNU Fortran, Homebrew GCC 14.1.0). Because my code is quite long (2000 lines), it is very difficult to find the errors because I cannot debug directly (I work within R). Currently, I am trying to test every single step via rhub_check().

One error that I tend to get is

free(): invalid next size (fast)
Aborted (core dumped)

I already get this error when using the following (mega) simplified subroutine. Maybe somebody can help explaining why this would cause an error on certain Fortran compilers (but not with gfortran on my Mac)? And does anybody perhaps know of an overview that explains the differences between the various Fortran compilers?

subroutine estimate_bct_ordinal(samsize0,numG,P,CDrawsStore)

implicit none

integer, parameter :: i6 = selected_int_kind(6)
integer, parameter :: r15 = selected_real_kind(15)

integer(i6), intent(in) :: P, numG, samsize0
real(r15), intent(out)  :: CDrawsStore(samsize0,numG,P,P)
real(r15)               :: CCan(P,P)

CCan = 1
CDrawsStore(1,1,:,:) = CCan(:,:)

end subroutine estimate_bct_ordinal

I also get this error more often. So apparently I am doing something wrong with basic declarations which I don’t understand and which are not picked by my own fortran compiler.

Moreover, my subroutine also contains many existing functions, such as from John Burkardt, for getting draws from probability distributions. Maybe somebody has experience with the compatibility of these existing functions in different Fortran compilers available on CRAN?

Any help is greatly appreciated!

2 Likes

Welcome to Discourse!

How big is P? You might be exhausting the stack here. The variable Can is undeclared in your example; I’m guessing it is supposed to be CCan.

Are you sure the kinds i6 and r15 are compatible with your R types?

Hello, P is not big. For example, 3. Then I already get the error. samsize0 is generally bigger, but also not super big, say, around 5000. The choice of the use of these kinds i6 and r15 was based on recent recommendations that I found when using fortran subroutines for R packages on CRAN. I use these kinds in other subroutines without problems. Btw indeed there was a typo there as it should have been CCan (which I corrected).

According to StackOverflow this error is typical for a double-free, i.e. trying to free a null pointer. Somehow I doubt this error is related to this procedure.

As noted in the R docs, it’s important your types have the same storage width (and representation): Writing R Extensions

My own recommendation would be to just use the C kind specifiers matching whatever R uses and pack those into a module:

module rkinds
   use, intrinsic :: iso_c_binding
   private
   integer, parameter, public :: rint = c_int
   integer, parameter, public :: rdp = c_double
end module

subroutine estimate_bct_ordinal(samsize0,numG,P,CDrawsStore)
use rkinds, only: rint, rdp
implicit none
integer(rint), intent(in) :: P, numG, samsize0
real(rdp), intent(out)  :: CDrawsStore(samsize0,numG,P,P)
real(rdp)               :: CCan(P,P)

! ...
end subroutine estimate_bct_ordinal

On my Mac, and using gfortran, the values of rint and i6 are the same in practice, however the intent w.r.t the goal of interoperating with the canonical R implementation is different.

1 Like

Still I can’t understand what is happening. I do not get errors when running the subroutine below but when I uncomment the last two lines with the declarations for CCan and CDrawsStore then I do get an error. And when I comment all above and keep the uncommented declarations of CCan and CDrawsStore then I also don’t get an error. Again, all integers are relatively small.

module rkinds
   use, intrinsic :: iso_c_binding
   private
   integer, parameter, public :: rint = c_int
   integer, parameter, public :: rdp = c_double
end module


subroutine estimate_bct_ordinal(postZmean, postZcov, P, numcorr, K, numG, BHat, sdHat, CHat, XtXi, samsize0, &
    burnin, Ntot, Njs, Xgroups, Ygroups, C_quantiles, sigma_quantiles, B_quantiles, BDrawsStore, &
    sigmaDrawsStore, CDrawsStore, sdMH, ordinal_in, Cat_in, maxCat, gLiuSab, seed, nuggetscale)
!
    use rkinds, only: rint, rdp
!
    implicit none
!
!    integer, parameter :: r15 = selected_real_kind(15)
!    integer, parameter :: i6 = selected_int_kind(6)
!
    integer(rint), intent(in) ::P, numcorr, K, numG, samsize0, burnin, Ntot, maxCat, seed, Njs(numG)
    real(rdp), intent(in) ::  BHat(numG,K,P), sdHat(numG,P), CHat(numG,P,P), XtXi(numG,K,K), Cat_in(numG,P), &
                              sdMH(numG,P), Xgroups(numG,Ntot,K), Ygroups(numG,Ntot,P), ordinal_in(numG,P), &
                              nuggetscale
    real(rdp), intent(out)::  postZmean(numcorr,1), postZcov(numcorr,numcorr), B_quantiles(numG,K,P,3), &
                              C_quantiles(numG,P,P,3), sigma_quantiles(numG,P,3), BDrawsStore(samsize0,numG,K,P), &
                              sigmaDrawsStore(samsize0,numG,P), CDrawsStore(samsize0,numG,P,P), &
                              gLiuSab(samsize0,numG,P)
    real(rdp) ::  BDraws(numG,K,P), CDraws(numG,P,P), sigmaDraws(numG,P), meanMat(Ntot,P), SigmaMatDraw(P,P), &
                  R_MH, covBeta(K*P,K*P), Ds(P,P), Ccan(P,P), CcanInv(P,P), Ccurr(P,P), epsteps(P,P), &
                  SS1(P,P), SS1inv(P,P), rnunif(1), errorMatj(P,P), sigma_can(P), aa, bb, &
                  betaDrawj(1,P*K), acceptSigma(numG,P), dummyPP(P,P), dummyPPinv(P,P), &
                  varz1, varz2, varz1z2Plus, varz1z2Min, Cnugget(P,P), SigmaInv(P,P), sdMHg(numG,P), gLiuSab_can, &
                  Wgroups(numG,Ntot,P), alphaMin, alphaMax, Cinv(P,P), Bmean(K,P), acceptLS(numG,P), &
                  alphaMat(numG,maxCat+1,P), Wdummy(numG,P,Ntot,maxCat), condMean, condVar, &
                  Zcorr_sample(samsize0,numcorr), dummy3(samsize0), dummy2(samsize0), &
                  diffmat(Ntot,P), meanO(P*K), para((P*K)*((P*K)+3)/2 + 1), randraw, gLiuSab_curr(numG,P)
    integer(rint) ::s1, g1, i1, corrteller, Cat(numG,P), ordinal(numG,P), &
                  c1, c2, p1, Yi1Categorie, tellers(numG,maxCat,P), k1, p2, iseed, errorflag, &
                  lower_int, median_int, upper_int
!
    !part 1
!   set seed
    iseed = seed
!
    !initial posterior draws
    BDraws = BHat
    sigmaDraws = sdHat
    CDraws = CHat
    meanO = 0.0
    gLiuSab_curr = 1.0
!
    do g1=1,numG
        do p1=1,P
            ordinal(g1,p1) = int(ordinal_in(g1,p1))
            Cat(g1,p1) = int(Cat_in(g1,p1))
        end do
    end do
    do p1=1,P
        do g1=1,numG
            !initial values
            if(ordinal(g1,p1)==1) then
                sigmaDraws(g1,p1) = 1.0
                sigma_quantiles(g1,p1,1) = 1.0
                sigma_quantiles(g1,p1,2) = 1.0
                sigma_quantiles(g1,p1,3) = 1.0
            end if
        end do
    end do
!
    !define nugget matrix to avoid approximate nonpositive definite correlation matrices for candidates
    Cnugget = nuggetscale
    do p1=1,P
        Cnugget(p1,p1) = 1.0
    end do
!
    !count number of accepted draws for R (over all groups)
    acceptSigma = 0.0
    acceptLS = 0.0
    sdMHg = .1 !for gLiuBanhatti parameter
!
    !initial values for latent W's corresponding to ordinal DVs
    Wgroups = Ygroups
    Wdummy = 0.0
!
    !initial values of boundary values alpha to link between ordinal Y and continuous latent W
    alphaMat = 0.0
    alphaMat(:,1,:) = -1e10  !alpha0
    alphaMat(:,2,:) = 0.0      !alpha1
    do p1=1,P
        do g1=1,numG
            if(ordinal(g1,p1)>0) then
                do c1=3,Cat(g1,p1)
                    alphaMat(g1,c1,p1) = .3*(real(c1)-2.0)
                end do
                alphaMat(g1,Cat(g1,p1)+1,p1) = 1e10
            end if
        end do
    end do

    !part 2
    !Ccan = 1
    !CDrawsStore(1,1,:,:) = Ccan(:,:)

end subroutine estimate_bct_ordinal

intent(out) is used in the routine, but IIRC it requires all the elements of an array to be assigned explicitly, while some of the dummy arrays are not assigned. If they are to be modified partly, I guess it may be better to use intent(inout). (Also, the arrays are not automatically filled with zero, so if necessary, we need to set zero manually for intent(out) variables.)

Also, as Ivan mentioned, I’m also afraid about memory corruption or buffer overflow or something in some different places (or possibly in this routine)… I guess some of the check methods below might be useful to try (but because I’ve never used R, I am not sure which is most effective or straightforward in practice…)

Is CDrawsStore allocated by R and its size correct? (If not, I guess it may lead to buffer overflow).

Thanks again both for your input. Changing intent(out) to intent(inout) did not change anything. When the subroutine includes both parts (part 1 and part 2) I get an error but not when either one of these is commented. Thus, practically all objects that were intent(out) (which I now all changed to intent(inout)) were not explicitly assigned. But only a small part of CDrawsStore in part 2. Btw this assignment in part 2 is not in the actual subroutine but I just have it here as a test as this also seems to go wrong.

I will check the link to Writing R Extensions again but I also have to admit that I have difficulty linking the information that’s there to my problem.

I will also check the size of CDrawsStore but I think it is correct (I have checked it several times already…).

1 Like

When checking the Writing R Extensions document, as example is given about argument mismatches but to me it is unclear if these ways of writing a subroutine is a problem. What is the problem in these examples below? And should I also use the size (*) instead? Thanks again.

      subroutine sub_assumed(array, n)
        real array(*)
        integer n
        …
      end

      subroutine another_explicit(array, n)
        integer n
        real array(n)
        …
      end

Another potential issue might be (again) GC on the R side, and gctorture and rchk seem to be some tools for such issues (sorry if you already know this…). I believe some people on this forum is familiar with R, so hope they will chime in.

Forgetting to protect and/or to unprotect (protect bug) often makes R crash but also can lead to incorrect results. It is not uncommon that old protect bugs are uncovered much later by inconsequential code changes. These bugs are common and hard to find, and thus R offers tools to detect them. gctorture helps testing by increasing the chance a protect bug will crash R and will do so sooner after the code with the bug executes. rchk is a static analysis tool that identifies potential protect bugs in C code without executing it;

I have to admit that the protect/unprotect steps are completely new to me. I have never used them before. So it is difficult for me to understand where to start. Also the protect/unprotect lines seem to be done in C rather than Fortran. But then I have to add protect/unprotect lines then in fortran? Sorry for these (apparent) basic questions… I always thought that these memory allocations are straightforward in Fortran and are already done when declaring objects…

I am sorry to be confusing… Indeed, I have no idea whether GC is related to the Fortran case. Though such tools are mentioned in the above page, they may be related only to C/C++ etc. Also, I’ve never seen “protect” things explained in tutorials for calling Fortran from R, e.g., like the following:

So anyway, I initially imagined more about buffer overflow etc (and the resulting memory corruption of other irrelevant R objects) because the dummy arrays are explicit-shape (i.e. they receive size info as arguments) and are rather error-prone (e.g., passed size info could be wrong). But I am not sure what is really causing the trouble yet…

(Nevertheless, it may be useful to apply bounds check to the Fortran code, if possible.)

If necessary, it might be useful to ask StackOverflow also because some people there (e.g. francescalus) seem very familiar with R + Fortran (like in this page) I also guess Beliavsky is familiar with R + Fortran combination (cc: @Beliavsky)

Thanks again for these suggestions. After a lot of checking (unfortunately I cannot debug directly…), I found out that the problem was caused by the mismatch between the format of the reals and integers in R when calling the Fortran subroutines.

Luckily I have been able to get the full subroutine running (for now) on my Mac and on a Windows machine. I still get an error though when checking it on a Linux machine. I build the subroutine again in small steps and a problem seems to appear in the following matrix multiplication step:

meanMat(1:Njs(g1),1:P) = matmul(Xgroups(g1,1:Njs(g1),1:K), &
                BDraws(g1,1:K,1:P))

Here Njs(g1)=20, g1=1, K=1, P=3. Even though this does not give problems on a Mac and Windows machine, I could imagine that this may cause a problem because K is 1. Based on what is said here (Solved: about matmul() - Intel Community), the dimensionality would not be a problem as 1:K would remain to be an array. But I also checked it using the reshape function:

meanMat(1:Njs(g1),1:P) = matmul(reshape(Xgroups(g1,1:Njs(g1),1:K),(/Njs(g1),K/)), &
                reshape(BDraws(g1,1:K,1:P),(/K,P/)))

but I still get an error. Would there be other (less error prone) ways to do this matrix multiplication? Of course, it could also be that the error is caused somewhere else but for some reason this line kills the check (and then the provided information would be insufficient). Thanks again.

Please explain what you mean by “the mismatch between the format of the reals and integers in R when calling the Fortran subroutines”. I am a very infrequent user of R, and I do not know what R does “under-the-covers”, but I think that you may have overlooked the following point:

Integers and reals (of any precision) have different bit representations for mathematically identical integer numbers.

For example, 1.0 (single-precision real) has the IEEE-754 representation Z’3F800000’ whereas the integer 1 has the internal representation Z’00000001’.

Hello, to be honest I am not familiar enough with these representations to give you a clear answer to this. One of the problems my above code was also a silly error regarding the dimension of alphaMat, which in its second dimension was sometimes not larger than 1, and thus

alphaMat(:,2,:) = 0.0

was (also) causing an error. The problem for me is that it is really not easy to debug under these different compilers.

Catching such kinds of dimension errors at runtime is possible. You need to use the flags:

In the particular case of ifx v2024.0 use -check all,nouninit to bypass a compiler bug.

Could you show us the way the subroutine is being used from R? Are you using dyn.load and .Fortran() or something else?

Btw, the code you showed is mixing single- and double-precision floating point numbers:

  • floating point literals such as 0.0, 1.0, .3, 2.0, and 1e10 are single precision by default (there are compiler flags to change this, but you don’t want to go that way). To make sure literals have the desired precision you need to append the kind specifier, i.e. 0.0_rdp, 1.0_rdp and so on.
  • when using the conversion functions, specify the type, i.e. real(value,kind=rdp)

Thanks a lot for these suggestions. Any change you know how/where should I should add these statements (-fcheck=all etc.) in the R package? I expect this will be very useful for me because I also get warnings such as

❯ checking usage of KIND in Fortran files ... WARNING
  Found the following files with non-portable usage of KIND.
  For details set environment variable _R_CHECK_FORTRAN_KIND_DETAILS_ to
  a true value.

So with these checks I should be able to find the precise cause (because I do think that I use the kind types that are recommended.

About the other question, yes indeed I use dyn.load and .Fortran(). I did read somewhere that it was recommendable do it differently (using .call I recall), but preferably I don’t want to go there as this may result in new hurdles for me on this level of programming with which I am not super familiar. For completeness my package can be found here: https://github.com/jomulder/BFpack

And thanks for informing me about the addition of kind specifiers.

For example even with this subroutine,

module rkinds2
   use, intrinsic :: iso_c_binding
   private
   integer, parameter, public :: rint = c_int
   integer, parameter, public :: rdp = c_double
end module

subroutine estimate_bct_ordinal_test(P, numG, Ntot, Ygroups, Cnugget)

    use rkinds2, only: rint, rdp

    implicit none

    integer(rint), intent(in)    :: P, numG, Ntot
    real(rdp), intent(in)    :: Ygroups(numG,Ntot,P)
    real(rdp), intent(out)   :: Cnugget(P,P)
    real(rdp)                :: nuggetscale
    integer(rint)                :: p1

    nuggetscale = 0.995_rdp
    Cnugget = nuggetscale
    do p1=1,P
        Cnugget(p1,p1) = 1.0_rdp
    end do

end subroutine estimate_bct_ordinal_test

I get the warning

❯ checking usage of KIND in Fortran files ... WARNING
  Found the following files with non-portable usage of KIND.
  For details set environment variable _R_CHECK_FORTRAN_KIND_DETAILS_ to
  a true value.

Which I would assume should be fixed now. The fortran compiler that gives me this warning is

ifx (IFX) 2023.2.0 20230721

Thanks again for your input.

I read that I can do this via a Makevars file. Thanks for your help.

A related question to this topic, on an intel compiler I get the following warning

❯ checking usage of KIND in Fortran files ... WARNING
  Found the following files with non-portable usage of KIND:
    test_kind__genmod.f90
  For details set environment variable _R_CHECK_FORTRAN_KIND_DETAILS_ to a true value.

already for a practically empty subroutine in test_kind.f90:

module rkinds
   use, intrinsic :: iso_c_binding !c_int c_double
   private
   integer, parameter, public :: rint = c_int
   integer, parameter, public :: rdp = c_double
end module

subroutine test_kind(X, Y)

    use rkinds, only: rint, rdp

    implicit none

    real(rdp), intent(in) ::  X
    real(rdp), intent(out) ::  Y

    Y = X + 1.0_rdp

end subroutine test_kind

Does anybody have an idea that could explain this warning? I only seem to get it on an intel computer with ifx (IFX) 2023.2.0 20230721. Thanks in advance!

Btw here is more info regarding the compilation: https://github.com/jomulder/BFpack/actions/runs/9454906510/job/26043521820