I have to use a lot of speed-critical, C-based text search routines in my work. My Fortran code uses classic Fortran strings (character(Len=xxx) :: string). I would like to rewrite the C routines in Fortran using strings, but the algorithms in C all assume a zero-based C character array. Rewriting them in Fortran using strings is not an issue, but painfully refactoring the algorithms to use 1-based strings is: the algorithms are opaque and the code is uncommented. At the moment, I translate the C to Fortran using a zero-based array of characters so I don't need to refactor the algorithm, but I need to copy the string into a zero-based array of chars before calling them, and it's killing performance (I use a wrapper to memmove to do that; it's fast). I'd rather use zero-based Fortran strings everywhere and avoid both the overhead and the refactoring. Any ideas on how to do it?
As an interim measure, you may contrive something similar to the following:
program cfarray
  implicit none
  character(len=25) :: str = 'Able was I ere I saw Elba'
  character :: s0(0:24)
  integer :: ofs(2), i, j   ! room for the two 'I's in this test string
  !
  s0 = transfer(str, s0)   ! str is 1-base, s0 is 0-base
  j = 0
  do i = 0, len(str)-1
    if (s0(i) .eq. 'I') then   ! string processing algorithm in 0-base
      j = j+1
      ofs(j) = i
    endif
  end do
  print '(1x,A,2x,2i4)', 'Offsets with 0-base: ', ofs
  j = 0
  do i = 1, len(str)
    if (str(i:i) .eq. 'I') then   ! string processing algorithm in 1-base
      j = j+1
      ofs(j) = i
    endif
  end do
  print '(1x,A,2x,2i4)', 'Offsets with 1-base: ', ofs
end program cfarray
If you do so, whenever possible, you could subsequently recast each 0-base block of code into the corresponding 1-base block of code, and you may plan to pass the code to others to maintain only after this recasting has been completed and no 0-base blocks of code remain.
There are things you can do with C pointers and with passing things in and out of C, but I personally do not like that approach. MOVE_ALLOC and POINTERs do some interesting things with character variables of different lengths, but nothing useful (bemusing, though). So hopefully some of the algorithms are of general use and could be converted and placed in stdlib or an fpm project? TRANSFER seems like the natural choice, but it is really slow on some platforms. Oddly, the fastest approach I have found that is also relatively portable is to treat the characters as integers and copy them with a little loop using ACHAR; for probably historical reasons, several popular Fortran compilers have not optimized CHARACTER operations well at all.
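A minimal sketch of the character-by-character copy loop described above, assuming ASCII text. The module and procedure names (zero_based_copy, copy_to_zero_based) are hypothetical, and round-tripping each character through IACHAR/ACHAR is only one reading of "treat the characters as integers"; it has not been benchmarked against TRANSFER here, so any speed advantage is compiler dependent.

```fortran
module zero_based_copy
  implicit none
  private
  public :: copy_to_zero_based
contains
  ! Hypothetical helper: copy a 1-based Fortran string into a 0-based
  ! array of single characters, one character at a time.
  subroutine copy_to_zero_based(str, s0)
    character(len=*), intent(in) :: str
    character, intent(out) :: s0(0:)   ! assumed shape, lower bound 0
    integer :: i
    ! str(i:i) goes to s0(i-1); each character is round-tripped through
    ! its ASCII code (IACHAR is processor dependent for non-ASCII text).
    ! The copy stops early if s0 is shorter than str.
    do i = 1, min(len(str), size(s0))
       s0(i-1) = achar(iachar(str(i:i)))
    end do
  end subroutine copy_to_zero_based
end module zero_based_copy
```

A caller declares character :: s0(0:len(str)-1) and calls copy_to_zero_based(str, s0); s0(0) then holds the first character of str. Note that this still performs a copy; it only changes how the copy is done.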
Proposing new Fortran features might get a good discussion going about it. Should you be able to specify length with a lower bound and upper bound? Should MOVE_ALLOC allow moving a character array to a string as a special case? Ditto for a CHARACTER pointer? Should EQUIVALENCE be recognized as not being such a bad idea after all and something like it allowed for things like this? People are doing something close to that with C pointers, indicating there is a need. Features like this seem to be at the bottom of the stack as they are not traditional numeric computations, which get the most attention in Fortran.
@bwanakelly, is it possible for you to provide a mock-up of some representative case of a “critical C based text search routine”?
With such a case in mind, readers here can offer some ideas/tricks or alternate approaches to try out.
Chances are high that the place to start is with “thin” wrappers, written in C, around the critical C routines themselves, making use of the enhanced interoperability with C that the Fortran standard provides through ISO_Fortran_binding.h, introduced in the Fortran 2018 revision. See here and here for simple illustrations. Basically, the thin-wrappers-in-C approach lets you abstract away the differences between zero-based arrays/pointers of chars in C and the Fortran design of the CHARACTER type.
My hunch is this will minimize any performance hits you take due to the “copy the string into a Zero based array of chars before calling them” by doing away with the need to copy altogether.
I don’t think the copy should be necessary.
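program string_as_array
  implicit none
  character(*), parameter :: c = 'Hello World'
  ! Pass the 1-based string c to a dummy declared as a 0-based array
  call sub(0, len(c)-1, c)
contains
  subroutine sub(first, last, a)
    integer, intent(in) :: first, last
    character, intent(in) :: a(first:last)   ! explicit shape, any lower bound
    write(*,'(2(i0,1x),*(a))') first, last, a(first:last)
  end subroutine sub
end program string_as_array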
It is standard to associate a character string with a character array, and I think all Fortran compilers will establish that argument association without copy-in/copy-out. This example uses an internal subprogram, but I think the same thing can now be done inline with an ASSOCIATE block.
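To make this concrete for a text search, here is a sketch of a C-style routine translated into Fortran while keeping its 0-based indexing untouched. The module c_style_search and the function find0 are hypothetical stand-ins for one of the C routines: a naive substring search that returns the 0-based offset of the first match, or -1, the way a C routine might. Because the dummies are explicit-shape arrays of single characters, a CHARACTER string (or a substring) can be passed directly without copying it into a separate array first; that no temporary is created is the expectation stated above, not something the standard guarantees.

```fortran
module c_style_search
  implicit none
  private
  public :: find0
contains
  ! Hypothetical example of a C routine translated verbatim:
  ! naive substring search over 0-based character arrays.
  ! Returns the 0-based offset of the first match, or -1 if none.
  ! The dummies are explicit shape, so a CHARACTER string can be the
  ! actual argument (character sequence association).
  integer function find0(hay, nh, needle, nn) result(pos)
    integer, intent(in) :: nh, nn
    character, intent(in) :: hay(0:nh-1), needle(0:nn-1)
    integer :: i, j
    pos = -1
    do i = 0, nh - nn
       j = 0
       ! Compare needle against hay starting at offset i
       do while (j < nn)
          if (hay(i+j) /= needle(j)) exit
          j = j + 1
       end do
       if (j == nn) then
          pos = i
          return
       end if
    end do
  end function find0
end module c_style_search
```

For example, with str = 'Able was I ere I saw Elba', find0(str, len(str), 'saw', 3) should return 17, the same 0-based offset a C search would report, and find0(str(11:), len(str)-10, 'I', 1) searches a substring in place and returns 5.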
[edit] I looked at the description of ASSOCIATE in MFE, and maybe this is not possible after all. ASSOCIATE blocks have most of the functionality of dummy argument association, but this feature of character strings and arrays seems to be not allowed. Anyone know for certain whether or not this is possible?
Thanks @RonShepard. Well, that is my lesson for the day. I was sure that was not standard, and that you could get away with it only without an interface, usually by compiling the routine in a separate file. A pleasant surprise to find I was wrong. That looks like a great solution to me. I was so sure that I took it out of some old routines quite a while ago when I moved them to modules!
Jeez Ron… what a blockhead I’ve been. I have been attempting to do this using integer :: a(0:) and the compiler has blocked it. I just never thought about using integer :: a(first:last).
Thank you very much! I spent the morning refactoring some of my code and it works a treat!
This may be a separate issue from the string-to-array character association feature. A declaration like a(0:) is an assumed-shape declaration, and it allows only certain actual arguments to be associated with it. A declaration like a(0:last) or a(first:last) is an explicit-shape declaration. For example, an assumed-size actual array cannot associate with an assumed-shape dummy array. However, a declaration like a(0:*) is an assumed-size declaration that can associate with an assumed-size actual argument. That has the 0 lower bound with an unspecified upper bound (and it is the programmer's responsibility to make sure that works). BTW, the lower bound can be a constant or a variable in the declaration, so that also should not be an issue regarding argument association. Assumed-shape dummy arguments need an explicit interface, while assumed-size arguments do not, so that might also be an issue. Another feature is that assumed-size and explicit-shape arguments must be contiguous, and that requirement sometimes triggers copy-in/copy-out argument association, while assumed-shape arguments need not be contiguous. In the string-to-array character association, the string is always contiguous, which is why I said that Fortran compilers will not do copy-in/copy-out in that case.
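A sketch of the a(0:*) assumed-size form applied to the same kind of problem; the module assumed_size_demo and the function count0 are hypothetical. Under the standard's sequence-association rules, a default CHARACTER string actual argument may correspond to an explicit-shape or assumed-size dummy array, but not to an assumed-shape one such as s(0:), which is consistent with the compiler rejecting a(0:) in the earlier post.

```fortran
module assumed_size_demo
  implicit none
  private
  public :: count0
contains
  ! Hypothetical example: count occurrences of c in a 0-based
  ! assumed-size array of single characters.
  ! s(0:*) has lower bound 0 and no upper bound, so the length n must
  ! be passed separately; staying within s(0:n-1) is up to the caller.
  ! Declaring s(0:) (assumed shape) instead would make a call with a
  ! CHARACTER string actual argument non-conforming, because sequence
  ! association applies only to explicit-shape and assumed-size dummies.
  integer function count0(s, n, c) result(k)
    character, intent(in) :: s(0:*)
    integer, intent(in) :: n
    character, intent(in) :: c
    integer :: i
    k = 0
    do i = 0, n - 1
       if (s(i) == c) k = k + 1
    end do
  end function count0
end module assumed_size_demo
```

With str = 'Able was I ere I saw Elba', count0(str, len(str), 'I') should return 2. The explicit-shape form s(0:n-1) would work equally well here; the assumed-size form simply mirrors a C routine that receives a bare pointer plus a length.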
This all seems like it would be complicated to a new fortran programmer, but it does make sense in a historical context. Even experienced programmers can make these association mistakes. The compiler needs to know more information for assumed shape dummy arguments than for the other cases, so a different association mechanism is required.
Thank you again! Very instructive and much appreciated.