C Interoperability: command-line arguments

Fortran 2003 added the intrinsic functions:

  • command_argument_count - Get number of command line arguments
  • get_command - Get the entire command line
  • get_command_argument - Get command line arguments

which can be used to parse the command line arguments.

C libraries often provide initialization routines, which accept the argc, argv pair. Here are some examples:

When calling such a routine from Fortran, one would have to allocate his own ragged array of C-strings, retrieve and copy the values using get_command_argument, and then pass the ragged array to whatever C library initialization routine needed.

In practice however, it seems that Fortran compilers also reuse the C mechanism behind the scenes. For example the Intel Fortran runtime has a for_rtl_init_(int *, char **) routine, which gets called from a C main, that then calls the Fortran MAIN_ routine.

Also with gfortran, a Fortran pseudo-main is created (MAIN_) is created, which is then called from a C main that also calls a runtime routine set_args(int, char **) from libgfortran.

Is there a way to retrieve a pointer to the array of character variables (the argv in the wrapper C main program)? Alternatively, is it possible to replace the compiler provided main and inject additional library initialization routines externally of the Fortran main program?

3 Likes

The output of godbolt (without optimization), can be helpful to see what happens behind the scenes: Compiler Explorer. The ifort or ifx compiler-supplied main remains hidden in godbolt, but you can still find it using command-line tools.

~/fortran/cmd$ cat main.f90 
program main
! nothing here
end program
~/fortran/cmd$ ifort main.f90 
~/fortran/cmd$ nm -C a.out | grep -i "main"
                 U __libc_start_main@@GLIBC_2.2.5
0000000000403830 T main
0000000000403880 T MAIN__
~/fortran/cmd$ gfortran main.f90 
~/fortran/cmd$ nm -C a.out | grep -i "main"
                 U __libc_start_main@@GLIBC_2.2.5
0000000000001150 T main
0000000000001149 t MAIN__

Excellent questions. Technically it seems this would not be difficult to do. What would be the best interface from the user perspective?

I understand the issue is tied to how platforms implement command-line arguments. It’s just frustrating that parsing part of the arguments in C, and part in Fortran, requires some overhead.

module iso_c_binding
integer(c_int), protected :: argc
type(c_ptr), protected :: argv
end module

Maybe these two variables could be part of the global state in the iso_c_binding module? One caveat in this case is const correctness.

Here is how we do it in LFortran: lfortran/lfortran_intrinsics.c at 4c3e8ae40213210048d351942e969e1400e6f1e3 · lfortran/lfortran · GitHub, we have these in the runtime. The libc library passes these into the main() function, so we extract them and set them in our runtime. Then we expose them to the user (different way in LFortran and LPython).

So in Fortran, we indeed could have a module like you described.

However, it seems it would be more natural to just have an array of strings rather than type(c_ptr) :: argv. But then you would need to call some function to convert it to a C pointer, and then issues with freeing the memory. So your proposed way avoids these issues. I just don’t know if we rather want to keep a high level interface in Fortran.

The issue comes to play also if you are embedding a Fortran program in C. Without proper runtime initialization, the Fortran intrinsics don’t work.

! print_args.f90
subroutine print_args() bind(c)
implicit none
integer :: arg, argc, l
character(len=128) :: str
argc = command_argument_count()
do arg = 1, argc
   call get_command_argument(arg,str,l)
   print *, str(1:l)
end do
end subroutine
// main.c

#if defined(GFORTRAN)
// Routines from libgfortran
void _gfortran_set_args(int, char **);
// Wrappers
#define fc_init(argc,argv) \
    _gfortran_set_args(argc,argv)
#define fc_finish()

#elif defined(IFORT)
// Routines from libifcore
void for_rtl_init_ (int *, char **);
int for_rtl_finish_ ( );
// Wrappers
#define fc_init(argc,argv) \
    for_rtl_init_(&argc,argv)
#define fc_finish() \
    for_rtl_finish_()
#endif

// Fortran routine that calls intrinsic routines:
//    command_argument_count(), and
//    get_command_argument()
extern void print_args();

int main(int argc, char *argv[])
{
    // Initialize Fortran runtime environment
    fc_init(argc,argv);

    // Call Fortran routine
    print_args();
    
    // Finalize environment
    fc_finish();
    
    return 0;
}
~/fortran/cmd$ make
gfortran -Wall -c print_args.f90
gcc -Wall -o main-gfortran main.c print_args.o -DGFORTRAN -lgfortran
ifort -warn all -c print_args.f90
icx -Wall -o main-ifort main.c print_args.o -DIFORT -lifcore
./main-ifort a b c
 a
 b
 c
./main-gfortran a b c d
 a
 b
 c
 d

If the line calling fc_init() is commented out, no output occurs.

1 Like

Here is a module defining a derived type that essentially makes a copy of the command-line arguments:

! c_args.f90
module c_args

   use, intrinsic :: iso_c_binding

   implicit none
   private

   public :: args
   public :: destroy_args
   public :: print_args

   !> C-compatible command line arguments
   type :: args
      integer(c_int) :: argc
      type(c_ptr) :: argv
   contains

   end type

   ! A few routines from libc are needed
   interface
      function c_malloc(size) bind(c,name="malloc")
         import c_size_t, c_ptr
         integer(c_size_t), value :: size
         type(c_ptr) :: c_malloc
      end function
      subroutine c_free(ptr) bind(c,name="free")
         import c_ptr
         type(c_ptr), value :: ptr
      end subroutine
      !> size_t strlen ( const char * str )
      function c_strlen(str) bind(c,name="strlen")
         import c_size_t, c_ptr
         type(c_ptr), value :: str
         integer(c_size_t) :: c_strlen
      end function
   end interface

   !> Overloaded structure-constructor
   interface args
      module procedure new_args
   end interface

contains

   !> Initialize the derived type with a copy of the command-line arguments
   function new_args() result(this)
      type(args) :: this

      type(c_ptr), pointer :: argv(:)
      integer :: i, l

      this%argc = command_argument_count()
      this%argv = c_malloc( this%argc * c_sizeof(this%argv) )
      call c_f_pointer(this%argv, argv, [this%argc])
      do i = 1, this%argc
         call get_command_argument(i,length=l)
         argv(i) = c_malloc( (l+1) * c_sizeof(c_null_char) )
         block
            character(len=l+1,kind=c_char), pointer :: str
            call c_f_pointer(argv(i), str)
            call get_command_argument(i,value=str)
            str(l+1:l+1) = c_null_char
         end block
      end do
   end function

   subroutine destroy_args(this)
      type(args), intent(inout) :: this
      integer :: i
      type(c_ptr), pointer :: argv(:)
      call c_f_pointer(this%argv, argv, [this%argc])
      do i = 1, this%argc
         call c_free(argv(i))
      end do 
      call c_free(this%argv)
   end subroutine

   subroutine print_args(this)
      type(args), intent(in) :: this
      
      integer :: i
      integer(c_size_t) :: l     
      type(c_ptr), pointer :: argv(:)
      
      call c_f_pointer(this%argv, argv, [this%argc])
      write(*,'(A)') "Printing from Fortran"
      do i = 1, this%argc
         l = c_strlen(argv(i))
         block
            character(len=l,kind=c_char), pointer :: str
            call c_f_pointer(argv(i), str)
            write(*,'(A)') str
         end block
      end do
   end subroutine

end module

Using the derived type, we can easily pass the arguments to C:

! main_args.f90
program main_args

   use c_args
   implicit none

   interface 
      !> void print_from_c(int *argc, char **argv);
      subroutine print_from_c(argc,argv) bind(c)
         use, intrinsic :: iso_c_binding, only: c_int, c_ptr
         integer(c_int) :: argc
         type(c_ptr), value :: argv
      end subroutine
   end interface

   type(args) :: myargs

   myargs = args()

   call print_from_c(myargs%argc,myargs%argv)  ! C 
   call print_args(myargs)                     ! Fortran

   call destroy_args(myargs) ! Don't forget, otherwise memory leak

end program
// print_from_c.c
#include <stdio.h>
void print_from_c(int *argc, char **argv)
{
    puts("Printing from C");
    for (int i = 0; i < *argc; ++i) {
        puts(argv[i]);
    }
}
~/fortran/cmd$ make main_args 
gfortran -Wall -c c_args.f90
gcc -Wall -c print_from_c.c
gfortran -Wall -o main_args main_args.f90 c_args.o print_from_c.o
~/fortran/cmd$ ./main_args a b c
Printing from C
a
b
c
Printing from Fortran
a
b
c

This should in principle work even with an underlying command-line mechanism that is not based on a C runtime library. In a parallel universe where a Fortran-based OS appears, we might have:

program main ! The Fortran "main", wrapping a C "main"
use c_args
implicit none
interface
   !> int main(int argc, char **argv)
   function main__(argc, argv) bind(c,name="MAIN__")  ! The C Main
      use, intrinsic :: iso_c_binding, only: c_int, c_ptr
      integer(c_int), value :: argc
      type(c_ptr), value :: argv
      integer(c_int) :: main
   end function
end interface
type(args) :: fargs
integer(c_int) :: ret

fargs = args()
ret = main__(fargs%argc,fargs%argv)
stop ret, quiet=.true.

end program

Anyways, I suppose the ugly work of writing the wrapper is better than the alternative where N libraries XYZ would need to implement a XYZ_Init() routine using Fortran intrinsics under the hood, and not the int argc, char **argv mechanism. The former is an overhead of O(1), the latter of O(N).

1 Like

I would call type(args) as type(c_args), consistent with c_ptr.

Why not putting this to stdlib? That seems like a reasonable interface that works today.

1 Like

I thought about stdlib too after I wrote the module (I feared it would be more work initially). Is there a chance to make J3 proposal out of this?

Also the vendor agnostic C macros for Fortran run-time initialization could go to stdlib.

1 Like

I have always assumed that these intrinsics are just interfaces to the POSIX (or other operating system) command line argument structure, which of course is defined with a C interface but need not be actually written in C. That is, I would not have thought that the actual command line arguments would have been copied or otherwise manipulated by the fortran runtime library, and that it is accessing the same data that would be accessed from a C or C++ program (or any other program running on that OS). Of course, once a command line argument has been returned through the fortran intrinsic, that is a separate copy with whatever fortran metadata there is to define it as a fortran character string, and that local copy can be modified without affecting the original OS structure.

I assumed that too, but looking at the symbol tables on Linux and Mac, I noticed that all three compilers (gfortran, ifort, and nvfortran) have this thing with MAIN_ and main going on. A colleague remarked that this must be some aboriginal Unix/ELF convention. The gfortran documentation for “Non-Fortran Main Program’s” states this is for historical reasons:

When you compile a PROGRAM with GNU Fortran, a function with the name main (in the symbol table of the object file) is generated, which initializes the libgfortran library and then calls the actual program which uses the name MAIN__, for historic reasons. If you link GNU Fortran compiled procedures to, e.g., a C or C++ program or to a Fortran program compiled by a different compiler, the libgfortran library is not initialized and thus a few intrinsic procedures do not work properly, e.g. those for obtaining the command-line arguments.


In libgfortran/runtime/main.c you can inspect that the runtime simply makes a copy of the input arguments from the C main:

static int argc_save;
static char **argv_save;


/* Set the saved values of the command line arguments.  */

void
set_args (int argc, char **argv)
{
  argc_save = argc;
  argv_save = argv;
}
iexport(set_args);

For nvfortran I haven’t yet found a way to initialize the runtime. The compiler SDK library folder appears to provide a f90main.o object:

$ nm -C /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/f90main.o
                 U _GLOBAL_OFFSET_TABLE_
                 U __io_environ
                 U __io_set_argc
                 U __io_set_argv
0000000000000000 T main
                 U MAIN_
                 U __nv_init_env
                 U pgf90_exit

Many of these symbols can then be found in the files libnvc.ipl and libnvf.ipl:

$ grep -r "set_argc" /opt/nvidia/hpc_sdk/
Binary file /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/libnvc.a matches
Binary file /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/f90main.o matches
/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/libnvc.ipl:FUNC . __pgio_set_argc
Binary file /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/libnvc.so matches

Unfortunately, I can’t find any corresponding headers to tell me exactly which arguments they expect. Here is a little demonstration of which symbols get pulled in to a minimal working program:

:~/fortran/cmd$ cat mwe.f90 
! mwe.f90
program main
print *, command_argument_count()
end program
~/fortran/cmd$ nvfortran -o mwe mwe.f90 
~/fortran/cmd$ nm -C mwe
                 U __abort_init
0000000000404088 B __bss_start
0000000000402014 r .C283_MAIN_
0000000000402008 r .C298_MAIN_
0000000000402010 r .C300_MAIN_
0000000000402004 r .C301_MAIN_
0000000000404088 b completed.0
                 U __ctrl_init
0000000000404078 D __data_start
0000000000404078 W data_start
                 U __daz
0000000000401140 t deregister_tm_clones
0000000000401130 T _dl_relocate_static_pie
00000000004011b0 t __do_global_dtors_aux
0000000000403d68 d __do_global_dtors_aux_fini_array_entry
0000000000404080 D __dso_handle
0000000000403d70 d _DYNAMIC
0000000000404088 D _edata
0000000000404090 B _end
0000000000401368 T _fini
                 U __flushz
00000000004011e0 t frame_dummy
0000000000403d40 d __frame_dummy_init_array_entry
0000000000402164 r __FRAME_END__
0000000000404000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
0000000000402018 r __GNU_EH_FRAME_HDR
0000000000401000 t _init
0000000000403d68 d __init_array_end
0000000000403d40 d __init_array_start
                 U __io_environ
                 U __io_set_argc
                 U __io_set_argv
0000000000402000 R _IO_stdin_used
0000000000401360 T __libc_csu_fini
00000000004012f0 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
0000000000401200 T main
0000000000401280 T MAIN_
                 U __nv_init_env
                 U pgf90_cmd_arg_cnt
                 U pgf90_exit
                 U pgf90io_ldw_end
                 U pgf90io_print_init
                 U pgf90io_sc_i_ldw
                 U pgf90io_src_info03a
                 U pghpf_init
0000000000404088 A __pgi_bss_section_start
0000000000404078 A __pgi_data_section_start
0000000000402000 A __pgi_rodata_section_start
0000000000401170 t register_tm_clones
0000000000401100 T _start
0000000000404088 D __TMC_END__

Update: through testing I’ve determined the following appears to work:

// WARNING: declarations may be wrong! Proceed at your own risk...
void __io_environ(void);
void __io_set_argc(int );
void __io_set_argv(char **);
void pgf90_exit(void);

// Fortran routine that calls intrinsic routines:
//    command_argument_count(), and
//    get_command_argument()
extern void print_args();

int main(int argc, char *argv[])
{
    // Initialize Fortran runtime environment
    __io_set_argc(argc);
    __io_set_argv(argv);

    // Call Fortran routine
    print_args();
    
    // Finalize environment
    pgf90_exit();

    return 0;
}
~/fortran/cmd$ make main-nvfortran
nvfortran -c print_args.f90
nvc -fortranlibs -o main-nvfortran main.c print_args.o
main.c:
~/fortran/cmd$ ./main-nvfortran a b c
 a
 b
 c

I have no clue what the other two routines ( __io_environ, __nv_init_env) do; the names suggest something to do with environment variables, and with the runtime enviroment (errors, exception handling, math tables, etc.).