Surprise with asterisk in list-directed read

Today I learned about an odd feature of list-directed input.

Consider the following file (“strings.txt”):

1*5 aa  aaa
2*1 b   bb
3*2 ccc c

Let’s say I am only interested in the second and third columns. My usual approach is to have a dummy character variable, used as follows:

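implicit none
character :: dummy                  ! one-character placeholder for the first column
character(8) :: column1, column2    ! intended to hold the second and third columns
integer :: i
open (100, file='strings.txt')
do i = 1, 3
  ! list-directed read: the leading "n*" in each record is parsed as a repeat count
  read (100, *) dummy, column1, column2
  print *, column1, column2
end do
close (100)
end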

My hope is for the following output:

aa     aaa
b      bb
ccc    c

But this is not what the program produces. Instead, I get

aa     aaa
1      b
2      2

How bizarre!

It turns out that the asterisk (and some other characters) have special meaning in this context. From this Intel documentation page I gather that the asterisk signifies a repeat-count. I don’t fully understand how it is meant to be used, but it was quite confusing! I wonder how widely known/used this feature of list-directed input is.
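For anyone who wants to see the repeat count in isolation, here is a minimal sketch using an internal list-directed read. The program name and the record contents are made up for illustration. As far as I understand it, a field of the form r*c is read as r copies of the value c, which is why "2*1" in the file above supplies the value 1 to both dummy and column1.

```fortran
program repeat_count_demo
  implicit none
  integer :: a, b, c
  character(len=16) :: record   ! hypothetical input record

  ! "3*7" is the repeat-count form: three copies of the value 7.
  record = '3*7'

  ! Internal list-directed read; equivalent to reading "7 7 7".
  read (record, *) a, b, c

  ! All three variables should now hold 7.
  print *, a, b, c
end program repeat_count_demo
```

The exact spacing of the printed values depends on the compiler, since the output is list-directed as well.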

In my case, this behavior was not desired. As a workaround, I read each line into a long character variable and replaced each asterisk with a more benign character. Then I could do an internal list-directed read on the sanitized line.
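A rough sketch of that workaround, assuming an underscore is an acceptable replacement character. To keep it self-contained, the three records are stored in a character array instead of being read from strings.txt; with the real file, each record would be read into line with an '(a)' format first.

```fortran
program sanitize_demo
  implicit none
  character(len=32) :: lines(3), line
  character :: dummy
  character(len=8) :: column1, column2
  integer :: i, j

  ! Stand-in for the three records of strings.txt.
  lines(1) = '1*5 aa  aaa'
  lines(2) = '2*1 b   bb'
  lines(3) = '3*2 ccc c'

  do i = 1, 3
    line = lines(i)

    ! Replace every asterisk so it can no longer act as a repeat count.
    do j = 1, len(line)
      if (line(j:j) == '*') line(j:j) = '_'
    end do

    ! Internal list-directed read on the sanitized record.
    ! dummy holds one character, so "1_5" is truncated to "1".
    read (line, *) dummy, column1, column2
    print *, column1, column2
  end do
end program sanitize_demo
```

This should print the second and third columns as originally hoped. Note that it replaces asterisks everywhere in the record, so it is only suitable when the later columns never contain one that matters.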

Anyway, I just wanted to share this little adventure, and I am curious if anyone else has ever been surprised by this behavior or actually found use in it.

I agree it’s surprising, but it is consistent with the first two examples of the overloaded asterisk in the code

      program test23stars
      character:: b,c,d='?'
      integer j,k,n(2)
      data j,k/2*0/                 ! repeated value in data            (3)
      call input(n,'2*4',b,c,d,*666)! repeated value in list-directed input, 
C                                   ! alternate return
...

at

Repeat counts apply to both list-directed I/O and namelist I/O. They were originally a convenience for people who read input from punched cards. I think the processor is also allowed to use repeat counts on output, so it is self-consistent. On input, something like 5* followed by a separator skips over five entries, the same as five repeated commas.
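To illustrate the skipping behavior, here is a small sketch with consecutive commas, which produce null values that leave the corresponding variables untouched. The program name and the initial values are arbitrary choices for the example. A record of '2*, 9' should behave the same way according to the description above, but I have only written out the comma form here.

```fortran
program null_value_demo
  implicit none
  integer :: a, b, c
  character(len=16) :: record   ! hypothetical input record

  ! Give the variables recognizable values before the read.
  a = -1
  b = -2
  c = -3

  ! Two null values (nothing before the first comma, nothing
  ! between the commas), followed by the value 9.
  record = ',,9'
  read (record, *) a, b, c

  ! a and b keep -1 and -2; only c is overwritten with 9.
  print *, a, b, c
end program null_value_demo
```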

I worked on a project that made substantial use of this feature for their custom format input files.

Edit to say: I don’t think it was a good idea, and people shouldn’t do this anymore. We have fairly standardized file formats for things these days (e.g. JSON, YAML, etc.).

One of the things I remember from working on the help desk at Imperial College was the complexity of Fortran I/O and how easy it was for users to get things wrong. Lots of gotchas.

I vaguely remember a CDC Cyber machine at Imperial, ca 1980, available to first-year undergraduates to practice FORTRAN on.

There are more odd behaviors of list-directed input, including the treatment of undelimited character strings, null values, and / as a terminator. While list-directed input is sometimes very handy if you understand all of what it does, it often trips people up.
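As a quick illustration of the slash terminator, the sketch below reads three integers from a record that contains a slash after the first value. The names and values are invented for the example.

```fortran
program slash_demo
  implicit none
  integer :: a, b, c
  character(len=16) :: record   ! hypothetical input record

  a = -1
  b = -2
  c = -3

  ! The slash ends the read after the first value; the remaining
  ! list items are treated as if null values had been supplied.
  record = '10 / 20 30'
  read (record, *) a, b, c

  ! a becomes 10; b and c keep -2 and -3, and 20 and 30 are never read.
  print *, a, b, c
end program slash_demo
```

No error is raised here, so a stray slash in a data file can silently truncate a read.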

It gets worse if the implementation is more liberal - for many years, DEC Fortran allowed free conversion between logical and numeric, so that a bare T in the input stream would convert to -1 - fortunately that is no longer the default in Intel Fortran.

The biggest mistake I see people make is to delegate error checking to list-directed input. I keep telling people that it is far more accepting than you probably want.
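One possible way to do the checking yourself, sketched under the assumption that a field is supposed to be a plain unsigned integer: test the characters with the verify intrinsic first, then convert with an explicit format and iostat. The token value and the program name are hypothetical, and a real validator would probably need to handle signs and other field types.

```fortran
program validate_demo
  implicit none
  character(len=16) :: token   ! hypothetical field taken from an input record
  integer :: n, ios

  token = '2*1'

  if (len_trim(token) == 0) then
    print *, 'rejected: empty field'
  else if (verify(trim(token), '0123456789') /= 0) then
    ! verify returns the position of the first character that is
    ! not a digit, or 0 if every character is a digit.
    print *, 'rejected: "', trim(token), '" is not a plain unsigned integer'
  else
    ! Explicit format instead of list-directed conversion.
    read (token, '(i16)', iostat=ios) n
    if (ios /= 0) then
      print *, 'rejected: conversion failed'
    else
      print *, 'accepted:', n
    end if
  end if
end program validate_demo
```

With the token above this takes the second branch and rejects the field, whereas a list-directed read of the same text into integer variables would accept it as a repeat count.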

And then there are all the complaints about list-directed output not formatting things the way people want.

Another situation where list-directed output does not suffice: a complex program that is being developed with a set of test problems, and the output files are to be generated with different compilers or different compiler options, for comparison with a reference output file set.

File comparison utilities rarely think that 3.14159274 and 3.14159 are the same (exceptions: ndiff and numdiff on Linux).
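A minimal sketch of the usual remedy, which is to write reference values with an explicit edit descriptor so the number of digits does not depend on the compiler. The es14.6 width and precision are arbitrary choices for the example.

```fortran
program fixed_format_demo
  implicit none
  real :: pi

  pi = 4.0 * atan(1.0)

  ! List-directed output: the number of digits and the spacing
  ! are chosen by the compiler and can differ between compilers.
  print *, pi

  ! Explicit format: always six digits after the decimal point
  ! in scientific notation.
  write (*, '(a, es14.6)') 'pi = ', pi
end program fixed_format_demo
```

This fixes the layout but not genuine last-digit differences between compilers or optimization levels, so a tolerance-aware comparison tool can still be useful.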

https://fortranwiki.org/fortran/show/getvals

describes and shows a use of list-directed input.