Format statements, Generic Containers, and Allocatable Arrays

Moved this from the Advent of Code post, as I felt it didn’t really belong there, and should be it’s own post.

I finally finished Day 5 of Advent of Code.

AoC Day 5

I have a few questions and comments that came out of the exercise.

Maybe it’s not very ‘Fortran-y’, (and definitely a lot more work than @sblionel 's elegant solution) but I created a ‘CharArray’ with push(), back(), and pop() methods to act as my stacks. It’s annoying to think that if I want to do it for a different type, I have to copy-paste everything. Hopefully generics will solve that particular issue. (I know other non-standard solutions exist, but they are still non-standard)

My implementation of pop() just decrements the size of the CharArray(). I can do this because I’m just dealing with characters. My real question is, how would I call the finalizer for the element I just popped if the element was a derived type that managed a resource? (I’m asking as if this were a generic container)

I tried to use a read statement for the io, but could not figure out the correct format string, so I implemented my own ‘split’ function instead. Are there any resources for how to write format statements? I found some that list what each character means, but I had a hard time piecing it together into a complete whole. To be honest I thought it was like a regex/pattern matching kind of thing, so I thought I could do: read(line,'("move" I "from" I "to" I)'). Yeah, that didn’t work. I’ve only used list directed input before.

I also found an excuse to use an associate statement. It worked, but I don’t understand what it’s doing. My feeling is that it is some sort of ‘scoped macro’. Is this correct?

Elemental functions are awesome. We should do more of those. I love that I could print my final answer by doing stacks%back(), and I was able to solve reading the “move 1 from 2 to 3” lines by splitting the line and then doing words(2::2)%to_i64()

I was able to do part 2 by modifying back(), pop(), and push() to deal with entire arrays and creating a generic type-bound procedure. It took me a long time to find the info on how to do it. I found it rather … unwieldy … to implement. But I was quite happy with the final result.

But it brought up another question for me: What actually happens when you return an allocatable array from a function and assign it to another allocatable array? As in a = return_allocatable_array(). I know with the automatic resizing, a will end up with the correct size and values. But does the temporary returned from the function get moved into a or does it get copied into a? If it’s copied, can we use move_alloc() to avoid the copy?

Apparently we cannot. The result of such a function is not treated (at least by the compilers I have) as an (allocatable) object. The code:

program m
  implicit none
  real, allocatable :: a(:)

  call move_alloc(allfun(10), a)
  a = allfun(5)
  print *, a
contains
  function allfun(n)
    real, allocatable :: allfun(:)
    integer, intent(in) :: n
    allocate(allfun(n))
    call random_number(allfun)
  end function allfun
end program m

does not compile unless call move_alloc() line is commented out.

$ ifort allocfun.f90
allocfun.f90(5): error #8195: The argument to the MOVE_ALLOC intrinsic subroutine shall be an allocatable object.   [MOVE_ALLOC]
  call move_alloc(allfun(10), a)
------------------^
compilation aborted for allocfun.f90 (code 1)

$ gfortran allocfun.f90
allocfun.f90:5:18:
    5 |   call move_alloc(allfun(10), a)
      |                  1
Error: 'from' argument of 'move_alloc' intrinsic at (1) must be a variable

The only somewhat relevant information in the Standard (9.7.3.2.4) is that the result of such a function is automatically deallocated after the statement containing a reference to the function is executed.

To my surprise, there seem to be very few resources online and those existing mainly (or exclusively) regard the output. E.g. Fortran Wiki.This may reflect the typical situation in which one reads using list-directed I/O and writes using format description to get pretty printing. The former way work fine if the input tokens are separated by white space and/or commas, with possible exception being strings with embedded spaces.
But for your case of the input given for AoC 2022 day 5, it can be easily read using list-directed I/O.

integer :: item, from, to
character(10) :: dum1, dum2, dum3
character(80) :: line
! ...
read(line,*) dum1, item, dum2, from, dum3, to

You got pretty close. Here is a working example of what you are trying to do:

character(30) :: line
write(line,'("move ",I0," from ",I0," to ",I0)') 11,21,31
write(*,'(a)') line
end 

Note that you need to “write” not “read” to the line character string. The integer fields need to be separated by commas. There are several ways to get the spacing right in the output line, the way I did it above is just one way.

You will find that fortran i/o formats are somehow both surprisingly powerful and frustratingly limited.

My understanding of the OP’s question is that a whole line of text is read into line variable and then OP tries to parse it. Thus, read(line,...), not write(line,...)

I have just recalled the PGI Fortran Reference Guide Version 2018 - a thorough document by NVidia. It contains Chapter 5 - Input/Output, with detailed description of all format descriptors, both on output and input.

1 Like

You are correct. Not being very experienced with format statements, I had some bad assumptions. My first bad assumption was that reads and writes were symmetric: The same format statement that writes a line can then be used to read the same line back in. It’s frustrating that this doesn’t seem to be the case.

Thank you for the PGI manual! It is excellent!

About the I0 - does 0 width mean, “make it as big as it needs to be”? Does that apply to reading as well?

So I guess the answer is: The compiler could avoid the copy if it was smart enough. The fortran standard says nothing about how it is actually implemented.

Yes, I0 means to write out the minimal field width to hold the digits and sign. It only works for output (write). Many programmers think that convention should be extended also to input (read), but that has not yet been done. That is one of the frustrating things about fortran i/o formats, there are many small details like this that would make life so much easier, but they just don’t work that way.

Regarding the symmetry, yes you can use a format statement that works symmetrically for both input and output in this case. It is a matter of taste and programming style whether that is the best way to do it. You can also use a format for the write that can be read on input by a list-directed read. In this situation, that is probably what I would do. The other option is to read the line and parse out the different fields. That is not complicated, but it is something that you need to do yourself, fortran does not do that for you automatically (except for the list-directed case).

The Fortran Standards don’t allow a zero field width e.g. I0 in a format used for input. In the Fortran2003 Standard this is in section 10.6.1 (6). I doubt if this restriction has been lifted in later standards as it seems hard to work out what it would mean.

There are several other things you can put in formats used for output that are not allowed in input, e.g. literal character strings “some text”.

Instead of editing my previous post, here is a followup that shows one way to do this.

$ cat fmt.f90
integer :: i, j, k
character(30) :: line, tmp
write(line,'("move ",I0," from ",I0," to ",I0)') 11, 21, 31
write(*,'(a)') line
i=-1; j=-1; k=-1
read(line,*) tmp, i, tmp, j, tmp, k
write(*,*) 'i,j,k=', i, j, k
end
$ nagfor fmt.f90 && a.out
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7114
Questionable: fmt.f90, line 8: Variable TMP set but never referenced
[NAG Fortran Compiler normal termination, 1 warning]
move 11 from 21 to 31         
 i,j,k= 11 21 31

I’m using tmp here just to throw away the character strings that are in line, and the compiler sees that I’m setting the value in the read statement but then never referencing it. Also, it might seem odd that I’m using the same tmp to hold all three strings. What is happening there is that the later occurrences are overwriting the first occurrences. If I wanted to keep those strings, I would have use separate tmp1, tmp2, and tmp3 variables. I know that the character strings in line are separated from the integers by spaces, and I know that they don’t have embedded quotes or slashes that would otherwise throw off the list directed read. So when it works, as it does here, the list directed read is an easy thing to do.

This is not hard at all. Most other languages already do this, and have done so for decades. There are a few decisions that would need to be made for portability to ensure that all compilers act the same way, but this is low hanging fruit.

Thanks for the example! I mistakenly thought a character(10) variable always read 10 characters and wouldn’t work in list-directed reads.

Several string libraries have procedures that let you read from lines like that. In particular
The “pattern” function in GitHub - dpettas/fString lets you extract strings with
statements like:

Name = MyStr%Pattern(“My First Name is {} and my last name is {}.”)

And many of them have SPLIT() functions, ISNUMBER() functions and others that are
useful, where you generally read the line in as a character variable and then call them
for parsing. In the case shown, reading the line, eliminating non-digit characters and then
reading it with list-directed input would also work:

program nolets
character(len=80)           :: line
character(len=len(line))    :: line2
integer                     :: move, from, to
!read(*,'(a)') line
line=' move 3 from 3 to 7 '
write(line2,'(*(a))')(merge(line(i:i),' ',index('0123456789 ',line(i:i)) /= 0),i=1,len_trim(line))
read(line2,*)move,from,to
write(*,*)move,from,to
end program nolets

as just samples of several other approaches. By the way, although some compilers allow it, you cannot write back into “line” per the standard, which states the variable being written into cannot appear in the argument list of the WRITE, hence the use of LINE2 instead of just writing back into LINE2, but you could do it with an assignment statement instead. For example:

program nolets
character(len=80)           :: line
integer                     :: move, from, to
!read(*,'(a)') line
line=' move 3 from 3 to 7 '
do i=1,len_trim(line)
   line(i:i)=merge(line(i:i),' ',index('0123456789 ',line(i:i)) /= 0)
enddo
read(line,*)move,from,to
write(*,*)move,from,to
end program nolets

I suppose one could define this as behaving like list-directed input of an integer, but then, list-directed input exists.

List directed input can be used in a couple of situations. One like above is when it is known that the text strings do not contain quotes, slashes, or commas which would otherwise cause problems with the list-directed read. In this case, the text items do not need to be quoted, and embedded quotes don’t need to be doubled up, and list directed input can be used for the entire record.

The other is when the programmer can parse out the fields within the longer input record and then use internal list-directed reads on the individual fields: integer, real, or whatever. This requires, of course, more effort on the programmer’s part, but it also gives much more flexibility to the programmer.

What is “missing” in this context is the ability to mix formatted and list-directed-type fields within a single i/o statement. You can write the records this way, using i0, f0, e0, and g0 type fields, but the programmer cannot turn around and use those same fields in read statements. Most other popular languages do allow this functionality. They work typically by reading as many characters as possible to fill the field, then stop when they get to a character that isn’t in the set, and they leave the input buffer pointer positioned at that character. For integers for example, they would read the optional sign followed by as many integer digits as possible, and leave the input pointer positioned within the record at the first nondigit character, to then be processed by the next field in the format string (or, with nonadvancing i/o, for the next read statement).

There are a few options and conventions that need to be specified for portability reasons, so that all fortran processors do the same thing with the same input. One that comes to mind is whether a string like 123+4 is treated as a legitimate floating point number, or two integers, or as an error.

An interesting idea that I have not seen suggested before. You may want to elaborate on this at GitHub - j3-fortran/fortran_proposals: Proposals for the Fortran Standard Committee The particular ambiguity you mention would be resolved by the format chosen (I reads only integers,D/E/F only real or complex.) G format could read all intrinsic types based on the type of the corresponding input list item. (I know some compilers are more forgiving.) Without thinking too much about it, this feature could be limited to G0 (or G0.d).

So ifort should not allow null fields, as in

$ ifort xx.f90 -stand f18 -warn all
$ ./a.out
          10   33000.00               0           0

when you compile

program testit
character(len=:),allocatable :: str
character(len=256) :: iomsg
integer :: i,iostat,j,k
real :: r
   i=-999
   j=i
   k=i
   str='10 3.3+4                 '
   read(str,'(g0,g0,i3,f6.2)',iostat=iostat,iomsg=iomsg)j,k,i,r
  if(iostat.eq.0)then
      write(*,*)i,r,j,k
   else
      write(*,*)trim(iomsg)
   endif
end program testit

As it appears to literally treat g0 as a null field. One of the compilers (I thought it was ifort) treats g0 essentially as described. Each time it is encountered the next field is read as if with list-directed input and consumes the next field plus any remaining blanks up the the next non-blank field and then resumes the format, as I had to change it when porting to another compiler within the last few years. Really thought it was ifort, but apparently not. But stumbled on this odd bug/behavior instead.

There was a proposal to allow mixing list-directed and formatted but I do not think it got very far;
but it would have allowed things like ‘(10x,f5.3,*,f6.4,*)’ where the asterisk behaved a lot like g0.
And I think a plain X descriptor was proposed to mean any amount of white space, and some others discussed a long time ago with Lew Ondis III, who was on the Fortran committee; but I cannot recall if any proposals actually came out of that or not. I will try a few other compilers and see if any of them already has that as an extension or if it was something recognized as non-standard and removed.

PS:

Pretty minor, but as it is missing reporting a non-standard usage even when -stand is used I reported it on the Intel forum for comment …

The outlier that comes to mind for me is whether g0 on input of a complex value would require the input to be in the form (x,iy); or as x,x. Currently, listed-directed output of a complex writes “(X,Y)” but g0 on output generates “X,Y”; I believe.