Stdout buffer in gfortran

Does gfortran have automatic stdout buffering like C or C++? If yes, how do I enable it?

I was playing around with strace and found that gfortran doesn’t seem to buffer anything. If a Fortran program writes its output character by character, a separate write syscall is issued for each character.

write(1, "W", 1)                        = 1
write(1, "X", 1)                        = 1
write(1, "Y", 1)                        = 1
write(1, "Z", 1)                        = 1
write(1, " \n", 2)                      = 2
write(1, "1", 1)                        = 1
write(1, "2", 1)                        = 1
write(1, "3", 1)                        = 1
write(1, "4", 1)                        = 1
write(1, "M", 1)                        = 1
write(1, "N", 1)                        = 1

This makes printing to the terminal significantly slower (even the terminal starts using too much CPU).

The VS Code integrated terminal is consuming more CPU than the program itself (same with terminal emulators written in compiled languages).

Is there some way to enable it or is this absent in gfortran?


Example code:

program test
    implicit none
    INTEGER::i

    do while(.true.)
        do i=49,90
            WRITE(*,"(A)",advance="no")achar(i)
        enddo
        print*,""
    enddo
end program test

C equivalent in this reply

There’s no more specific info. Just that the binary from gfortran foo.f90 doesn’t seem to buffer stdout at all.

Whenever write or print statements are used inside the program, the binary immediately calls the write syscall.

This differs from the behavior of other languages (C, C++, Python, etc.), which buffer the output and flush it when the buffer fills up or when characters like \n are printed by the program.

The question is why gfortran doesn’t do this by default, and whether there is a way to enable buffering.

Can you post your Fortran test code? Do other Fortran or C compilers do better?

You can then have a look at gfortran’s internals and see how the print is implemented, if it calls into libc (which does buffering I think), or calls the Linux syscall directly, etc. Then we can go from there.

Yes, let me provide the simplified test cases.

C:

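#include <stdio.h>

int main(void)
{
    /* Print '1'..'Z' one character at a time, then a newline, forever. */
    while (1) {
        for (int i = 49; i < 91; i++) {
            putchar(i);
        }
        puts("");
    }
}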

Fortran:

program test
    implicit none
    INTEGER::i

    do while(.true.)
        do i=49,90
            WRITE(*,"(A)",advance="no")achar(i)
        enddo
        print*,""
    enddo
end program test

Both were compiled with gcc and gfortran respectively, without any options. Using strace ./a.out >/dev/null on both binaries:

Syscalls by C program:

write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4096) = 4096
write(1, "<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\n"..., 4096) = 4096
write(1, "GHIJKLMNOPQRSTUVWXYZ\n123456789:;"..., 4096) = 4096
write(1, "RSTUVWXYZ\n123456789:;<=>?@ABCDEF"..., 4096) = 4096
write(1, "23456789:;<=>?@ABCDEFGHIJKLMNOPQ"..., 4096) = 4096
write(1, "=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\n1"..., 4096) = 4096
write(1, "HIJKLMNOPQRSTUVWXYZ\n123456789:;<"..., 4096) = 4096
write(1, "STUVWXYZ\n123456789:;<=>?@ABCDEFG"..., 4096) = 4096
write(1, "3456789:;<=>?@ABCDEFGHIJKLMNOPQR"..., 4096) = 4096

It writes 4096 bytes per syscall, even though putchar prints character by character (it can be replaced by printf("%c",i) without changing the results).

But the gfortran-generated binary writes the same output character by character.

Syscalls by Fortran program:

write(1, "?", 1)                        = 1
write(1, "@", 1)                        = 1
write(1, "A", 1)                        = 1
write(1, "B", 1)                        = 1
write(1, "C", 1)                        = 1
write(1, "D", 1)                        = 1
write(1, "E", 1)                        = 1
write(1, "F", 1)                        = 1
write(1, "G", 1)                        = 1
write(1, "H", 1)                        = 1
write(1, "I", 1)                        = 1
write(1, "J", 1)                        = 1
write(1, "K", 1)                        = 1

I haven’t looked into gfortran’s implementation, and I have not tested with any other compiler either.
But as you said, it is linked to libc:

❯ ldd a.out
	linux-vdso.so.1 (0x00007fff5b46a000)
	libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f0cf62c2000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f0cf617e000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0cf6163000)
	libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f0cf6119000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0cf5f4a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0cf6589000)

uname -or: 5.14.13-200.fc34.x86_64 GNU/Linux
gfortran --version:GNU Fortran (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)

Example: This reply

And I know how to search. There’s mention of call flush on Stack Overflow (fortran90 - How to flush stdout in Fortran 90? - Stack Overflow), but it seems totally redundant for stdout since gfortran doesn’t seem to implement buffering at all.

I understand this is equivalent to C code where fflush(stdout) is called after every putchar call. But C does implement buffering, and so do other languages like Rust, because output buffering can significantly speed up a program.

If the user wants unbuffered output, there is call flush(fd). A string can be used as a buffer, but then it becomes manual: writing into the string, keeping track of the length, writing out the buffer, and then clearing it. In my opinion the behavior should be similar to other compiled languages; otherwise it’s an unnecessary barrier for a new user.
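
A minimal sketch of the manual approach described above, assuming standard Fortran 2003 or later. The module name line_buffer, the procedures line_put and line_end, and the 1024-character limit are all hypothetical, chosen for illustration only. It collects the characters of one line in a string and hands the whole line to the runtime in a single WRITE. Going by the traces in this thread, gfortran appears to issue one write syscall per statement, so this should give one syscall per line instead of one per character:

```fortran
module line_buffer
    ! Hypothetical helper, not part of gfortran or any library.
    implicit none
    private
    public :: line_put, line_end

    integer, parameter :: max_line = 1024   ! assumed upper bound on a line
    character(len=max_line) :: line
    integer :: used = 0                     ! characters currently held

contains

    ! Append text to the pending line (overlong lines are truncated).
    subroutine line_put(text)
        character(len=*), intent(in) :: text
        integer :: n

        n = min(len(text), max_line - used)
        line(used+1:used+n) = text(1:n)
        used = used + n
    end subroutine line_put

    ! Write the pending line with one advancing WRITE and reset the buffer.
    subroutine line_end(unit)
        integer, intent(in) :: unit

        write(unit, "(A)") line(1:used)
        used = 0
    end subroutine line_end

end module line_buffer
```

In the test program, the inner loop would call line_put(achar(i)) and the print*,"" would become call line_end(output_unit), with output_unit taken from the intrinsic module iso_fortran_env.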

I did not know about this syntax. This is cool, thanks.

Regarding this, in UNIX systems when a file is opened and a file descriptor is handed to a process, unless the program calls close, the changes to the file aren’t reflected on the program’s side. The program continues to ‘see’ the old file with its old permissions until it closes the file handle and reopens it. This is the reason why update process is so great on Linux, the open files can stay until they are closed and also can be replaced while being open.

In windows, I think we all have experienced the file is ‘locked’ (open in a process) and hence can’t be modified or deleted. So in both cases, kernels makes sure an open file handle isn’t disturbed from the perspective of the process.

The only problem is disk space running out. But stdin, stdout and stderr aren’t real files, so processes need not worry about that. This is probably (my guess) why languages implement the stdout buffer without worrying.

Depending on exactly what you are doing, on a Linux system you should only see this going to an unbuffered device such as a tty or /dev/null. Unless you called the system routines to write in raw mode I believe you will see this behavior on those devices. I have had discussions about this in HPC environments where people unconditionally write debug statements but open the unit as /dev/null when they want to ignore the messages instead of making the writes conditional. /dev/null is not only not the same as not writing the output, but can slow you down more than if you wrote the lines to a file. Could you redirect to an actual file on your system instead of /dev/null and see if there is a change in the buffering?

And unlinking the file after opening it makes sure the file goes away when the process does actually close it or even crashes, which is great for scratch files. It drives me crazy that not all Fortran compilers do this with SCRATCH files but use named files for scratch, which gets left around even on Linux and Unix after a crash.

PS: I know gfortran supports fputc() as an extension. It would probably not change this, but it does let you write to stdout as a stream, which standard Fortran unfortunately does not. Some compilers also change behavior depending on whether an OPEN is used to enable ASYNCHRONOUS mode, and can change their buffering depending on the RECL length or environment variables. I don’t have access to my list, and it seems to change with each compiler release, but if I remember correctly none of those affect gfortran behavior. It did do the right thing with SCRATCH files, though.

Trivia: Given it may be intentional for clarity, but

do while (.true.)

can be reduced to

do

in the example.

Note that ifort(1) and nvfortran(1) buffer by default, which after reading the above looks like it is non-conformant to the standard(?) :slight_smile:

OOPS, it looks like an optimization where they do the equivalent of the implied DO and write out the 42 characters in one buffer, not generic buffering. I have to test that further, but it looks like “buffering” into one output for the DO, not buffering up to thousands of characters.

gfortran(1) documentation explicitly states it buffers “regular files”. A tty or /dev/null does not normally meet the criteria of “regular files”; although in a batch environment or with redirection stdout may be pointed to a regular file. See:

https://gcc.gnu.org/onlinedocs/gfortran/Data-consistency-and-durability.html

Can you provide any reference for this? Writing to /dev/null is supposed to be very fast, since it does not involve ‘disk sleep’ at all. The kernel immediately returns success since the data is just discarded. The only overhead is making the syscall and the kernel returning success. That should be much, much faster than actually writing a file on any disk, because there the kernel has to write to the file and then return the status. Note that we are only discussing synchronous file operations, where the program waits for the status of the write call.

Yes! I didn’t think of that. When redirecting to a regular file, there is buffering:

write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 4136) = 4136

This is indeed odd behavior. For actual files, disk space could run out, but not for virtual files.

@kargl Take a look at this. Files are buffered, but stdout isn’t.

From the above result, either gfortran itself doesn’t conform to the standard, or the interpretation of steps (3), (6), (7) is wrong.

Whatever the standard says, if an actual file is buffered, then there is no concern in buffering stdout.

Is there any reason why gfortran does this only for regular files? gcc seems to buffer all output, including /dev/null.

Assuming gfortran does things similar to gcc when possible seems to lead to a problem at some point.

The conversion of large amounts of data to ASCII text via formatted writes can be expensive, so even before you write to any device I/O that you are going to throw away can be expensive, so never generate large amounts of data thinking it is free to write it to /dev/null.

With modern buffering and solid state disks or diskless nodes with memory-resident files and so many other variables, the costs of writing data anywhere including /dev/null probably only matter in HPC environments, where I have seen terabytes of messages created and then thrown away to /dev/null. Using such a technique as writing informational or debug messages to /dev/null or any other device instead of conditionally not calling the WRITE is just bad practice in my opinion.

You can easily write a program that does not write a message, writes it to an internal file but does not write it, writes it to a unit and the unit can be /dev/null, a regular file of various types, and the types of resources wasted will vary between different environments but will not be free for any device.

Another costly kind of I/O is writing to stdout and just letting data scroll by that you do not really need to see interactively. It is easy to forget all the steps it takes to display that data in your terminal emulator, and how much that typically buffered I/O is costing. I think you noted your terminal emulator was spinning up the CPU in your infinite loop test.

When attached to a terminal, stdout is almost never fully buffered, because the assumption is that it is being used interactively and that you want the data as soon as it is generated; with a line-based Fortran program that is usually a good assumption.

Sometimes you do want raw mode, but even in C that is not standardized, so there are system-specific routines you have to call to efficiently do ASCII-art or screen control or to be able to read a single key-press.

I suspect gfortran is doing a stat(3c) of the file and if it returns that it is a regular file or a pipe it is buffering, else it is not, and that the vast majority of the time that is a reasonable behavior (did not look to verify that though). It would so happen that anything that says “special” if you run the file(1) command on it would then not be buffered, which is also probably a safer bet.

$ file /dev/null
/dev/null: character special (1/3)
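
A program can make a similar decision for itself. The sketch below is an assumption-laden illustration, not what libgfortran does internally: instead of the stat check guessed at above, it calls the POSIX isatty function through C interoperability, so it only answers "is this descriptor a terminal?". The names tty_check and is_terminal are hypothetical, and the code assumes a POSIX system where stdout is file descriptor 1:

```fortran
module tty_check
    ! Hypothetical helper: thin wrapper around POSIX isatty(3).
    use, intrinsic :: iso_c_binding, only: c_int
    implicit none
    private
    public :: is_terminal

    interface
        ! int isatty(int fd): 1 if fd refers to a terminal, 0 otherwise.
        function c_isatty(fd) bind(C, name="isatty") result(res)
            import :: c_int
            integer(c_int), value :: fd
            integer(c_int) :: res
        end function c_isatty
    end interface

contains

    ! True when the given POSIX file descriptor is attached to a terminal.
    logical function is_terminal(fd)
        integer, intent(in) :: fd

        is_terminal = c_isatty(int(fd, c_int)) /= 0
    end function is_terminal

end module tty_check
```

A program could test is_terminal(1) once at startup and, when it is false (a regular file, a pipe, or /dev/null), collect its output into larger pieces before writing.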

But because /dev/null is such a special case it probably warrants being buffered or even not being written to by Fortran programs; it looks like that varies from compiler to compiler as well as optimizations where some compilers catch that the I/O loop is equivalent to the implied DO example and buffer it internally into a single request which is much faster going to stdout.

So assuming stdin and stdout are being used interactively flushing on each I/O actually makes perfect sense for the majority of times.

It looks like some programs are probably inadvertently paying for the lack of buffering to non-regular files, especially in the case with /dev/null if they throw away a lot of data that they write first. That could probably be improved on but my hunch is in normal practice not that many people are throwing away huge amounts of data to where that has been a problem, but in this case doing the right thing and testing with strace(1) to see what your program was doing got caught up in this issue where /dev/null is not buffered, giving you a very different behavior than what is probably your typical use case where you write the data to the screen or to a regular file.

It looks like some compilers are better at buffering up I/O internally than others, but I have seen that cause problems, especially with non-advancing I/O, when going to interactive devices; so in the vast majority of cases I think gfortran is doing OK with what it does(?)

PS: Also note that using the trick of opening informational files as “/dev/null” to throw them away can cause you problems if you want to do that with multiple files, as currently most Fortran compilers adhere to a standard where multiple LUNs cannot be opened to the same file (varies in actual implementation, and so can be a hidden portability problem).

PS: great that you are using strace(1). Wish it was used more often.

I cannot count the times it has helped discover slowdowns caused by system calls someone was not aware of the cost of, such as excessive calls to things like getrusage, external processes, the cost of statusing large amounts of files, …

Yes, that seems logical. But what is the efficient way to enable and disable the debug print statements? Preprocessor directives?

But I still don’t understand how writing to /dev/null can be slower than writing to an actual file. Is it because of the absence of output buffering for non-regular files? Or are you saying this holds in general, irrespective of gfortran?

 time x1 >x
   
real    1m51.059s
user    1m50.279s
sys     0m0.577s
urbanjs@venus:~$ time x1 >/dev/null

real    3m3.942s
user    2m10.459s
sys     0m53.480s
urbanjs@venus:~$ cat x1.f90
program test
implicit none
integer :: i, j
do j=1,10000000
   do i=49,90
      WRITE(*,"(A)")achar(i)
   enddo
print*,""
enddo
end program test

This is largely due to the buffering issues, but you can see writing to /dev/null is considerably slower using gfortran.

Preprocessing can be a good solution. In fixed-format source, a very common extension in the past was that lines starting with a “D” in column 1 were treated as comments unless a compiler flag was supplied, in which case the “D” was ignored. Some modern compilers no longer supply that or a free-format equivalent, but some do. A shame there is no standard equivalent.

Even having I/O or conditionals in a routine can affect optimization, and so some form of preprocessing or conditional compilation can be the only solution, but in general it should not be overused. A conditional branch around the I/O and debug-related statements is sufficient for many routines.

Part of the reason you see code where error codes are returned but no I/O is produced in the routine is not only to do things like allow for multi-lingual message catalogs to be used but to reduce or prevent some of the problems with optimizing routines containing I/O.

I would encourage messages and parameter validation checking wherever it does not cause critical performance issues just using standard Fortran conditionals perhaps optionally triggered by an environment variable or input option passed to the code.
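
As a rough sketch of that suggestion, assuming Fortran 2003 or later: a module-level logical is set once from an environment variable, and every debug WRITE is guarded by it, so no formatting and no system call happens when debugging is off. The names debug_flag, debug_on and init_debug are hypothetical, as is any variable name passed in:

```fortran
module debug_flag
    ! Hypothetical helper for switching debug output on at run time.
    implicit none
    private
    public :: debug_on, init_debug

    logical :: debug_on = .false.

contains

    ! Enable debug output when the named environment variable is set
    ! to a non-empty value.
    subroutine init_debug(var_name)
        character(len=*), intent(in) :: var_name
        integer :: length, stat

        call get_environment_variable(var_name, length=length, status=stat)
        ! status 0: the variable exists; length 0: it has no value
        debug_on = (stat == 0 .and. length > 0)
    end subroutine init_debug

end module debug_flag
```

The main program would call something like init_debug("MYAPP_DEBUG") once (the variable name is made up), and each debug statement becomes if (debug_on) write(*,*) ... so the WRITE is skipped entirely in normal runs.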

This really applies to very large programs where I have seen hundreds of gigabytes of I/O generated and then thrown to a unit assigned to /dev/null, assuming that there is no penalty. It is a “trick” that has been passed around several large institutions, and I have a pet peeve about it because it keeps rearing its head; it is so much easier to have some main program do

open(file='/dev/null',newunit=debug)

and have the DEBUG value in a module

and then have debug statements throughout the code that say

write(debug,…

with no conditionals. Then all you have to do is set the DEBUG value to something like 6 and you see all your debug statements, and/or you can change the LUN just in some stretches of code to see the debug statements from just those routines. The concept is so much more appealing than most methods that it gets passed around a lot, but it is NOT a free or efficient solution.

Most users will not see this crushing their systems; but if you are running many thousands of processes doing this and see an entire cluster overloaded with I/O being generated in the inner loop of a massive program so it can be thrown away and fix that to see it start happening again with a new code shortly thereafter you get peeved. Take a look at a node running 96 of your infinite loops concurrently (mileage varies depending on a lot of system configurations) and you can see it is not going to do much else productively while it is throwing away write statements to /dev/null for “free”.

So all I really want to say is if you are writing large-scale or using something that is going to be used in a time-critical fashion ANY system calls should be avoided when possible, including I/O of any kind, but especially ASCII representations of data; and writing those to /dev/null does not make them overhead-free.

To me, there is something very pleasing about a Fortran program crushing a CPU and when you run an strace(1) on it you see nothing at all till it is ready to pop out some answers. As a co-worker has a habit of saying “Oog say system call bad”, while doing a bad impression of a cave-man. But putting that aside, he is nearly always right.

Yes, ifort flushes the buffer upon encountering a \n (LF) character:

write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44
write(1, "123456789:;<=>?@ABCDEFGHIJKLMNOP"..., 44) = 44

But to my horror, the Intel compiler is at the other extreme: if it doesn’t encounter a linefeed (print*,"" in line 9), it never flushes (as far as I have let it buffer).

mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3da3e92000
mremap(0x7f3da3e92000, 143360, 212992, MREMAP_MAYMOVE) = 0x7f3da3e5e000
mremap(0x7f3da3e5e000, 212992, 315392, MREMAP_MAYMOVE) = 0x7f3da3e5e000
mremap(0x7f3da3e5e000, 315392, 475136, MREMAP_MAYMOVE) = 0x7f3da3dea000
mremap(0x7f3da3dea000, 475136, 712704, MREMAP_MAYMOVE) = 0x7f3da3dea000
mremap(0x7f3da3dea000, 712704, 1064960, MREMAP_MAYMOVE) = 0x7f3da3ce6000
mremap(0x7f3da3ce6000, 1064960, 1597440, MREMAP_MAYMOVE) = 0x7f3da3ce6000
mremap(0x7f3da3ce6000, 1597440, 2392064, MREMAP_MAYMOVE) = 0x7f3da3a9e000
mremap(0x7f3da3a9e000, 2392064, 3588096, MREMAP_MAYMOVE) = 0x7f3da3a9e000
mremap(0x7f3da3a9e000, 3588096, 5382144, MREMAP_MAYMOVE) = 0x7f3da357c000
mremap(0x7f3da357c000, 5382144, 8073216, MREMAP_MAYMOVE) = 0x7f3da357c000
mremap(0x7f3da357c000, 8073216, 12107776, MREMAP_MAYMOVE) = 0x7f3da29f0000
mremap(0x7f3da29f0000, 12107776, 18161664, MREMAP_MAYMOVE) = 0x7f3da29f0000
mremap(0x7f3da29f0000, 18161664, 27242496, MREMAP_MAYMOVE) = 0x7f3da0ff5000
mremap(0x7f3da0ff5000, 27242496, 40861696, MREMAP_MAYMOVE) = 0x7f3da0ff5000
mremap(0x7f3da0ff5000, 40861696, 61292544, MREMAP_MAYMOVE) = 0x7f3d9d581000
mremap(0x7f3d9d581000, 61292544, 91938816, MREMAP_MAYMOVE) = 0x7f3d9d581000
mremap(0x7f3d9d581000, 91938816, 137908224, MREMAP_MAYMOVE) = 0x7f3d951fc000
mremap(0x7f3d951fc000, 137908224, 206860288, MREMAP_MAYMOVE) = 0x7f3d951fc000
mremap(0x7f3d951fc000, 206860288, 310288384, MREMAP_MAYMOVE) = 0x7f3d82a12000

It just keeps increasing the buffer size, then dumps everything at once on an interrupt signal (or at program end).

No observed changes between ‘regular files’ and ‘special files’

Don’t knock ifort too badly, as it has a LOT of options (maybe too many?) for controlling buffer size and default record length with the caveat that you can really eat up memory with some of the defaults. Internal buffering can be very efficient if you can afford the memory use. To paraphrase a little “Oog say buffering good”, but that is one that can be taken to an extreme.
As regards I/O, another one is “Oog say ASCII bad. Binary good.” It drives several people crazy, including me, when sequentially used engineering-scale data is written as ASCII XML in a scratch file instead of sequential binary data when no human is ever going to look at the data. If you look at the timing difference there, I have seen it be a factor of 200x.

Due to this, the ASCII donut program is incompatible with ifort.

I had to fix the issue by writing an LF character to force the ifort-compiled program to flush its buffer.

I added a new note in README.

Performance difference is HUGE.

So, I think gfortran should buffer output (at least stdout and stderr) like gcc does.

Didn’t try it myself but I hope a FLUSH would work with ifort if the LF character is a problem.
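
For reference, a small sketch of what that could look like, assuming the Fortran 2003 FLUSH statement; flush_demo and put_now are hypothetical names. Like the post above, this is untested with ifort, and whether a runtime really pushes out a partial (non-advancing) record on FLUSH is left to the compiler:

```fortran
module flush_demo
    ! Hypothetical helper: non-advancing output followed by an explicit FLUSH.
    implicit none
    private
    public :: put_now

contains

    ! Write text without ending the record, then ask the runtime to flush.
    subroutine put_now(unit, text)
        integer, intent(in) :: unit
        character(len=*), intent(in) :: text

        write(unit, "(A)", advance="no") text
        flush(unit)
    end subroutine put_now

end module flush_demo
```

For stdout the unit would be output_unit from iso_fortran_env. Flushing after every character brings back the one-syscall-per-character cost discussed earlier, so flushing once per frame or per line is probably the more sensible granularity.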