Attribute for "pure" procedures that do I/O

Most subroutines I write that are not PURE do external I/O. Would it be useful for Fortran to have an attribute for procedures that do external I/O but are otherwise pure, maybe pure_io? This would clarify to the reader and compiler that a procedure does not have side effects such as changing module variables and perhaps enable some optimizations.

But isn’t IO exactly the case where you cannot avoid side effects, and which is therefore always impure?

This would be very valuable. There are many cases where writing data to a log file can tell something useful about the behaviour without otherwise affecting the code. For example, we automatically instrument code for coverage analysis. We have to leave pure routines without instrumentation (or cheat) even though we are not changing the behaviour.

Same here, and it’s usually error reporting. It would be great if some useful optimisations could be made for pure_io functions.

What would be the definition of a pure procedure if it is allowed to have side effects? For me, the current definition makes sense. I also don’t think there is anything wrong with writing non-pure functions.

Just my 2 cents, but I’ve always felt that the concept of purity (or more specifically impurity) should be more fine-grained. Yes, we can declare a procedure to be IMPURE, but what I’m thinking about would be along the lines of two changes to the standard.

  1. First, allow for impure blocks (or scoping units) inside an otherwise pure procedure,

i.e.
PURE Function a
BLOCK,IMPURE or BLOCK(IMPURE) etc
do some impure io on some variables that are local to the block
END BLOCK
End Function a

  2. Introduce the concept of PURE io by allowing a file to be declared a “PURE” file, which would mean that it is effectively open as WRITE only (and cannot be reopened as READ/WRITE during the course of the execution). You would obviously want to be able to read it later, but only after a STOP or ERROR STOP statement has closed the file.

Open (newunit=pure_unit, file="pure.dat", ACCESS=PURE, Form=xxx, STATUS=xxx)

I’m sure there are a million reasons why this wouldn’t work and is probably undesirable, but I’m just throwing a couple of thoughts out for discussion.

Metcalf, Reid, and Cohen (2018), section 7.8, write:

Declaring a procedure to be pure is an assertion that the procedure
i) if a function, does not alter any dummy argument;
ii) does not alter any part of a variable accessed by host or use association;
iii) contains no local variable with the save attribute (Section 8.10);
iv) performs no operation on an external file (Chapters 10 and 12);
v) contains no stop or error stop (Section 17.14) statement; and
vi) contains no image control statement (Section 17.13).

I am not suggesting that the constraints on pure procedures be relaxed, but that a new attribute pure_io be introduced that would satisfy the constraints above, except for

iv) performs no operation on an external file
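To make the idea concrete, here is a minimal sketch of the kind of procedure the proposal targets. The attribute pure_io does not exist in any Fortran standard, and log_mod, log_residual, and their arguments are made-up names for illustration. The subroutine meets every constraint in the list except iv), yet today it has to be left without the pure attribute:

```fortran
module log_mod
  implicit none
contains
  ! Under the proposal this would read "pure_io subroutine" (hypothetical attribute).
  ! Today it cannot be declared pure because of the write statement.
  subroutine log_residual(unit, iter, resid)
    integer, intent(in) :: unit    ! unit number of an already-open file
    integer, intent(in) :: iter
    real,    intent(in) :: resid
    ! The only side effect: one formatted record written to an external file.
    write (unit, '(a,i0,a,es12.4)') 'iter ', iter, ' residual ', resid
  end subroutine log_residual
end module log_mod

program demo
  use, intrinsic :: iso_fortran_env, only: output_unit
  use log_mod, only: log_residual
  implicit none
  call log_residual(output_unit, 1, 0.5)
end program demo
```

Nothing here touches module variables, saved locals, or the dummy arguments, which is the property the suggested attribute would let the compiler and the reader rely on.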

Allowing file operations is a loophole that lets a procedure change the state of the world. This is exactly what a pure procedure should not do. One can misuse a file as a global or persistent variable, so if iv) is relaxed it would only be consistent to relax ii) and iii) together with it.

As said, I don’t understand the desire to declare a procedure that does file operations as pure. The pure attribute simply tells the compiler that the order of execution does not matter, which simplifies optimization. But if file IO is involved, it will most likely be the bottleneck, so optimization should not be too important.

What I understand is the annoyance when adding debug output to a pure procedure which is at the end of a chain of calls to pure procedures. Having a build system that removes pure attributes automatically for debug builds would solve this.

I believe you can do error stop in pure procedures, correct? I can imagine that you might want to print a much nicer error message and stop the program, and perhaps also write something to a log file. That is currently not allowed in pure procedures. These pure functions might be performance critical and could run very fast, but have some kind of failure mode that needs to abort the program.

Yes, as Metcalf/Reid/Cohen mention in Chapter 23, “Minor Fortran 2018 features”.

Exactly - this is the reason most of my functions don’t end up as pure. The workaround to keep the function pure is to have the function return some kind of error status/message, but it all soon gets quite fiddly.
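A minimal sketch of that workaround, with made-up names (safe_math, safe_sqrt, stat, errmsg). It uses a subroutine rather than a function because, per constraint i) quoted earlier, a pure function may not alter its dummy arguments:

```fortran
module safe_math
  implicit none
contains
  ! Pure: failure is reported through stat/errmsg instead of printing or stopping.
  pure subroutine safe_sqrt(x, root, stat, errmsg)
    real,    intent(in)  :: x
    real,    intent(out) :: root
    integer, intent(out) :: stat
    character(len=:), allocatable, intent(out) :: errmsg
    if (x < 0.0) then
      root   = 0.0
      stat   = 1
      errmsg = 'safe_sqrt: negative argument'
    else
      root   = sqrt(x)
      stat   = 0
      errmsg = ''
    end if
  end subroutine safe_sqrt
end module safe_math

program demo
  use safe_math, only: safe_sqrt
  implicit none
  real :: r
  integer :: stat
  character(len=:), allocatable :: msg
  call safe_sqrt(-1.0, r, stat, msg)
  ! The impure caller decides how (and whether) to report the problem.
  if (stat /= 0) print '(a)', msg
end program demo
```

The fiddly part is visible even here: every caller in a chain of pure procedures has to accept, check, and pass along stat and errmsg.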

The point of pure procedures is that it allows a compiler to reorder and/or parallelize references to them. Allowing I/O would break that.
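One place where this guarantee is relied on is DO CONCURRENT, where any procedure referenced in the loop body must be pure. A small sketch, with scaled_square as a made-up example function:

```fortran
program pure_reorder
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y(n)
  integer :: i

  x = [(real(i), i = 1, n)]

  ! Iterations may be executed in any order, so every procedure
  ! referenced inside DO CONCURRENT is required to be pure.
  do concurrent (i = 1:n)
    y(i) = scaled_square(x(i), 2.0)
  end do

  print '(8f7.1)', y

contains

  pure function scaled_square(v, factor) result(r)
    real, intent(in) :: v, factor
    real :: r
    r = factor*v*v
    ! A print or write statement here would be rejected at compile time.
  end function scaled_square

end program pure_reorder
```

If scaled_square wrote to a file, the order of the records would depend on whatever iteration order the processor happened to choose.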

Fortran 202X adds simple procedures, which are pure but with additional restrictions, mainly that they can’t reference COMMON or variables by use or host association, except in “specification inquiry that is a constant expression”.

Recently (F2018?), the requirement that the stop-code needs to be a constant expression has been relaxed. That makes it possible to write a pure exit function that takes care of assembling a nice error message:

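program test

  implicit none

  ! The call terminates the program with a message assembled at run time.
  call exit(1,'test')

contains

  ! Still pure: F2018 allows error stop in pure procedures,
  ! and the stop code no longer has to be a constant expression.
  pure subroutine exit(code,msg)

    integer, intent(in) :: code
    character(len=*), intent(in) :: msg

    if (code == 1) then
      error stop 'error '//trim(msg)
    else
      error stop 'error (unknown)'
    end if

  end subroutine exit

end program test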

What about logging (or, as others proposed, “write only”), when the exact order of the logged events doesn’t matter? Maybe it would be possible to solve this issue with a compiler flag (e.g. -fpure-writing) which allows printing/logging in pure functions, which would automatically mean that the order of the output from pure functions can be jumbled.
I remember one day I wanted to debug a pure function, which was called by some other pure functions, so I had to “unpure” every single function just to debug a tiny thing (yes, I could have used a debugger, but sometimes I think it’s easier with a simple print*).

You can use the preprocessor to achieve this. Here’s the macro you need, taken out of FOODIE:

#ifdef _IMPURE_
#define _PURE_
#define _ELEMENTAL_
#else
#define _PURE_ pure
#define _ELEMENTAL_ elemental
#endif

In your procedures use the _PURE_ macro instead of the pure attribute. You could also put the print statements within impure/verbose preprocessor fences.
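A minimal sketch of how this might look in practice. The names kernels and square are made up, and it assumes the file is run through the preprocessor (for example by giving it a .F90 extension) and that debug builds define _IMPURE_ on the command line:

```fortran
! Save as demo.F90 so the preprocessor runs; add -D_IMPURE_ for a debug build.
#ifdef _IMPURE_
#define _PURE_
#else
#define _PURE_ pure
#endif

module kernels
  implicit none
contains
  ! Expands to "pure function" normally, plain "function" when debugging.
  _PURE_ function square(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x*x
#ifdef _IMPURE_
    print *, 'square: x =', x, ' y =', y
#endif
  end function square
end module kernels

program demo
  use kernels, only: square
  implicit none
  real :: s
  ! Assign first: calling square inside a print statement would nest
  ! one output statement on the same unit inside another in debug builds.
  s = square(3.0)
  print *, s
end program demo
```

The release build keeps the pure guarantee, and the print statement only exists in builds where purity has been switched off everywhere at once.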

Another “trick” to ease the work of removing the pure attribute and adding it back is to do the job in a new code branch. Once you’ve identified the bug, you can merge back only the fixed part.

Re: “… yes, I could have used a debugger, but sometimes I think it’s easier with a simple print*”: consider the calendar on the figurative wall and what would otherwise be on the anvil. Given all the monumental effort and countless person-years of attention that the Fortran language standard and processor implementations would need for a “half” attribute as suggested in the original post, Fortranners making more and more use of visual debuggers appears far more productive and efficient for the Fortran ecosystem overall.

For users of Windows OS, the Visual Studio IDE, and the Intel Fortran compiler (its other issues notwithstanding), the occasions that call for “a simple print *” have long been few and far between; one can “simply” strive instead to have all library procedures as PURE / ELEMENTAL and “break” and step through the code in debug mode.

I agree that this feature should not be implemented if the only use-case is “easier” debugging.

If such a suggestion were to come before the Fortran standard committee and they had their druthers, they would vote no faster than a “New York minute”! Regardless, what is suggested is really “pure_o”, not “pure_io”: the “i”, as in input, appears to be an unacceptable side effect, so much so that it removes any notion of purity.

A simple solution for debugging is to declare a pure interface for an external procedure that does the I/O, calling it from the pure procedure. “Cheat” by omitting pure from the external procedure. You may need to disable compiler checks to build this, but it’s for debugging purposes. Remove the call when debugging is done.
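A rough sketch of this trick, with made-up names (debug_print, kernels, square). It is deliberately non-conforming, since the interface claims a purity the external routine does not have, so it is for temporary debugging only. The two parts are shown as separate files on the assumption that a compiler seeing both in one file may notice the mismatch:

```fortran
! --- file debug_io.f90: compiled separately, deliberately NOT declared pure ---
subroutine debug_print(label, val)
  implicit none
  character(len=*), intent(in) :: label
  real, intent(in) :: val
  print *, label, val
end subroutine debug_print

! --- file kernels.f90: the interface asserts purity the routine does not have ---
module kernels
  implicit none
  interface
    pure subroutine debug_print(label, val)
      character(len=*), intent(in) :: label
      real, intent(in) :: val
    end subroutine debug_print
  end interface
contains
  pure function square(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x*x
    call debug_print('square: x =', x)   ! remove when debugging is done
  end function square
end module kernels
```

Because the compiler still believes square is pure, it remains free to reorder or skip calls, so the debug output may not appear in the order the source suggests.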

I guess this is impossible by definition. OK, let’s assume that your function is about to be run 10000 times, and the compiler/runtime has decided to run your procedure in 16 threads distributed over 4 nodes. The execution time of the subroutine may differ depending on the parameter values. Memoization of the last 100 calls is employed (i.e. the result will not be recalculated if the function was called with the same combination of parameters less than 100 calls ago). [Formally, you or somebody else can do everything mentioned above if the function is indeed pure.] Imagine the mess in your log.

You will need some means to ensure transaction-style IO, at least.