With LLMs, should Fortran have more built-in procedures and delete more legacy features?

LLMs will do an increasing share of coding. I think that strengthens the case for more built-in procedures, for two reasons:

  • instead of LLMs creating the same procedures repeatedly, they can be instructed to use procedures from the Fortran 20xx standard when available
  • the cost of adding intrinsic procedures to a compiler has fallen

C++, for example, has many non-uniform RNGs, listed at C++ TR1 random number generation notes. Should some of them be in the Fortran standard, especially common ones such as the normal and uniform integer distributions? Standard Fortran has the ordinary Bessel functions but not the modified ones. I would like to have the latter. There are probably other special functions people want.
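To make the RNG point concrete, here is a minimal sketch of the kind of code that currently gets rewritten in project after project: a normal variate and a uniform integer built on the intrinsic random_number. The module and procedure names (rng_sketch, random_normal, random_int) are hypothetical, and Box-Muller is just one convenient method chosen for brevity, not a proposal for what a standard intrinsic should use.

```fortran
module rng_sketch
  ! Hypothetical helper module, for illustration only.
  implicit none
  private
  public :: random_normal, random_int
contains

  ! Standard normal variate via the Box-Muller transform.
  function random_normal() result(z)
    real :: z
    real :: u1, u2
    real, parameter :: two_pi = 8.0 * atan(1.0)
    call random_number(u1)   ! u1 in [0, 1)
    call random_number(u2)
    u1 = 1.0 - u1            ! shift to (0, 1] so that log(u1) is finite
    z = sqrt(-2.0 * log(u1)) * cos(two_pi * u2)
  end function random_normal

  ! Uniform integer in [lo, hi]; assumes lo <= hi and a modest range.
  function random_int(lo, hi) result(k)
    integer, intent(in) :: lo, hi
    integer :: k
    real :: u
    call random_number(u)
    k = lo + int(u * real(hi - lo + 1))
    k = min(k, hi)           ! guard against rounding up to hi + 1
  end function random_int

end module rng_sketch

program demo_rng
  use rng_sketch, only: random_normal, random_int
  implicit none
  integer :: i, k
  real :: z
  do i = 1, 5
    z = random_normal()
    k = random_int(1, 6)
    print *, z, k
  end do
end program demo_rng
```

Neither function can be pure, because random_number updates hidden generator state. That is one of the design questions a real proposal would have to settle.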

More controversially, with LLMs good at fixing code and writing tools to fix code, I think the standards committee should be more aggressive about deleting legacy features such as implicit mapping.

If it is too late to change the feature set of Fortran 2028, proposals can be made for Fortran 2034. I could work on an expanded RNG proposal if there is interest.


I agree. Basic stuff like random number generators, mean, standard deviation, logsum should be intrinsic functions.
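For a sense of scale, the statistics helpers mentioned above are each only a few lines. This is a rough sketch under my own assumptions: the names mean, std and logsumexp are hypothetical, std uses the sample (n-1) definition, and I am reading "logsum" as the usual log-sum-exp.

```fortran
module stats_sketch
  ! Hypothetical module, for illustration only.
  implicit none
  private
  public :: mean, std, logsumexp
contains

  ! Arithmetic mean; assumes size(x) >= 1.
  pure function mean(x) result(m)
    real, intent(in) :: x(:)
    real :: m
    m = sum(x) / size(x)
  end function mean

  ! Sample standard deviation (n-1 denominator); assumes size(x) >= 2.
  pure function std(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    s = sqrt(sum((x - mean(x))**2) / (size(x) - 1))
  end function std

  ! log(sum(exp(x))), shifted by the maximum to avoid overflow.
  pure function logsumexp(x) result(r)
    real, intent(in) :: x(:)
    real :: r
    real :: xmax
    xmax = maxval(x)
    r = xmax + log(sum(exp(x - xmax)))
  end function logsumexp

end module stats_sketch

program demo_stats
  use stats_sketch, only: mean, std, logsumexp
  implicit none
  real, parameter :: x(4) = [1.0, 2.0, 3.0, 4.0]
  print *, mean(x), std(x), logsumexp(x)
end program demo_stats
```

Even at this size there are choices a standard version would have to pin down: population versus sample variance, behavior for empty arrays, and which kinds are supported.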

You could ask an LLM to do that part too. :slight_smile: Ideally you could feed it past C++ and Fortran standard proposals, to match the style and expectations of these proposals.


Can you formulate a principle for determining whether such a procedure should be in the Fortran standard or in a library like stdlib?

For example, based on your post I think one could argue that all of stdlib should be in the standard. Would you argue that? If not, why not? And where is the line at which we should stop adding things to the standard and keep them in a library instead?


There are also different ways for a procedure to be “in” the standard. It could just be there, always in scope, like sqrt() or sin(), or it could be in an intrinsic module, like ieee_is_nan(). The programmer has more control over the scope with the latter approach. For example, the only: clause can be used to hide or selectively include some procedures, and they can be renamed locally with => if there is a name conflict.
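A small sketch of the scoping control described here, using the intrinsic module ieee_arithmetic. The local name is_nan is just an illustrative choice, and the example assumes the processor supports IEEE quiet NaNs for default real.

```fortran
program demo_scope
  ! Import only what is needed from an intrinsic module,
  ! renaming ieee_is_nan to the shorter local name is_nan.
  use, intrinsic :: ieee_arithmetic, only: is_nan => ieee_is_nan, &
                                           ieee_value, ieee_quiet_nan
  implicit none
  real :: x

  x = ieee_value(x, ieee_quiet_nan)   ! assumes quiet NaN is supported
  print *, is_nan(x), is_nan(1.0)

  ! sqrt is always in scope; no use statement is needed.
  print *, sqrt(2.0)
end program demo_scope
```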

There are also some important differences between procedures and data types that are intrinsic and those defined in a library. One of them is whether the procedure can appear on the right-hand side of an initialization or in a parameter definition.


That requires most compilers to implement the intrinsic twice: once for compile-time evaluation of the parameter (which must be known at compile time because it can be used, e.g., as a kind parameter) and once for run time. One could probably unify the effort by allowing user code to be executed at compile time.

I did not know that. Could you give an example of this?

I am highly in favor of expanding compile-time computing for Fortran. This would be excellent for complex initializations, or for evaluating other complicated but static things.

real, parameter :: x = sin(pi/3)

I see little point in deleting features (though there are some which I would love to kill). They are in legacy codes which will not go away. Our library of about 20,000,000 lines of code in current or recent use is about 75% non-buildable under the current standard, and needs other features which are deprecated. I will start another thread to explore this issue.

I have no experience writing compilers, so I still do not understand why this needs both compile time and run time evaluations. Assuming pi has been correctly defined previously, this is the same as the parameter definition

real, parameter :: x = 0.5 * sqrt(3.0)

Does that also require both compile time and run time evaluation?

Yes. Although sqrt is a little bit special because often there are CPU instructions that can do it. But sin() is a better example. In real, parameter :: x = sin(pi/3) the compiler must evaluate it at compile time, while if you do print *, sin(x) where x is read from a file, it must do it at runtime. So the compiler must have the ability to evaluate almost all of its intrinsics at both compile time and runtime.
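Here is a minimal sketch that exercises both paths in one program. The parameter must be folded by the compiler, while the second value is computed from an argument that can be overridden on the command line, which is intended to keep the compiler from folding it as well. The program and variable names are made up for illustration.

```fortran
program demo_consteval
  implicit none
  real, parameter :: pi = 4.0 * atan(1.0)
  real, parameter :: x_const = sin(pi / 3)   ! evaluated by the compiler
  real :: arg, x_run
  character(len=32) :: buf
  integer :: ios

  ! Default to pi/3, but let the first command-line argument override it,
  ! so the value of arg is not known until run time.
  arg = pi / 3
  call get_command_argument(1, buf)
  if (len_trim(buf) > 0) then
    read (buf, *, iostat=ios) arg
    if (ios /= 0) error stop "could not parse argument"
  end if

  x_run = sin(arg)                           ! evaluated by the runtime library

  print *, "compile time: ", x_const
  print *, "run time:     ", x_run
  print *, "exactly equal:", x_const == x_run
end program demo_consteval
```

Whether the last line prints T or F depends on the compiler, its math library, and the optimization level, so I would not rely on either outcome.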


@certik is right. I have in the past had slightly different results from a compile-time and run-time intrinsic. Sorry I don’t recall which compiler it was or when. It might well have been in the previous millennium.


As a starting point for discussion (I, for one, find settling on the criteria important but difficult) …

What the criteria should be for placing a function into the Fortran standard library, or for making it an intrinsic:

A new intrinsic should add a capability that cannot be achieved with
self-contained, pure, standard Fortran code. This includes not only
functionality but also performance. That is, if an intrinsic implementation
significantly outperforms any standard Fortran implementation, it should
be considered for inclusion.

If the procedure can be generated with standard Fortran it should first be
available in a public module with a liberal license. A vendor-supplied
version may be provided but it should conform to the behavior of the
public module.

A focus on high utility, reliability, modern standards compliance,
and avoiding duplication of existing compiler intrinsics is a basic
principle for deciding what should be considered for candidacy as a
standard-specified feature.

Key target criteria for candidates include portability, purity (no side effects,
unless side effects are a specific purpose of the procedure), explicit
interfaces, and comprehensive documentation.

To minimize bloat, broad utility should be heavily weighted. The function
must solve a common problem faced by many Fortran developers (e.g.,
advanced math, string manipulation, input/output).

Modern Fortran Compliance: Code should adhere to modern standards
(Fortran 2008, 2018), utilizing modules, strong typing, and avoiding
obsolescent features like fixed-form source, common blocks, or
go to statements.

Purity and Safety: Functions should be PURE (not modifying arguments)
and side-effect-free, using intent(in) for arguments to ensure thread
safety and predictability.

Explicit Interfaces: Procedures must be placed within modules to
provide explicit interfaces, which prevents many common bugs.

Portability: The code must be portable across different compilers
(GNU, Intel, NVIDIA) without requiring specialized hardware or
compiler-specific hacks.

Documentation and Testing: New functions require clear documentation
and comprehensive unit tests to ensure long-term maintenance and
reliability.

General Best Practices to Follow for prototype modules:

All code should reside within modules.

Use subroutines if a procedure needs to modify its arguments,
keeping functions purely for returning values.

Naming should be clean and descriptive without unnecessary prefixes
that add length rather than clarity.

Functions should have no side effects and should be thread-safe unless an
obvious reason prevents it.

Reasonably follow common coding practices:

Do not use fixed-form source, always use implicit none, ...

The standards committee should be responsible for designating modules
as intrinsic, required, or optional, but should be required to take into
consideration votes by the user community, held at least every three
years, on the need for the functions; particularly votes on functionality
that creates a new Fortran capability.

Ideally the optional packages should be available from an fpm(1)
package repository.

When possible, modules prototyping the desired new capabilities should be
provided as well, using a similar scheme but where the modules are allowed
to use ISO_C_BINDING to prototype the functionality.

The first procedures to standardize should be the ones commonly available as
extensions: POSIX file/system interfaces or their equivalents on non-POSIX
platforms (overdue by thirty years or more :>) and a basic graphics library.

A strongly related issue is whether the standard should require a cpp-like preprocessor.


Yes, I have seen that before with floating-point functions. Usually the differences are in the last few bits, but that is enough for tests of exact equality to fail. I’ve also seen it with cross-compilers, where compile-time evaluation gets values from the host machine while run-time evaluation gets values from the target machine.


That rule excludes a lot of quality-of-life stuff —e.g., we all have our own versions of upper_case/lower_case, which should have been standardized a long time ago… but the standards committee seems to have decided that rather than providing at least an ASCII version (as with ACHAR), it was better to provide nothing.
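For reference, this is roughly the ASCII-only version that so many of us carry around. It is only a sketch: the module name is hypothetical, and anything outside a-z is passed through unchanged.

```fortran
module ascii_case_sketch
  ! Hypothetical module, for illustration only.
  implicit none
  private
  public :: upper_case
contains

  ! Return s with ASCII letters a-z mapped to A-Z; other characters unchanged.
  pure function upper_case(s) result(t)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: t
    integer, parameter :: shift = iachar('a') - iachar('A')
    integer :: i, c
    do i = 1, len(s)
      c = iachar(s(i:i))
      if (c >= iachar('a') .and. c <= iachar('z')) then
        t(i:i) = achar(c - shift)
      else
        t(i:i) = s(i:i)
      end if
    end do
  end function upper_case

end module ascii_case_sketch

program demo_case
  use ascii_case_sketch, only: upper_case
  implicit none
  print *, upper_case('Fortran 2023')   ! prints FORTRAN 2023
end program demo_case
```

Using iachar/achar rather than ichar/char keeps the mapping tied to ASCII code points regardless of the processor's default character set.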

I think the NEXT/PREVIOUS intrinsics from Fortran 2023 already violate that rule (i.e., they’re elemental, but can cause a crash if the (intent(out)) STAT argument is not present).


I agree UPPER, LOWER, SORT, … should all be available in a standard package. Having to reinvent such functions is basically an unfortunate rite of passage. With hindsight, I think some of the current intrinsics should be in modules, like a “basic_math” module. Because Fortran did not originally have modules, some of the current functions do not meet those criteria, but going forward, standard modules like the IEEE ones seem like a better, more modular approach, maybe with a standard set implicitly USEd or available via a single USE statement like “use standard”. Because of the need for backward compatibility, current names would of course have to keep working, but putting everything in as intrinsics could lead to bloat and could overwhelm the development of the core language. Several newer languages, for better or worse, do that now; “math.sin(x)” probably looks familiar. I do wish Fortran allowed (but did not require) that type of syntax.


I think the current rule of having intrinsics in modules only if they point to a third-party standard (like ieee_* and iso_c_binding) or are processor-related (like iso_fortran_env) is still the way to go. The intrinsic consideration is a last resort anyway (i.e., the processor considers the intrinsic subprogram only if no designator with that name has been defined).

I cannot count the number of times when, while doing something C-related, I have to resort to grep -r <symbol> /usr/include to find the actual header that defines what I need —e.g., “is it in <time.h> or in <sys/time.h> or in <sys/times.h>?.. oh, it’s actually in <sys/select.h>!”.

Looks like we all agree that any intrinsics should start as a third-party library (fpm installable, with good interface, documentation, etc.) and I also think people should be using it in their projects first. It could be part of stdlib or separate.

Thanks to open source, we can even see how many public projects use a given intrinsic. I would say if a given candidate function is used a lot in many projects, then it might be a good candidate. If it is not used in public projects, then maybe it can still be a candidate, but the bar should be higher.

Ok, but if a candidate function is not in the standard, it may have many spellings, so it’s nontrivial to identify commonly used candidates. A standard deviation function may be called sd, std (used in stdlib), stdev, std_dev etc.