Unstructured data

ivanpribec · February 20, 2022, 11:04pm

I agree in general that Fortran needs to adapt more quickly, but I’m just saying that not having a list with mixed items is not really a fundamental limitation. Here’s a reply from a StackExchange thread answering the question “Am I safe mixing types in a Python list?” (emphasis mine):

The language is fine with you mixing types in a list, but you should know that the Python culture might frown on it. Tuples are usually used when you have a known collection of mixed types, and different indexes have different semantics. Lists are usually used where you have a uniform sequence of varying length.

So your data would more conventionally be represented as:
```python
data = [
    ("name1", "long name1", 1, 2, 3),
    ("name2", "long name2", 5, 6, 7),
    ...
]
```
To put it more succinctly: tuples are used like C structs, lists like C arrays.

To extend the answer above by analogy: tuples are used like Fortran derived types, lists like Fortran arrays. Now, just because something can be done in a dynamic language like Python doesn’t necessarily make it a good idea.

The Fortran code for the example above would be:

```fortran
type :: record
  character(len=:), allocatable :: name, long_name
  integer :: a, b, c
end type

type(record), allocatable :: data(:)
data = [ &
    record("name1", "long name1", 1, 2, 3), &
    record("name2", "long name2", 5, 6, 7) &
    ! ... further records
 ]
```

which is just as expressive to my taste. The decrease in productivity due to static typing is counterbalanced by the potential gains in performance and safety. Which of these we value more depends heavily on the circumstances. How many records do we need to deal with? How many times will the application be reused? How much time do we have available? And, sadly, also: which language do my boss and colleagues prefer?
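If a genuinely mixed list is still wanted, Fortran can express one with an unlimited polymorphic (`class(*)`) component inside a small wrapper type. The sketch below is only an illustration under that assumption: the module `mixed_list_m`, the type `item` and the function `describe` are hypothetical names, not part of any library, and a compiler with Fortran 2008 support is assumed.

```fortran
! Hypothetical sketch: a "mixed list" built from an array of wrapper
! objects, each holding an unlimited polymorphic (class(*)) value.
! The module, type, and function names are illustrative only.
module mixed_list_m
  implicit none

  ! Each array element can hold a value of any type.
  type :: item
    class(*), allocatable :: value
  end type item

contains

  ! Return a short text description of whatever an item holds.
  function describe(x) result(s)
    type(item), intent(in) :: x
    character(len=:), allocatable :: s
    character(len=32) :: buf

    if (.not. allocated(x%value)) then
      s = "<empty>"
      return
    end if

    ! select type recovers the concrete type at run time.
    select type (v => x%value)
    type is (integer)
      write (buf, '(i0)') v
      s = "integer " // trim(buf)
    type is (real)
      write (buf, '(g0)') v
      s = "real " // trim(buf)
    type is (character(len=*))
      s = "string " // v
    class default
      s = "<unsupported type>"
    end select
  end function describe

end module mixed_list_m
```

A caller fills elements with sourced allocation, e.g. `allocate (xs(1)%value, source=42)` and `allocate (xs(2)%value, source="name1")`. Every place that reads a value back then needs its own `select type`, which is the price of the flexibility and one reason a fixed derived type like `record` is usually the better fit when the layout is known.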

I also think it would be equally true to invert the sentence,

The popular languages for engineers and technology … have or are all rapidly providing solutions for their practitioners regardless of their paradigm (imperative, etc.).

leading to

The practitioners of engineering and technology have or are all rapidly providing solutions for the popular languages regardless of their paradigm.

It is people who provide solutions using languages as a tool. Each of us has the opportunity to help improve and promote the language in many ways, by writing libraries for the areas you just mentioned, contributing to compilers, joining the efforts of the language committee, etc. I don’t see the Julia community waiting for the founding members to do all the work. Instead everyone rolls up their sleeves and tries to give back within their capabilities. Putting all the responsibility in the hands of the standard-bearers is just blame-shifting.

This reminds me of some nice sketches by A. Masselot that aim to illustrate workload sharing among processes.

shahmoradi · February 20, 2022, 11:42pm

Other languages with such features need endless pages of guidance and warnings to steer people away from such patterns (see here and here for examples). Fortran appears to prevent them altogether when the potential damage outweighs the benefits. And that’s what makes Fortran fast, with no hassles and no manual of warnings. Over the years, I have seen many (frequently unfair and unjustified) criticisms of the Fortran language. Still, almost every time I dug into the issues further, I realized the standard committee had excellent reasons to design things the way they did. That does not mean everything is perfect. In particular, the patchy and selective style of enhancements to the language, which FortranFan also mentioned earlier, is, has been, and will remain quite detrimental to the language. That’s my opinion from the perspective of a frequent user.

FortranFan · February 21, 2022, 2:27am

You could also add an idiom to go with the cartoon, say “It’s a poor workman who always blames his tools”! But it’s also particularly true that those who work on tools blame the workman, especially when the workers expect and demand more from the tools for the tasks at hand.

For the topic at hand, “unstructured data” however it’s defined, the fact remains that the programming conveniences of being able to define and work with flexible types (“classes”) are often more relevant than numeric performance. And here, whatever holds Fortran back has its roots in the language standard itself; there is no need to sugarcoat anything.

The original post’s comment “I keep hearing people say that C++ works better with unstructured data than Fortran does” immediately brought two aspects to my mind:

1. C++ dlib: dlib might just be a “textbook” example of a C++ library that has successfully served needs involving the processing of “unstructured data” in many domains for quite a few years now, and it has had stable releases for quite a while too. It just ain’t easy at all to put together a similarly featured library in Fortran given the current language standard and compiler implementations: the state of FIG. 2.10 with “Many (happy?) workers” is simply unattainable at present with Fortran; the problem is with the “shovels”. Living in denial of this reality will only delay matters.
2. An attempt in the venerable “Modern Fortran Explained” (MFE) at one such situation involving “unstructured data”: a complex data structure to manage a list of dynamic data (see image below), for which this thread from 2014 came to my mind. The truth is that, with such a simple case in the modern world of “unstructured data”, it has taken the authors of MFE (with several decades of experience working with the language and the standard revisions) far too long to get a compilable version of their example, and it is still not entirely free of issues, nor is it remotely competitive with what one can achieve in other languages. It just goes to show how difficult such an endeavor to manipulate a dynamic data structure will be for “mere mortals” in Fortran. And the problems go to the very root of some rather difficult semantics and constraints in the language standard around pointer components and the like, with which compiler implementations and users alike struggle considerably.
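To make the pointer-component issue concrete, here is a minimal sketch of a singly linked list whose nodes hold values of any type. It is not the MFE example, which is not reproduced here; the module `linked_list_m` and the procedures `push_front`, `length` and `clear` are hypothetical names, and a Fortran 2008 compiler is assumed. Note that cleanup is entirely manual: nodes reachable only through pointer components are never freed automatically.

```fortran
! Hypothetical sketch (not the MFE example): a singly linked list whose
! nodes hold values of any type and are linked through pointer components.
module linked_list_m
  implicit none

  type :: node
    class(*), allocatable :: value
    type(node), pointer :: next => null()
  end type node

  type :: list
    type(node), pointer :: head => null()
  end type list

contains

  ! Insert a copy of x at the front of the list.
  subroutine push_front(l, x)
    type(list), intent(inout) :: l
    class(*), intent(in) :: x
    type(node), pointer :: n

    allocate (n)
    allocate (n%value, source=x)
    n%next => l%head
    l%head => n
  end subroutine push_front

  ! Count the nodes by walking the chain of pointers.
  function length(l) result(n)
    type(list), intent(in) :: l
    integer :: n
    type(node), pointer :: p

    n = 0
    p => l%head
    do while (associated(p))
      n = n + 1
      p => p%next
    end do
  end function length

  ! Free every node. Targets of pointer components are not deallocated
  ! automatically, so this must be called explicitly.
  subroutine clear(l)
    type(list), intent(inout) :: l
    type(node), pointer :: p

    do while (associated(l%head))
      p => l%head
      l%head => p%next
      deallocate (p)  ! also deallocates the allocatable component p%value
    end do
  end subroutine clear

end module linked_list_m
```

One of the pitfalls hinted at above shows up immediately: intrinsic assignment `b = a` between two `list` variables copies only the `head` pointer, so both lists would share (and could both try to free) the same nodes. Making that safe requires a user-defined assignment or a final procedure, which is where much of the difficulty begins.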

Beating around the bush on this will help Fortran least of all.

oscardssmith · February 21, 2022, 2:34am

One place where inherently unstructured data is really common is in writing compilers or doing things like symbolic math. In these cases, the structure is pretty much guaranteed to be dynamic enough that the best way of dealing with it is a fully dynamic data structure. This is especially important for numerical computing, since there is a lot of interesting work in automatic code optimization (e.g. e-graphs) that depends on having this sort of flexibility.
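For the compiler and symbolic-math case, such data typically takes the shape of a recursive tree. A minimal sketch in Fortran, assuming only constants, a single variable `x`, and the operators `+` and `*`; the module `expr_tree_m` and all procedure names are hypothetical:

```fortran
! Hypothetical sketch: a tiny symbolic expression tree with constants,
! one variable x, and the binary operators + and *, evaluated recursively.
module expr_tree_m
  implicit none

  type :: expr
    character(len=1) :: tag = 'c'   ! 'c' constant, 'x' variable, '+' or '*'
    real :: value = 0.0             ! used only when tag == 'c'
    type(expr), pointer :: left => null(), right => null()
  end type expr

contains

  ! Leaf node holding a constant.
  function const(v) result(e)
    real, intent(in) :: v
    type(expr), pointer :: e
    allocate (e)
    e%tag = 'c'
    e%value = v
  end function const

  ! Leaf node standing for the variable x.
  function var() result(e)
    type(expr), pointer :: e
    allocate (e)
    e%tag = 'x'
  end function var

  ! Interior node applying op ('+' or '*') to two subtrees.
  function binop(op, a, b) result(e)
    character(len=1), intent(in) :: op
    type(expr), pointer, intent(in) :: a, b
    type(expr), pointer :: e
    allocate (e)
    e%tag = op
    e%left => a
    e%right => b
  end function binop

  ! Evaluate the tree for a given value of x.
  recursive function eval(e, x) result(r)
    type(expr), intent(in) :: e
    real, intent(in) :: x
    real :: r
    select case (e%tag)
    case ('c')
      r = e%value
    case ('x')
      r = x
    case ('+')
      r = eval(e%left, x) + eval(e%right, x)
    case ('*')
      r = eval(e%left, x) * eval(e%right, x)
    case default
      error stop "unknown node tag"
    end select
  end function eval

  ! Free a tree node by node; assumes each node has exactly one parent.
  recursive subroutine destroy(e)
    type(expr), pointer :: e
    if (.not. associated(e)) return
    call destroy(e%left)
    call destroy(e%right)
    deallocate (e)
  end subroutine destroy

end module expr_tree_m
```

For example, `t => binop('+', binop('*', var(), var()), const(1.0))` builds x*x + 1, and `eval(t, 2.0)` returns 5.0. The `destroy` routine relies on every node having a single owner; an e-graph, which shares subterms among many equivalent expressions, breaks that assumption and would typically need reference counting or an index-based node arena instead.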

certik · February 23, 2022, 1:58am

I’ve done both, and if performance is critical, which it often is for both, then C++ is the only tool I personally have been able to use to deliver a working product. For numerical computing that can be expressed using arrays, Fortran is my favorite tool for delivering excellent performance and a maintainable product.

Thanks @FortranFan for quoting my post from over 10 years ago at Stack Exchange. I still like and agree with what I wrote!
