Unstructured data

I keep hearing people say that C++ works better with unstructured data than Fortran does. Are there any examples of unstructured data that Fortran cannot work with? Just curious about that.

I don’t know about C++, but maybe what they mean is the possibility of having data structures with untyped fields which can be mutated. For instance, in Julia, or Python, one can do something like:

julia> mutable struct Test
           a
           b
       end

julia> x = Test("abc",1)
Test("abc", 1)

julia> x.a = 5
5

julia> x
Test(5, 1)

In other words, if the fields can hold anything, this is an easy way to deal with them (performance aside: this kind of structure is not good for performance). You can also have dictionaries, for example. I don’t know how hard it would be to emulate something like that in Fortran.
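
For comparison, here is a rough Python sketch of the same idea. The class name `Test` just mirrors the Julia example; the dictionary `record` and its keys are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Test:
    # Any places no restriction on what the fields may hold.
    a: Any
    b: Any


x = Test("abc", 1)
x.a = 5  # rebinding a field to a value of a different type is allowed

# A dictionary gives the same freedom, keyed by name (illustrative keys).
record = {"a": "abc", "b": 1}
record["a"] = 5

print(x)       # Test(a=5, b=1)
print(record)  # {'a': 5, 'b': 1}
```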

Not hard at all,

type :: m_type
    class(*), allocatable :: a, b
end type m_type
type(m_type("abc", 1)) :: mutable
mutable%a = 5
end

Are you sure this is standard-conforming? I’ve tried it with 3 different compilers, and only one of them was able to compile it…

Honestly, I was surprised when gfortran compiled it, too. I assumed it was perhaps another example of my incomplete knowledge of F2018. But regardless of the initialization restrictions, unlimited polymorphic components have been part of Fortran for nearly 20 years.

I don’t think this is quite valid syntax. Probably meant something like:

type :: m_type
    class(*), allocatable :: a, b
end type m_type
type(m_type) :: mutable
mutable = m_type("abc", 1)
mutable%a = 5
end

This is the standard-conforming syntax, I think, although gfortran cannot handle it.

Back to the topic: is polymorphism what people are referring to when they claim that C++ deals better with unstructured data? It is possible that those claims are as out of date as I am.

And are higher-level data structures, such as dictionaries, available in some library?

I think two factors spread such partly true, partly unfounded claims:

  1. The Fortran ecosystem lacks the diversity of libraries seen in the C++ ecosystem. That might indeed be true; I have no idea. But even if true, it is not surprising to me. C++ is a general-purpose beast with a community >30 times larger than Fortran’s, so one would expect its ecosystem to be >30 times larger as well. The other question is whether the data structures in question are actually needed by the community of Fortran programmers.
  2. People do not know modern Fortran, even experienced Fortran programmers. Admittedly, I am one of them. Every week I learn something new about decades-old features of Fortran, despite having used the language for >15 years. Now imagine what non-Fortran programmers, who have only seen F77 BLAS/LAPACK code, would think of Fortran and its capabilities.

Yeah, I’ve not had the best luck with unlimited polymorphic components in gfortran, especially when trying to use the intrinsic structure constructor with them. Writing your own constructor seems to work OK most of the time, though, e.g.

type :: m_type
    class(*), allocatable :: a, b
end type m_type
interface m_type
  module function constructor(a, b) result(new_m_type)
    class(*), intent(in) :: a, b
    type(m_type) :: new_m_type
  end function
end interface
module procedure constructor
  new_m_type%a = a
  new_m_type%b = b
end procedure
type(m_type) :: mutable
mutable = m_type("abc", 1)
mutable%a = 5

One category of my “Fortran Code on GitHub” list is Containers and Generic Programming, and two of its items are:

fdict: native Fortran 90 dictionary with hash tables for retaining any data-type in a Python-like dictionary, by Nick Papior

Fortran Parameter List (FPL): Fortran 2003 library that can manage the parameters of a program from a single point, by victorsndvg. FPL is an extendible container (dictionary) of <Key, Value> pairs, where the Key is a character string and the Value can be, by default, of the basic data types

Another question is how useful such unstructured data built on unlimited polymorphism are. You can assign a value of any type to such a component, but to use it you probably always need an elaborate select type construct.
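
The same cost shows up in dynamic languages, where the counterpart of select type is runtime type dispatch: every consumer of the value has to enumerate the types it is prepared to handle. A minimal Python sketch, where `describe` is a hypothetical helper and not part of any library:

```python
def describe(value):
    """Branch on the runtime type, much as a select type construct would."""
    # bool is a subclass of int in Python, so it has to be tested first.
    if isinstance(value, bool):
        return f"logical: {value}"
    if isinstance(value, int):
        return f"integer: {value}"
    if isinstance(value, float):
        return f"real: {value}"
    if isinstance(value, str):
        return f"character: {value}"
    # Counterpart of "class default": a type the caller did not anticipate.
    return f"unsupported type: {type(value).__name__}"


for item in ["abc", 1, 2.5, True, None]:
    print(describe(item))
```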

@fortran4r , can you please clarify what you especially have in mind with “unstructured data”?

Can you please provide a specific example or two from the context(s) where you “keep hearing people say that C++ works better with unstructured data than Fortran”?

@fortran4r, but which “unstructured data structure” is of specific interest to you, or, in your estimation, to the HPC domain or numerical computing?

The stackexchange thread you linked only appears to have one truly balanced post, that by @certik !! And there are lots and lots of words otherwise in that thread but no simple examples…

Clearly if one is looking for the equivalent of “generic container” classes in Fortran such as dictionaries and queues, etc., there has been a lot of user-side and ad hoc development in this area post-Fortran 2003 but still rather sophomoric compared to what one finds natively and/or readily in other languages.

However, there is a lot of skepticism about the true need for such “generic” classes in Fortran programs among long-term Fortran practitioners (e.g., those with 30+ years of experience), as well as among the commercial vendors who support such practitioners, especially in the HPC/supercomputer space, which, after all, remains a big source of funding for vendors. Such skepticism affects priorities significantly, I believe. There is hope that Fortran 202X will change some of this, but its effect will only come into play circa 2040, which is too far out.

So in the meantime it will be useful to understand your use case. There are likely workarounds viable that provide somewhat specific (as opposed to truly generic) solutions using the current Fortran standard.

The current Fortran features perfectly satisfy my needs. I deal with stats stuff like binomial/trinomial trees and Monte Carlo simulations a lot, which all can be conveniently expressed in Fortran codes. I am just curious about what unstructured data are not workable with Fortran because I heard people mention this point on various occasions. That’s why I asked this question in this thread.

I guess it depends what we mean by generic. Fortran has a pretty good generic container called an “array”.

In a few instances, I’ve seen people complain that Fortran has no “truly” generic container such as the Python list or dictionary, which can contain items of different types:

list1 = ['physics', 'chemistry', 1997, 2000]

I have yet to meet a compelling example where this would be needed. I see it in tutorials, but I haven’t really encountered it in actual projects. In the majority of cases I deal with collections of items of the same type.

Dictionaries are different because their advantage is storage/retrieval using keywords. This gives them a more natural syntax for certain applications (e.g. storing a collection of numeric/non-numeric options). The Python **kwargs are another great use of the dictionary.
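
To make the keyword point concrete, here is a small Python sketch. The function `solve` and its option names (`tolerance`, `method`) are invented for illustration only.

```python
def solve(n_steps, **options):
    """Hypothetical solver: n_steps is required, everything else is optional."""
    # Inside the function, options is an ordinary dict keyed by argument name.
    tolerance = options.get("tolerance", 1e-6)
    method = options.get("method", "euler")
    return f"{method}: {n_steps} steps, tol={tolerance}"


# Numeric and non-numeric settings stored together, retrieved by keyword.
settings = {"tolerance": 1e-8, "method": "rk4"}

print(solve(100))              # defaults are used
print(solve(100, **settings))  # the dict is unpacked into keyword arguments
```

The caller only names the options it cares about, and new options can be added later without changing existing call sites.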

I think the discussion mixes up “dynamic” and “generic” (though the two may be used interchangeably in some cases). As I understand it, what many people want (according to a survey some time ago) is a generic container, which is uniform in type but whose type is specified later by the user, while “dynamic” variables correspond to a so-called “Any” type (which, I guess, was not the primary interest in that survey).
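
The distinction can be sketched with Python type hints. The `Stack` class below is a made-up example, not taken from any library. Note that Python does not enforce these hints at runtime; a static checker is what would reject pushing a string onto `reals`.

```python
from typing import Any, Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """Generic container: one element type, chosen by the user of the class."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# "Generic": the element type is fixed once per container.
reals: Stack[float] = Stack()
reals.push(1.5)
reals.push(2.5)

# "Dynamic": each element may have a different type.
anything: List[Any] = ["physics", 1997, 2.5]
```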

A common enough need that is growing to be of massive importance in any technical domain is streams of data containing all kinds of types and abstractions.

For example,

  • data from scientific or small-scale experiments, pilot plants, large manufacturing facilities that are highly essential for any and all manner of machine learning and AI applications,
  • IoT streams - readings from sensors, ticker data, etc. which are crucial for real-time operations
  • multimedia data - audio, video, weather, geospatial, etc.
  • burgeoning information and knowledge generation in the form of all kinds of documents, etc.

These are all unstructured, yet paramount to advances in essentially all fields now relevant to scientific and technical computing.

The popular languages for engineering and technology, e.g., the top ones in the IEEE Spectrum survey, either have or are rapidly providing solutions for their practitioners, regardless of paradigm (imperative, etc.). Compared to them, Fortran is marching slower than a snail’s pace and ceding the space in form and function.

If all Fortran can handle well natively is instructions on arrays of integers and reals, taking its input only after all the preprocessing has been performed elsewhere and crunching it into more arrays of integers and reals that must then be handed off somewhere else for postprocessing before humans can readily consume them, and if that remains the vision of the standard-bearers, then there will remain no raison d’être for Fortran in most domains.

This is consistently the feedback from many senior technical and digital technology leads in many parts of industry and research. While some of the criticism is driven by lack of awareness and implicit biases and impatience, there is truth in the sense of what OP mentioned in the original post, “I keep hearing people say that C++ works better with unstructured data than Fortran does.” It also has to do with conveniences and flexibilities accorded to the programmers in the language standard itself and how quickly implementations catch up to the advances in the standard.

It’s both program performance and programmer productivity that are crucial. The balance remains skewed in the practice of Fortran away from the latter and that’s where it keeps paying a heavy price.

Most of the time, when dealing with containers that are inhomogeneous and dynamic, performance is not a consideration. It is therefore more practical to use a higher-level language (e.g. Python) and call the fast routine written in Fortran once the data has been parsed and structured. This is probably one reason why these data structures do not appear often in Fortran code.
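
A rough sketch of that division of labour, under stated assumptions: the records are invented, and `mean` is a pure-Python stand-in for whatever compiled routine would actually be called (a real setup would wrap the Fortran code with some interoperability tool, which is not shown here).

```python
from array import array

# Inhomogeneous input, as it might arrive from a file or a sensor stream
# (the records are made up for illustration).
raw_records = [
    {"sensor": "t1", "value": "20.5"},
    {"sensor": "t2", "value": 21},
    {"sensor": "t3", "value": None},  # missing reading
    {"sensor": "t4", "value": "22.5"},
]


def to_reals(records):
    """Parse the messy records into a contiguous array of doubles."""
    values = array("d")
    for record in records:
        if record["value"] is not None:
            values.append(float(record["value"]))
    return values


def mean(values):
    """Pure-Python stand-in for the compiled numerical routine."""
    return sum(values) / len(values)


clean = to_reals(raw_records)
print(mean(clean))
```

By the time the numerical routine sees the data, it is a plain homogeneous array of reals, which is exactly what Fortran handles well.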
