A Cautionary Tale on Object-Oriented Programming

Intro

In 2021 I was learning about object-oriented programming in Fortran while writing a little simulation code for industrial food dryers as part of my research. The math was not that complicated, essentially it was just a 1-D PDE solver with some coupled ODE equations.

The industrial dryers on the other hand, can be quite complicated; they have multiple sections or stages which can be controlled independently and tuned precisely to deliver the expected quality of the final product. Here is a schema of a continuous pasta dryer, taken from a patent by the company BĆ¼hler, showing also the product drying curve (Pf).

image

The piecewise functions for temperature (T) and relative humidity (r.F.) shown in the graph above enter the boundary condition of the PDE for moisture transport within the (wet) pasta dough.

I decided I would use an abstract base class for the drying programs:

  type, abstract :: drying_program_base
    integer :: n  ! number of variables/settings
  contains
    procedure(get_air_condition_interface), deferred :: get_air_condition
    procedure(getvars_interface), deferred :: getvars
    procedure(setvars_interface), deferred :: setvars
  end type

The child classes would implement methods to return the temperature and humidity at a given point in time. I also required methods to serialize/deserialize the classes. E.g. when performing mathematical optimization of the dryer, the optimization codes expects a packed vector of variables x(:) and not some custom type like this child class:

  !> Three-stage program with flat settings
  type, extends(drying_program_base) :: threestage_flat
    real(wp) :: T(3), RH(3)  ! temperature and humidity
    real(wp) :: f1, f2  ! fraction of time in stage 1 and 2
    real(wp) :: total_time  ! total_time := f1 + f2 + f3
  contains
    ! [... omitted ...]
  end type

My code grew as I added different types of programs, two-stage, three-stage, n-stage, constant temperature, rising temperatureā€¦ I also made a child class where the temperature and humidity curves are Lagrange polynomials, but decided to abandon the idea. A smooth curve may seem like a good idea, but in practice flat sections and linear ramps are easier to implement and control.

The main driver of my program became obfuscated with large chunks of code dealing with allocation of the polymorphic child types and their initialization in lengthy select type blocks. When I added the constrained mathematical optimization, things snow-balled further with the packing/unpacking of the variables.

Something wasnā€™t right. The object-oriented language features I thought would help me organize everything, were leading to code bloat.

Two years later

Today, I revisited the problem, forcing myself to think, what is the shortest way of getting an n-stage drying program into my simulation subroutine.

Hereā€™s the sketch I came up with:

integer :: prog, nstages, i
real(wp) :: time, duration, temperature, heating_rate
real(wp) :: relative_humidity, humidification_rate

open(unit=prog,file="program.txt",action="read")  ! program.txt contains the dryer settings
read(prog,*) num_stages  ! how many stages does the dryer have

time = 0  ! start drying
do i = 1, nstages  ! for each stage

    ! read the settings from the file
    read(prog,*) duration, temperature, heating_rate, &
        relative_humidity, humidification_rate
    
    ! perform simulation for stage i
    call drysim(t=time, tend=time+duration, &
        temp=temperature, qtemp=heating_rate, &
        rh=relative_humidity, rhdot=humidification_rate, &
        [... other parameters and options ...] )
    ! at exit from drysim, 
    ! time := time + duration
end do
close(prog) ! close the drying program

In contrast to my previous code, there are no derived types, no polymorphism, everything is obvious. The tables of drying settings can be loaded and exchanged easily as files without the need for recompilation. In case of mathematical optimization, itā€™s trivial to pack the variables into a vector. All I needed was a sane data representation.

Soon afterward I remembered a cautionary tale on meta-programming given by Steven Johnson at JuliaCon 2019 (Steven Johnson is co-author of FFTW, and professor of applied mathematics at MIT). The relevant section starts at 13:30 (810 s), and lasts approx. 2 minutes:

As he says from his tale on Pascal code generation (and matching my epiphany),

[ā€¦] thereā€™s a lot of cases like that, I think, where people just starting out are really tempted to do meta-programming, and you know, because spitting out code like this, itā€™s really easy to see whatā€™s going on, thinking about ā€œOh, I actually have a data structure in memory that has all this,ā€ thatā€™s a higher level of abstraction ā€¦ not that much higher but [ā€¦]

10 Likes

I have not tried to use the OOP features of Fortran 2003+ in a serious code. What you wrote makes sense, but a basic Fortran 95 derived type such as

type :: drying_dt
real(wp) :: time, duration, temperature, heating_rate, relative_humidity, humidification_rate
end type

could be a good fit for your programs. In general, when a group of related variables is often declared together, read and written together, and passed together to and from procedures, it makes sense to wrap them in a derived type.

2 Likes

Indeed, that would do no harm. I think I will just go with

real(wp) :: x(5)  ! duration, T, heating_rate, RH, humidification_rate

for a single stage, and a rank-2 array with pointer remapping when passing the vector of variables to the optimization routine

integer, parameter :: nvars = 5
integer :: nstages
real(wp), target :: x(nvars,nstages)  ! <-- entire drying program
real(wp), pointer :: xf(:) => null()  ! <-- same, but as rank-1 array 
associate(n => nvars * nstages)
    xf(1:n) => x
    call opt%optimize(x=xf,...)
end associate

Anyways, the moral of the story is that classes should be used with care, and polymorphism introduced when different behavior is needed and not just to manage a different ā€œvolumeā€ of data, like in my ill-devised beginner attempt.

In his book, A Philosopy of Software Design, John Ousterhout (author of Tcl, Tk) writes:

Designing software is hard, so itā€™s unlikely your first thoughts about how to structure a module or system will produce the best design. Youā€™ll end up with a much better result if you consider multiple options for each major decision: design it twice.
[ ā€¦ ]
The design-it-twice approach not only improves your designs, but it also improves your design skills. The process of devising and comparing multiple approaches will teach you about factors that make designs better or worse. Over time, this will make it easier for you to rule out bad designs and hone in on really great ones.

3 Likes

Then in all parts of the program you must remember that x(3) refers to heating_rate. If the 5-element array will be used in many places, a module could be defined

module m
integer, parameter :: iduration=1, itime=2, iheating_rate=3, irh=4, ihumidification=5
end module m

so that you can refer to
x(iheating_rate).

1 Like

I agree getting the indexes correct is a risk; also in case new variables were added. Iā€™ve used the approach you suggest on a few other occasions.

In this case, once read from the file, the parameters are only used in the simulation subroutine, where more descriptive names are used:

subroutine drysim(t,tend,temp,qtemp,rh,rhdot,...)
  real(wp), intent(inout) :: t  ! current time
  real(wp), intent(in) :: tend  ! final time
  real(wp), intent(in) :: temp, qtemp  ! temperature and heating rate
  real(wp), intent(in) :: rh, rhdot  ! relative humidity and humidification rate
  
  do while (t <= tend)
     ! actual calculation
  end do
end subroutine

The reason I used long names in the original post was to provide more context on how the variables relate to the dryer graph.

In hindsight this all seems obvious; I guess at the time I was thinking my simulation routine should only be entered once, hence I need to load all my data into a derived type so itā€™s available at any point in time.

Before the class-based approach, I was using callbacks. The settings were obtained from a table search:

  subroutine buehler5(time,T,RH)
    real(wp), intent(in) :: time  !! Process time (s)
    real(wp), intent(out) :: T  !! Absolute temperature (K)
    real(wp), intent(out) :: RH  !! Relative humidity (/) 

    ! drying settings
    real(wp), parameter :: temp_(6) = [69,92,106,84,77,34] + 273.15_wp  ! Kelvin
    real(wp), parameter :: rh_(6) = [50,61,64,78,95,74]/100._wp
    real(wp), parameter :: time_(7) = [0,3,16,30,139,151,154]*60._wp ! seconds

    integer :: i

    call locate(time_,size(time_),time,i)  ! linear search
    if (i == size(time_)) i = i - 1 ! extrapolate last value
    if (i > 0) then
        T = temp_(i)
        RH = rh_(i)
    else
      write(*,*) "[buehler5] The value ",time," is below the range of the table."
      error stop
    end if  
  end subroutine

This function would be evaluated at each iteration of the ODE/PDE solver, even though the values donā€™t change from step to step, most of the time.

In case of programs with linear ramps or over-lapping temperature and moisture steps, the logic was needlessly complex:

    real(wp), parameter :: temp(5) = [real(wp) :: 45,85,92.5,88,25] + 273.15_wp ! Kelvin
    real(wp), parameter :: time_temp(5) = [0,40,130,159,180]*60._wp ! seconds  

    real(wp), parameter :: rh_(7) = [real(wp) :: 55, 60, 69, 74, 85, 60, 75]/100._wp
    real(wp), parameter :: time_rh(7) = [0,9,20,40,130,159,180]*60._wp

    integer :: it, irh

    call locate(time_temp,5,time,it)  ! <-- linear search
    call locate(time_rh,7,time,irh)
    if (it > 0) then
      if (it == 1 .or. it == 4) then ! linear ramp
        T = temp(it) + (time - time_temp(it))/(time_temp(it+1) - time_temp(it))*(temp(it+1) - temp(it))
      else if (it > 1 .and. it < 4) then
        T = temp(it+1)! flat
      else
        T = temp(5)
      end if
      if (irh == 6) then ! linear ramp
        RH = rh_(irh) + (time - time_rh(irh))/(time_rh(irh+1) - time_rh(irh))*(rh_(irh+1) - rh_(irh))
      else ! flat
        RH = rh_(irh)
      end if
    else
      write(*,*) "[WO_prog1] The value ",time," is below the range of the table."
      error stop 1
    end if
  end subroutine

Would you have been able to tell from the logic this encodes the following settings:

With the approach shown in the new sketch, I just store the following table (comments added for explanation):

6                                # number of stages
 9 45.0  1.0  55.0  0.0          #  9 minutes, T = 45 Ā°C and rising at 1 Ā°C/min, RH = 55 %
11 54.0  1.0  60.0  0.0          # 11 minutes, T = 54 Ā°C and rising at 1 Ā°C/min, RH = 60 %  
20 65.0  1.0  69.0  0.0          # ...
90 92.5  0.0  74.0  0.0
29 88.0  0.0  85.0  0.0
21 88.0 -3.0  60.0  0.714285714

I hope there is something to learn from my rambling.

I agree with @Beliavsky , from what is shown thus far, judicious use of derived types rather than ā€œa rank-2 array with pointer remappingā€ and such would be a better way to go if this is anything more than a ā€œtoyā€ effort.

Please note any practical use of such simulation code in relatively broad manner will be helped by flexible unit-of-measure (UoM) computations of quantities. Fortran codes really can do with CONSTEXPR functions rather than hard-wired UoM conversions:

Thanks @FortranFan for the comment; the point Iā€™m trying to raise is different. I was constantly trying to encode the data within the program either as a procedure or within an object. This was making the program large and laborious to extend. Instead, I could have just operated on the data as soon as it was read from the file.

A drying operator should be able to play with new settings from the outside, and not expected to write and compile a new routine, and relink the program. If anything, my object-oriented solution was my private ā€œtoyā€ effort.

The drying settings are most naturally stored as a table of numbers. Implicitly, the table already defines a protocol. You first read the number of stages, then you ask for data when you need it. Dead simple. (We donā€™t even need to store the number of stages, we could just make the loop quit when the file ends. Thatā€™s minimalism.)

Thereā€™s an analogy to be drawn here with PostScript or Forth, both very minimalistic concatenative programming languages.

Say I have a 2-stage dryer, with constant conditions in each stage:

Stage Duration [min] Temperature [Ā°C] Relative Humidity [%]
1 60 90 60
2 30 40 50

We donā€™t need an abstract base class, and a concrete two-stage child class, to get the job done. We just need a timer and two knobs (the loop index and two real variables). The number of stages is determined by the number of rows. The table is read from a file, either into an array or a derived type as you and @Beliavsky suggest, but itā€™s not a parameter or a constexpr, because that defeats the general purpose of the program.

In the Starting Forth book there is an example of a washing machine program:

: washer wash spin rinse repeat ;

washer is the name of a procedure, and wash, spin, rinse, repeat are words. Each word has itā€™s own definition, e.g.:

: rinse fill agitate drain ;
: fill faucets open till-full faucets close;

Each time a Forth processor will read a word, it will look into an internal dictionary for a matching definition and if it finds one, execute it. If a matching definition canā€™t be found, it will assume the word is a number and attempt to convert it to one. Thatā€™s more or less it, the whole language.

In Forth the drying program would be something along the lines of

: stage-one dry at-temp 90 and-humidity 60 for-duration 60 ;
: stage-two dry at-temp 40 and-humidity 50 for-duration 30 ;
: dryer stage-one stage-two ;

The words at-temp, and-humidity, for-duration would push the values onto a stack. The word dry would call the Fortran simulation routine using the three top values from the stack as input.

1 Like

Fortranā€™s dirty little secret is the introduction of modules, derived types, generic interfaces, and operator overloading in Fortran 90/95 enabled object-based (as opposed to object oriented) programming. The work of Szymanski, Norton, and Decyk along with Ed Akinā€™s book showed how to build on the object-based approach to approximate most of the useful features of object-oriented programming (without using a single SELECT TYPE). I prefer to start with an object-based approach because I believe it forces you to think about what I consider the two most important aspects of OO. Namely, designing around interfaces and not an implementation and favoring composition and aggregation over pure inheritance. I will only use the F2003 OOP features when I see a benefit to using them over just sticking with an object-based design

5 Likes

This matches my experience also, I donā€™t use OO at all, and I even rarely use a derived type, only when some variables are always passed and set together, as @Beliavsky said. Here is an example of this style: fastGPT/gpt2.f90 at 4e70c6a0e9f60a1b3c94f18b32c34a6012de1e7b Ā· certik/fastGPT Ā· GitHub, there is one derived type, so that I donā€™t have to put 20 variables as arguments, but otherwise itā€™s as simple as it can get ā€” yet it is very versatile.

4 Likes

From my experience with object oriented programming in Fortran Iā€™m actually quite happy with the provided features, except for some caveats. I am actually quite heavily reliant on OOP features in most of my projects so far, the most obvious is TOML Fortran where I use it to define the recursive data structures.

Still, TOML Fortran does not go ā€œall inā€ with OOP, it is used to define the recursive data structures and abstract the data storage or to create the interfaces for implementing the lexer. The latter was quite crucial to have as it allowed to implement JSON support as separate library on top of TOML Fortran without any change in the main code base. It turns out that most objects can be quite minimal, most of the actual interaction from the user doesnā€™t involve any type-bound procedures but is rather dispatched via a generic interface for normal procedures, this way the objects donā€™t get bloated with dozens of procedure pointers which have to be resolved at runtime.

The main issue I see is the deep nesting needed to define the recursive data structures without pointers, the data ends up scattered all over the place, when I wish for them to be more compact: a table contains a storage structure, which contains nodes wrapping an allocatable value, so the next table is three pointers allocatables away, while I would like to have only one or maybe a fixed offset, but this requires more manual memory management.

For my more scientific projects I usually have just derived types, which in most cases could be a parametrized derived type, but those donā€™t really work well in arrays. Most of the difficulty is usually in the initialization of the object, e.g. the connection with a deserializer, afterwards it is used as immutable. The objects usually act like a container to abstract an interface for evaluating an interaction, the actual implementation in turn is not going to use a single procedure pointer or type bound procedure in the performance critical part. This kind of abstraction usually has worked and scaled well for me.

1 Like

I have always been reluctant to use OOP, firstly because I was fearing some performance impediment, secondly because it seemed a lot of efforts for my kind of programming (my codes are typically at max a few thousands lines long). I have just learned the basics of OOP in Fortran a few weeks ago. And I have used it in Python to refactor some parts of my cfwrapper.

It is interesting as you are obliged to think about the structure of your program in a different way. But I have always thought that it was a programming method for big projects (if you have 100 000 lines of code, I understand/imagine it may be useful). But what is a big project? I have a ray-tracing code for solar cells textures that has 2600 lines of code and I feel it could maybe have been better structured with OOP. The problem is that if a project is not based on OOP from start, it can be difficult to refactor it later.

Finally, for my relatively small Fortran projects, the module approach is generally sufficient. You group data and functions/procedures on those data. It is simpler/lighter than OOP.

I am committed in learning more Fortran OOP, but without being totally convinced (for my personal usage). More by curiosity, and to explore and see if it may bring something to me.

From Object-oriented programming - Wikipedia :

The latter point is reiterated by Joe Armstrong, the principal inventor of Erlang, who is quoted as saying:[36]
The problem with object-oriented languages is theyā€™ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

2 Likes

I donā€™t know if an OOP scale exist, but it could be something like:

  • Level 0: modular programming (datas and procedures grouped in modules)
  • Level 1: true OOP (datas and procedures put together in derived types)
  • Level 2: inheritance
  • Level 3: encapsulation
  • Then polymorphism, abstract classes, etc.

I am personally essentially at level 0, with a few incursions at level 1, and exceptional incursion at level 2. For the moment. I may evolve. Or notā€¦

I guess the point I was trying to get across in my previous post is that knowing something about object-oriented (or based) design, specifically all the work done on design patterns etc., is more important than object-oriented programming (ie using compiler specific object-oriented features). Reiterating what I said before, OO design forces you to think about your code along the lines of what your interfaces need to look like and how you build a new type by compositing or aggregating other derived types. Adopting an object- based design approach allows you to think objectively but implement things in a (semi) procedural manner without decending into the seven levels of hell that is SELECT TYPE etc. Plus you have a better chance of avoiding the inevitable ICE that comes with a buggy compiler OOP implementation. (I have a code that will compile correctly with gfortran, ifort, and (now) ifx but refused to compile with any of the other LLVM based compilers).

1 Like

In Understanding the need for class variables and select type @everythingfunctional wrote

My recommendations are that one should use type , or carefully design deferred type bound procedures defined by the type that one will declare variables of class of such that any select type s will be unnecessary.

and I gave a simple example of this, which works at least with gfortran.

1 Like

Last year I ended up using select type in neural-fortran to handle different layer types in a neural network. After experimenting with select rank at the mid-level and pointer array remapping (terminology?) at the low-level and not being happy with the outcome, I ended up with the following design requirements:

  • At the low level (compute kernels for each layer) I want plain array operations to work on data ranks and shapes that are native to each layer;
  • At the high level (network), I want to be able to iterate over and do whole-array operations over all layers.

The above requirements led me to use an abstract type for a base layer and extend it with a concrete-specific layer type. The network then contains an array of base layers that it can iterate over. This requires type guarding with select type at the mid-level (itā€™s time to do something with a layer; what layer am I?) and a lot of boilerplate. Most (90%?) of the boilerplate actually isnā€™t even about type guarding, but the senseless limitation of the Fortran syntax to allow you something like type is (dense, conv2d, ...) for types that are doing the same thing. Iā€™m not happy with this approach, but letā€™s say Iā€™m least unhappy of the approaches Iā€™m aware of.

@rouson recently pointed out to me that itā€™s possible to extend an abstract type with another abstract type to build a hierarchy. Then, rather than guarding with many type is (concrete_type), I can do away with a single class is (extended_abstract_type).

Whether to OOP or not I think is less about style preference but about what is the problem that needs to be solved. I think there are problems where OOP is the simplest solution. Plain arrays and intrinsic types donā€™t get too far when there is a hierarchy of abstractions to be modeled.

5 Likes

@ivanpribec , @certik, and anyone who doubts the value of OO,

Please note there needs to be a strong cautionary tale against the cautionary tales around the problems with object-oriented programming (OOP).

As alluded to by @rwmsu , OOP must really be the last stage after considerable attention is devoted to object-oriented analysis (OOA) and object-oriented design (OOD). Often the order is reversed, particularly as someone gets enamored by OO given the influence of Java, C++, .NET, etc. and jumps head first into OOP while overlooking entirely OOA and OOD. That is a recipe for trouble.

Separately, note in the context of Fortran and similar statically typed languages, OOA and OOD can really help with packaging modular components that can also facilitate easier concurrent and parallel executions. A very simple example of this is given here.

Additionally the modular design of object components via OOA and OOD can help arrive at codes that over the longer term can be easier to maintain and extend.

Bottom-line: there are parallels with architecture with physical spaces and OO where living or working or leisure experiences and functionality and enjoyability and elegance and long-term viability and productivity are achieved by those who can bring sound analyses and innovative designs.

3 Likes

Reminds me of that article ā€œā€˜GOTO Considered Harmfulā€™ Considered Harmfulā€. :slight_smile:

2 Likes

Just my opinion, but Iā€™ve always felt that the Fortran OOP features were designed more to support people writing libraries than folks writing a problem specific application. I agree that OOP has its place but its not the end all and be all that it was hyped to be. Most of the C++ codes in the areas I work in (CFD and FEM) are heavy users of templates (static polymorphism) but not a lot of run-time polymorphism. I think you can make a case that templates saved C++ (at least for scientific programming) from being just an interesting academic experiment. Iā€™ve always felt that if Fortran had something like templates before F2003 then the OOP features that eventually made it into the language would be simpler to use and less focused on inheritance and run-time polymorphism that should be avoided if possible in scientific applications for performance reasons. All the early attempts at implementing OOP FEM codes in C++ always underperformed (sometimes by an order of magnitude) their existing Fortran counterparts. It was only when expression templates were introduced did their performance reach Fortran levels.

4 Likes

I think having all these different capabilities available in a language is useful, and one advantage of C++ is that you can use any of these approaches if you want to. For performance code in C++ indeed I donā€™t use runtime inheritance at all. I do use compile time inheritance (via the CRTP pattern), that is useful.

The progression of most programmers is that they first learn how to program in a simple procedural style, then they learn OOP and many other features and languages. Then they learn how to use all these mechanisms as tools with pros and cons and you make engineering judgement for the task at hand which one to use and how much. I think also often times it just depends on your taste and what you like, as well as how efficient you are using the tool to achieve what you want. We also each have a different level of ability to read through ā€œabstractā€ code, as well as tolerance and urge/motivation to achieve more abstract (and simpler in some, often subjective, metric) code. I am still learning and improving every day, even though Iā€™ve been programming for over 30 years. I think the most clear answer is that there is more than one way.

1 Like