Research software code is likely to remain a tangled mess

Research software (i.e., software written to support research in engineering or the sciences) is usually a tangled mess of spaghetti code that only the author knows how to use. Very occasionally I encounter well organized research software that can be used without having an email conversation with the author (who has invariably spent years iterating through many versions).

Spaghetti code is not unique to academia; there is plenty to be found in industry.

Structural differences between academia and industry make it likely that research software will always be a tangled mess, only usable by the person who wrote it.

Using MODULEs, IMPLICIT NONE, picky compiler options, and multiple compilers can help. A general problem is that graduate students, post-docs, and professors are usually rewarded for publications, not their software.
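
As a minimal sketch of the first two suggestions (the names `stats_mod` and `mean` are hypothetical and chosen only for illustration), a procedure placed in a module gets an explicit interface that the compiler can check at every call, and `implicit none` turns a misspelled variable name into a compile-time error:

```fortran
module stats_mod
   ! Hypothetical module, for illustration only.
   implicit none          ! every variable must be declared explicitly
   private                ! hide everything not listed as public
   public :: mean

contains

   pure function mean(x) result(m)
      real, intent(in) :: x(:)
      real :: m
      ! Guard against division by zero for an empty array.
      if (size(x) > 0) then
         m = sum(x) / real(size(x))
      else
         m = 0.0
      end if
   end function mean

end module stats_mod

program demo
   use stats_mod, only: mean
   implicit none
   ! Argument type and rank are checked against the module interface.
   print *, mean([1.0, 2.0, 3.0, 4.0])
end program demo
```

For the "picky compiler options" part, one possibility with gfortran is `-Wall -Wextra -fcheck=all`; other compilers have their own equivalents, and building with more than one of them tends to expose non-standard code.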


Perhaps only researchers passionate about programming will progress during their career…

Yes, concerning that problem I made a post in July 2020:

And perhaps things are evolving. For example, in the “Ten Simple Rules” articles, we can read:

  • Prlić, Andreas, and James B. Procter. ‘Ten Simple Rules for the Open Development of Scientific Software’. PLoS Computational Biology 8, no. 12 (6 December 2012): e1002802.
  • Sandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. ‘Ten Simple Rules for Reproducible Computational Research’. Edited by Philip E. Bourne. PLoS Computational Biology 9, no. 10 (24 October 2013): e1003285.
  • Taschuk, Morgan, and Greg Wilson. ‘Ten Simple Rules for Making Research Software More Robust’. PLOS Computational Biology 13, no. 4 (13 April 2017): e1005412.
  • Lee, Benjamin D. ‘Ten Simple Rules for Documenting Scientific Software’. Edited by Scott Markel. PLOS Computational Biology 14, no. 12 (20 December 2018): e1006561.

Messy code is often a minor problem compared to the contortions of the “build system” that is used to compile and link the program.


One of the comments below the article says “people do like to reinvent the wheel”. Well… it’s not that people like to reinvent the wheel. A lot of scientists/engineers don’t even know those “wheels” exist, and some of the wheels are wrapped in such sophisticated yet complicated APIs that learning how to use them is more time-consuming than just inventing a get-the-job-done wheel.

As a graduate student, I always have to balance between “spending more time on the readability and optimization of my code so that my advisor, the graduate students who follow, and I can benefit from it” and “getting the job done so I can get the paper published and thereby get my degree sooner”. It’s a never-ending battle…


Yes, it’s probably why people like using languages like Python.


It looks to me (an outsider) like there may be a parallel between how people use APIs and how they use mathematics: There is a wealth of sophisticated results available that may very well apply to the particular case at hand, but sometimes it takes more time to find out if and how, than to build an ad-hoc solution.

Incidentally, I am happy to see the recent activity around modern Fortran here and on related sites. At the moment my own needs are mostly in symbolic computation and not exactly high performance, but I follow it all with interest.

Hi @S.G.Janssens, there are some efforts to provide a Fortran wrapper of the symengine code here: cwrapper module [WIP] by ivan-pi · Pull Request #5 · symengine/symengine.f90 · GitHub. Currently, the Fortran routines are only at a level equivalent to the C API. You can find a few more comments about it in an older thread: Code Generation Using Sympy

I agree with the title “Research software code is likely to remain a tangled mess”, but probably not for all research codes. I think it is more or less normal in research, and it is the case for my own codes for several reasons:

  • A researcher (or student/postdoc/colleague) adds new functionalities to a code that were not planned from the beginning, so the code starts to get messy. Of course, you could rewrite the code more properly, but it takes time (for nothing?). Moreover, the data from the previous code will probably not be compatible with the new one. And with your new code, you’ll have the same problem as soon as you want to add new functionalities. It’s never-ending!
    Nevertheless, sometimes the code becomes so messy that you spend more time patching it than adding new functionalities. At that point, it’s better to write a new version.

  • As a chemist, I did learn some languages (mainly Fortran 77), but I never learned how to program properly (code structure, tests, manual …). It is the same for most of my colleagues around the world. Furthermore, most of them don’t want to learn how to program, because it takes too much time (language, code structure, tests, make or cmake, git or others, manual, parallelization, …). fpm could help a lot with some aspects, if people are willing to learn how to use it.

Of course, you can hire a software engineer (or try to hire one; it is very difficult to justify that in a mainly experimental chemistry lab!). Then the engineer rewrites the code properly. So far so good!! But …
Sometimes the researcher does not understand this new version (or doesn’t make the effort to understand it) because it is too complicated for him/her, and therefore keeps using the old messy code.
Other times, it works properly and the researcher and the software engineer work as a team.
Either way, the difficulty of adding new functionalities is still present …

About other comments:

  • “reinvent the wheel”, as han190 and others say, is not that simple. I did it several times!! Some reasons:
    I didn’t know this wheel existed.
    I knew of its existence, but it was simpler to rewrite the wheel (taking less time, less complex in terms of dependencies …).
    I knew of its existence, but by writing the code, I understood better how the wheel works. This is particularly interesting for students.
    The existing wheel did not fit properly in the code structure, or its functionalities were not exactly what I wanted.

  • “Very occasionally I encounter well organized research software that can be used without having an email conversation with the author”. The reason for that has at least two sides: (i) the code can be messy; (ii) the science (physics, math …) behind the algorithms can be hard to understand and not well understood by a user who is new to the field.

I have tried Test Driven Development on one of my research codes, and it can be very useful when the code is growing. Having automated tests makes it easier to refactor code because you are far less afraid of breaking something. So regularly, when it becomes messy (or you need to optimize some parts), you refactor the code and launch the tests with confidence. And so on.
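
To make the idea concrete, here is a minimal sketch of an automated test with no testing framework at all. The module `geometry_mod`, the function `triangle_area`, and the helper `check` are hypothetical names invented for this illustration; a real project would test its own routines.

```fortran
module geometry_mod
   ! Hypothetical code under test.
   implicit none
   private
   public :: triangle_area

contains

   pure function triangle_area(base, height) result(area)
      real, intent(in) :: base, height
      real :: area
      area = 0.5 * base * height
   end function triangle_area

end module geometry_mod

program test_geometry
   use geometry_mod, only: triangle_area
   implicit none
   real, parameter :: tol = 1.0e-6

   call check(abs(triangle_area(2.0, 3.0) - 3.0) < tol, "area of a 2 x 3 triangle")
   call check(abs(triangle_area(0.0, 5.0)) < tol, "degenerate triangle")
   print *, "All tests passed"

contains

   ! Stop with a non-zero exit code as soon as one expectation fails,
   ! so that scripts and CI systems can detect the failure.
   subroutine check(condition, label)
      logical, intent(in) :: condition
      character(len=*), intent(in) :: label
      if (.not. condition) then
         print *, "FAILED: ", label
         error stop 1
      end if
   end subroutine check

end program test_geometry
```

After each refactoring of `triangle_area`, rerunning this program tells you immediately whether the behaviour changed. If the project is built with fpm, a program like this placed in the `test/` directory should be picked up by `fpm test`.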


Yes, you are right. I’m also using some tests to check new functionalities or new code versions, although they are not as automatic as they should be. I’m moving to something more automatic.

As some of the previous comments imply, the quality of research code is a result of the context it is developed in. An interesting article about this:

Why I Write Dirty Code: Code Quality in Context

I also find the Software Engineering guidelines from the DLR (German Aerospace Center) on this topic quite interesting: DLR Software Engineering Guidelines

Essentially, they divide code into four application classes:

  • Application Class 0: For software in this class, the focus is on personal use in conjunction with a small scope. The distribution of the software within and outside DLR is not planned.
    Software corresponding to this application class frequently arises in connection with detailed research problems.
  • Application Class 1: For software of this class, it should be possible, for those not involved in the development, to use it to the extent specified and to continue its development. This is the basic level to be strived for if the software is to be further developed and used beyond personal purposes.
  • Application Class 2: For software in this class, it is intended to ensure long-term development and maintainability. It is the basis for a transition to product status.
  • Application Class 3: For software in this class, it is essential to avoid errors and to reduce risks. This applies in particular to critical software and that with product characteristics.

Thanks @everythingfunctional for that blog article. I knew JOSS (Journal of Open Source Software), where I published about gtk-fortran, but not JOSE (Journal of Open Source Education), which could be very interesting for me. I never tried to publish about teaching, but this journal could be an opportunity.

And I agree that researchers are learners, eternal learners. I don’t know if Learn Fortran - Fortran Programming Language is the place, but it could be interesting to have a page somewhere with articles about research software development, like the Ten Simple Rules cited above. I have a collection of such articles, and have learned a lot reading them over the last ten years.

Learning Fortran is a good thing, but learning good development practices and methods is also important. Whatever the language, if you have bad programming practices the output will not be optimal! (and soon you will be too scared to modify anything in your dear messy code…)


This is a very interesting discussion. For collaborative codes where many people come and go over the years, I believe it’s very important that the code is organized so that clashes between developers are minimized. In my experience, the best strategy is to have well-written, documented and organized main routines (main, I/O, globals, parallelization), even with certain aspects fixed in a sort of protocol, while the developers of particular modules should have the freedom to organize their own work as they “like”, as long as it fits the global picture/plan.

It is a bit like building a telescope. Once the building is constructed, the size of the dome is fixed, the control room is set in a certain place, and the main mirrors and the structure are there, one can let various groups build their own instruments. They should have the freedom to optimize them as they want, but they still have to respect the overall blueprint and avoid clashes with other groups doing the same. In my experience, researchers joining an already developed code often tend to reinvent the blueprint, or to see only their particular module without considering other developers. This leads to a complete mess. It is also my impression, from a small sample, that an even worse mess may be created by some IT guys assigned to research teams to optimize the code without fully understanding its purpose. As Knuth wisely said: “premature optimization is the root of all evil”.

And one more remark. There is a lot of risk in adopting various existing subroutines and, in many cases, what seems to be a shortcut turns into a major restriction in the end. For example, one could get a nice subroutine for various finite difference formulae. However, in a real code, the real troubles often start with boundary conditions, which come with a lot of ambiguity. If the developer does not have full control of the FD implementation, there is a big chance that sooner or later s/he’ll have to rewrite this FD module from scratch.
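
As a rough sketch of where that ambiguity sits (the names `fd_mod` and `first_derivative` are hypothetical, and the boundary treatment is just one assumption among many possible ones), consider a first derivative on a uniform grid. The interior formula is uncontroversial, but the two boundary lines encode a choice that a borrowed routine may have made differently:

```fortran
module fd_mod
   ! Hypothetical module, for illustration only.
   implicit none
   private
   public :: first_derivative

contains

   pure function first_derivative(f, h) result(df)
      real, intent(in) :: f(:)   ! samples on a uniform grid
      real, intent(in) :: h      ! grid spacing
      real :: df(size(f))
      integer :: n

      n = size(f)
      df = 0.0
      if (n < 2) return

      ! Boundary choice (an assumption): first-order one-sided differences.
      ! Periodic, symmetric, or higher-order closures would change only
      ! these two lines, but they can change the results significantly.
      df(1) = (f(2) - f(1)) / h
      df(n) = (f(n) - f(n-1)) / h

      ! Interior points: second-order central differences.
      df(2:n-1) = (f(3:n) - f(1:n-2)) / (2.0 * h)
   end function first_derivative

end module fd_mod

program demo_fd
   use fd_mod, only: first_derivative
   implicit none
   integer, parameter :: n = 11
   real :: x(n), h
   integer :: i

   h = 0.1
   x = [(h * real(i - 1), i = 1, n)]
   ! For f(x) = x**2 the exact derivative is 2*x; the interior values
   ! should match it closely, while the end points are less accurate.
   print *, first_derivative(x**2, h)
end program demo_fd
```

When the boundary closure is buried inside a third-party routine, changing it means modifying or replacing that routine, which is exactly the restriction described above.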


I notice that on a lot of research projects, those blueprints are missing along with guidelines for contributing and requirements for merging. There is momentum in the right direction, but until project leads value drafting and maintaining those guidelines as much as they value their code, the spaghetti code won’t change. The hope, at the very least, is that spaghetti projects will stand out as expensive relative to others and this will be the incentive for change.

To those of you addressing this problem in your own projects, keep showing the community the way!


I agree with you @vmagnin. From my experience, writing programs in Fortran is very easy. I think this is because Fortran has a small set of rules which can be mastered easily. As a result, programmers feel confident and in control while working with Fortran.

However, many books on Fortran do not cover packaging, distribution, and CI/CD. I feel such project/code management techniques should be covered (with reference to modern Fortran) on our fortran-lang website.



Yes, @Niko, I have experienced and done such things. However, it is not very common, because the desired code is often part of a big package, and extracting it from the big library is very difficult for many reasons. This forces me to develop my own code (i.e., reinvent the wheel). To overcome this issue, we should break big packages up into collections of useful objects, modules, and data types.

Very true, there is a lot of powerful tooling around for Fortran, but documentation is rare or the workflows are difficult to grasp without deep prior knowledge of the toolchains involved. CMake happens to be one of the prime examples of a much relied-on yet not well-documented tool. Also, conda, usually wrongly perceived as a Python-only package manager, can make a powerful packaging tool for Fortran projects.

Fortran-lang is a good place to collect those resources or write new ones. I can only encourage everyone here, again, to keep an eye open for interesting Fortran projects and learning material and to submit them to the fortran-lang webpage as a pull request (GitHub - fortran-lang/ (deprecated) Fortran website).

I’m happy to start a joint effort on writing introductions and providing examples/templates for build and packaging infrastructure for Fortran. We already have a bit of material at fortran-lang:


Why you can ignore reviews of scientific code by commercial software developers:

In summary, most scientific modelling codes are expected to be used by user-developers with extensive internal knowledge of the code, the model, and the assumptions behind it, and who are routinely performing a wide variety of checks for correctness before doing anything with the results. In the right hands, you can have a lot of confidence that sensible, rigorous results are being obtained; however they are not for non-expert users.