Is zero dependency code valuable?

Is zero dependency code valuable?

ChatGPT

Yes, zero dependency code—software that doesn’t rely on external libraries or frameworks—can be highly valuable in several contexts. Here are some of the key advantages and scenarios where zero dependency code is particularly beneficial:

  1. Simplicity and Control: Zero dependency code tends to be simpler in terms of its structure because it relies solely on the standard library provided by the programming language or very minimal external resources. This simplicity often translates into greater control over the behavior of the software, making it easier to understand, debug, and maintain.
  2. Portability: Code with no external dependencies is generally more portable across different systems and environments. Since it doesn’t rely on the presence of specific libraries or versions, it can be easier to deploy in varied environments without worrying about compatibility issues.
  3. Security: Fewer dependencies reduce the attack surface of your software. Every external library can potentially introduce vulnerabilities. By minimizing or eliminating these dependencies, you reduce the risk of security flaws inherent in third-party code.
  4. Stability and Reliability: External dependencies can introduce instability if they are not properly maintained or if they introduce breaking changes in updates. Zero dependency code removes this variable, potentially increasing the overall stability and reliability of your software.
  5. Ease of Deployment: Deployment becomes simpler without the need to manage and configure multiple dependencies. This is particularly advantageous in environments where installation of external packages is restricted or cumbersome.
  6. Long-term Maintenance: Dependencies can become deprecated or no longer maintained. By avoiding external libraries, you sidestep the risks associated with dependency rot, making your codebase more sustainable over the long term.

However, there are also trade-offs to consider:

  • Development Time and Effort: Building solutions without external libraries might require more time and effort, as you may need to implement functionality that is readily available in third-party libraries.
  • Feature Richness: External libraries often offer a richness of features that would be time-consuming and complex to replicate from scratch. Forgoing these can mean reinventing the wheel for common tasks, which can be inefficient.
  • Performance: In some cases, third-party libraries are highly optimized for performance in ways that might be difficult to achieve with zero dependency code, especially for complex operations like data processing, mathematical computations, or graphics rendering.

Overall, whether zero dependency code is the best approach depends on the specific requirements of your project, including factors like the need for security, stability, portability, and the availability of development resources. For certain applications, particularly those in constrained or highly secure environments, zero dependency code can be particularly valuable.

3 Likes

I learned the value of “zero dependencies” early in graduate school. I learned it by watching others and myself suffer daily from dependency hell, particularly with C++ libraries. Since then, I have taken pride in listing “zero dependencies” in our work. For me, the implicit rule is that if a dependency must exist, its installation and version consistency must be fully automated and independent of the end user. But it is sometimes very difficult to uphold this rule in a portable way, and dependency hell does not seem to be as infernal as it was a decade ago.

3 Likes

I tend to avoid dependencies in personal code. Code I use at work is a different story, as whether to use a dependency is often not my decision. ChatGPT lists some decent reasons to avoid dependencies, but misses two important reasons for me:

  • Dependencies might be buggier than my own code.
  • Writing my own libraries helps me learn a subject.

My first bullet may surprise some people. I’ve often heard that popular scientific software libraries are tested by more people and therefore have fewer bugs, a version of Linus’s law. I think Linus’s law is directionally correct, but weak. Rather than assuming that because something is popular it must have fewer bugs, I think it’s better to examine the library and its tests directly. Many test suites seem incomplete to me. And unfortunately, debugging other people’s code can be a real pain. For many things, it just feels easier to write my own code and tests if I really care about correctness.

I’m not the only one who feels this way. Forman Acton wrote the following in Real Computing Made Real (pp. 65-66):

The longer I have computed, the less I seem to use Numerical Software Packages. In an ideal world this would be crazy; maybe it is even a little bit crazy today. But I’ve been bitten too often by bugs in those Packages. For me, it is simply too frustrating to be sidetracked while solving my own problem by the need to debug somebody else’s software. So, except for linear algebra packages, I usually roll my own. It’s inefficient, I suppose, but my nerves are calmer.

The most troubling aspect of using Numerical Software Packages, however, is not their occasional goofs, but rather the way the packages inevitably hide deficiencies in a problem’s formulation. We can dump a set of equations into a solver and it will usually give back a solution without complaint - even if the equations are quite poorly conditioned or have an unsuspected singularity that is distorting the answers from physical reality. Or it may give us an alternative solution that we failed to anticipate. The package helps us ignore these possibilities - or even to detect their occurrence if the execution is buried inside a larger program. Given our capacity for error-blindness, software that actually hides our errors from us is a questionable form of progress.

And if we do detect suspicious behavior, we really can’t dig into the package to find our troubles. We will simply have to reprogram the problem ourselves. We would have been better off doing so from the beginning - with a good chance that the immersion into the problem’s reality would have dispelled the logical confusions before ever getting to the machine.

I also appreciate that writing my own code often dramatically improves my understanding of how an algorithm works.

Of course, as ChatGPT suggests, whether to use a dependency depends on various factors. I do not hesitate to use libraries which are popular, well-tested, and difficult to develop myself. Many database engines meet those criteria, for instance.

1 Like

@btrettel,

Your argument comes across more as a case of reveling in one’s own flatulence while detesting others’.

Developing a more equitable and impartial outlook on “bugs”, and fostering more of this functionality in standard (cf. the Fortran stdlib effort) and domain-specific libraries, will advance your domain better in the long run, you’ll find.

Your point could have been made far more charitably.

My point is not that my own code is of uniquely high quality. My point is essentially that the quality of my own code can be clearer to me than the quality of external code developed by someone unknown, particularly if the external code has no tests. I’m not the best writer, so perhaps this didn’t come across clearly.

As for why I don’t participate in open source libraries, that largely comes down to a lack of time. Figuring out an external library’s code and tests is time-consuming. While I agree that contributing to open source software would be nice, and I have contributed when I can (mostly in bug reports), I simply do not have the time to do more than I already do.

5 Likes

Last year I had to revive a code of mine that had been unused and unmaintained for 20 years. Strictly speaking, it was not a “zero-dependency” code, but the dependencies were limited to BLAS/LAPACK, which are stable and widely established libraries. Everything else was embedded in the source code, including some libraries that I had just copied in at the time because it was difficult to link with them for some reason.

It took me only half a day to have a working executable ready for new tests. I didn’t have to modify a single line of code; everything was working just like 20 years ago. I just had to update the build scripts and Makefile, and relearn how to use the code.

If this code had relied on external libraries other than well established ones, I’m pretty sure it wouldn’t have been that simple…

3 Likes

I cannot agree more. I was once tasked with installing and maintaining the Sandia SIERRA computational mechanics codes on a DoD HPC system. It was all C++. Installing the base code wasn’t hard. It was the multiple third-party libraries with hard-wired dependencies on particular versions of system libraries or CMake that were the issue. I will never understand why developers who want other people to use their code make it almost impossible to do so by restricting their codes to certain versions of CMake or various shared object libraries on Linux/Unix.

1 Like

I think the large library ecosystems and easy installation via package managers are what make languages like Python, Rust, and Julia so attractive to many users. Writing applications from scratch is often not an option because of the lack of time.

I’m a member of the DAMASK (https://damask.mpie.de) team, which decided to rely on the following libraries for the Fortran code:

  • PETSc
  • HDF5
  • FFTW
  • zlib

In addition, the Python tools for pre- and post-processing need

  • pandas
  • numpy
  • scipy
  • vtk
  • matplotlib
  • pyyaml

All of these libraries are available as system packages on the big Linux distributions, as well as on Spack and Conda Forge. They are more or less finished (FFTW, zlib) or have an active team that offers support (HDF5 and especially PETSc).
For an open source code developed in academia, resources are scarce, and developing the functionality provided by these libraries ourselves is impossible. So the question was not whether we want a “zero dependency” code, but whether we have a code with relevant features at all.
DAMASK is available as native system packages for Ubuntu, Debian, Fedora, and Arch Linux (AUR), as well as on Conda Forge and via Spack. So installation is super easy for users, because we manage the dependencies via the package managers. The drawback is that we need to spend time building the packages.

There is one more dependency, which is currently optional: libfyaml (GitHub - pantoniou/libfyaml: Fully feature complete YAML parser and emitter, supporting the latest YAML spec and passing the full YAML testsuite). We wrote a mostly working YAML parser in Fortran, but since YAML is really complicated, it is hard to write a complete parser. Actually, only libfyaml comes really close to passing the test suite (https://matrix.yaml.info/). So if libfyaml is installed, we link it and use it to preprocess YAML files into a canonical form that our Fortran parser can handle. One advantage of libfyaml is that it is written in C, so an installed version can be linked independently of the compiler used for the main application.
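
To give a flavor of why the C implementation is convenient to call from Fortran: a binding only needs iso_c_binding and the C calling convention, not a matching Fortran compiler. The function name and signature below are hypothetical (libfyaml’s real API is different); this just sketches the shape such a preprocessing call could take:

```fortran
module yaml_preprocess
   use, intrinsic :: iso_c_binding, only: c_char, c_int, c_null_char
   implicit none
   private
   public :: canonicalize

   interface
      ! Hypothetical C entry point: read a YAML file, write a canonical-form
      ! copy, return 0 on success. Not libfyaml's actual API.
      integer(c_int) function c_canonicalize(infile, outfile) &
            bind(c, name='yaml_canonicalize')
         import :: c_char, c_int
         character(kind=c_char), intent(in) :: infile(*), outfile(*)
      end function c_canonicalize
   end interface

contains

   subroutine canonicalize(infile, outfile, stat)
      character(len=*), intent(in)  :: infile, outfile
      integer,          intent(out) :: stat
      ! Append C null terminators before crossing the language boundary.
      stat = c_canonicalize(trim(infile)//c_null_char, trim(outfile)//c_null_char)
   end subroutine canonicalize

end module yaml_preprocess
```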

In summary, I would say that relying on external libraries for auxiliary tasks makes sense. I develop software in materials science, which is my field of expertise. In contrast, my knowledge in linear algebra, data storage, fast Fourier transforms, plotting, string parsing, 3D graphics, and file compression is rather limited and I’m happy that there are other people that provide high quality libraries for that.
But I would also say that one has to keep the list of dependencies as small as possible and one should not rely on external code that is not maintained.

Edit: We also rely on CMake for building and an MPI implementation.

4 Likes

I had a similar experience. A former employer approached me to resurrect an old modelling suite. Similarly, the dependencies were limited to BLAS/LAPACK and a few special functions (and we had embedded the source code). Simple command line interface (so no GUI libraries) and all output was simply structured text files. The stand-alone computational core “just worked” and passed the comprehensive testsuite with minimal effort using both gfortran and ifort.

None of the original graphical post-processing worked. I knocked-up some gnuplot scripts as a proof of concept, and left the final implementation as an exercise for the reader.

In my consultancy work, I have always relied on a “zero dependency code” approach.
If you rely on a third-party solution, the worst outcome is when that solution does not cover your particular problem. You are then left waiting until the supplier addresses the issue, which only happens when they get around to scheduling the work. These uncontrollable delays can destroy your budget and your reputation as a solution provider.
If you research and solve such problems yourself, you will achieve a better outcome.

Plus, shouldn’t we strive to understand how the solution works?
For years, I have been trying to understand why I can’t write as efficient a linear equation solver in Fortran as the commercial packages. It has been a rewarding learning experience.

(I am presently achieving over 100 GFlops on my $4k PC, but if I count a multiply and add (FMA) as two operations then it is 200! I suspect this counting is used by some packages, so you have to be careful about quoted performance. Nevertheless, teraflops are still very elusive for my direct solvers, so there is still much more to understand at my price point.)
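
For what it’s worth, a quick sanity check on flop counting is to time a kernel whose operation count is known: a dense n-by-n matrix multiply does 2*n**3 flops under the usual convention that a fused multiply-add counts as two operations. A minimal sketch (using the matmul intrinsic, so the rate reflects whatever code the compiler generates):

```fortran
program flop_count_check
   ! Time an n-by-n matrix multiply and report GFlop/s, counting the usual
   ! 2*n**3 operations (an FMA counted as a multiply plus an add).
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 2000
   real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
   integer(int64) :: t0, t1, rate
   real(dp) :: seconds

   allocate(a(n,n), b(n,n), c(n,n))
   call random_number(a)
   call random_number(b)

   call system_clock(t0, rate)
   c = matmul(a, b)               ! swap in DGEMM or your own kernel here
   call system_clock(t1)

   seconds = real(t1 - t0, dp) / real(rate, dp)
   print '(a,f10.2,a)', 'approx. ', 2.0_dp*real(n,dp)**3 / seconds / 1.0e9_dp, ' GFlop/s'
   print '(a,es12.4)', 'checksum: ', sum(c)   ! keep c from being optimized away
end program flop_count_check
```

Whether the same run is reported as 100 or 200 GFlops then comes down purely to whether an FMA is counted as one operation or two, which is exactly the point about quoted figures.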

2 Likes

I’ve always maintained my own little library of linear algebra routines that I compile as an integral part of the project. I rely on them during code development so I don’t have to deal with LAPACK/ScaLAPACK etc. while I’m writing the code. If you take a little time to make the interfaces generic, replacing my code with, say, LAPACK becomes just a few lines of code and some pre-processor magic (see the sketch below). Plus I have results from my own code to test against what LAPACK gives me. Also, as you say, using my own code does help me understand how the solution works, and if something goes wrong I know it’s probably something I did and not a bug in the third-party lib. If I’m going to debug code, I want to limit that process to just what I’ve written, not someone else’s code or something evil the compiler did on its own (something I encounter more of these days than Way Back When).
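
As an illustration of the kind of thin generic wrapper being described (the module and routine names here are made up, not taken from the poster’s library): callers only ever see one solve routine, and a preprocessor flag picks the backend.

```fortran
module linear_solve
   ! Thin wrapper: callers always see solve(a, b, info); the backend is
   ! chosen at build time, e.g. by compiling with -DUSE_LAPACK.
   implicit none
   private
   public :: solve
   integer, parameter :: dp = kind(1.0d0)

contains

   subroutine solve(a, b, info)
      real(dp), intent(inout) :: a(:,:)   ! overwritten by the factorization
      real(dp), intent(inout) :: b(:)     ! right-hand side in, solution out
      integer,  intent(out)   :: info
#ifdef USE_LAPACK
      integer  :: ipiv(size(b))
      external :: dgesv
      call dgesv(size(b), 1, a, size(a,1), ipiv, b, size(b), info)
#else
      call my_gauss_solve(a, b, info)     ! home-grown reference version
#endif
   end subroutine solve

#ifndef USE_LAPACK
   subroutine my_gauss_solve(a, b, info)
      ! Gaussian elimination with partial pivoting; b is overwritten with
      ! the solution, info returns 0 or the index of a (near-)zero pivot.
      real(dp), intent(inout) :: a(:,:), b(:)
      integer,  intent(out)   :: info
      integer :: n, k, i, p
      n = size(b); info = 0
      do k = 1, n
         p = maxloc(abs(a(k:n,k)), dim=1) + k - 1
         if (abs(a(p,k)) <= tiny(1.0_dp)) then
            info = k; return
         end if
         if (p /= k) then
            a([k,p],:) = a([p,k],:)
            b([k,p])   = b([p,k])
         end if
         do i = k+1, n
            a(i,k)     = a(i,k) / a(k,k)
            a(i,k+1:n) = a(i,k+1:n) - a(i,k)*a(k,k+1:n)
            b(i)       = b(i) - a(i,k)*b(k)
         end do
      end do
      do k = n, 1, -1
         b(k) = (b(k) - dot_product(a(k,k+1:n), b(k+1:n))) / a(k,k)
      end do
   end subroutine my_gauss_solve
#endif

end module linear_solve
```

Building the same source with -DUSE_LAPACK (and linking LAPACK) switches backends without touching any call site, and the home-grown version doubles as a reference to test against.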

I would like to add “flexibility”: imagine your project has grown over time and shifted to different applications than originally anticipated. A paradigm shift can require refactoring of backbone structures in your code. When the code is all yours, you can easily reshape data structures in a fundamental way.

Sure, and these large ecosystems are great for prototyping, for quickly developing something that is not meant to last years or decades, etc. The problem arises when it comes to building industrial applications with a long lifetime. I’m really worried when I see people using the same strategy and relying on “obscure” libraries (or sometimes not-so-obscure but trendy ones) quickly written by who-knows-who, without any reasonable guarantee that the library will be maintained and available over the long term, and that who-knows-who won’t break everything at some point (and yes, it happens).

3 Likes

The problem arises when it comes to building industrial applications with a long lifetime. I’m really worried when I see people using the same strategy and relying on “obscure” libraries (or sometimes not-so-obscure but trendy ones) quickly written by who-knows-who, without any reasonable guarantee that the library will be maintained and available over the long term, and that who-knows-who won’t break everything at some point (and yes, it happens).

I totally agree: pulling in dependencies without realizing that this comes at a cost is insane. But for me, there is a big difference between “obscure” libraries and long-running, established projects like NumPy, PETSc, FFTW and the like. I can write leftpad (see “NPM & left-pad: Have We Forgotten How To Program?”), but I can’t write a competitive fast Fourier transform or even an MPI library.

Even when using established libraries, things break because the maintainers change their APIs. That requires constant but usually small time investments on my side. The most important thing is to do maintenance regularly and adapt quickly to new versions, because it’s easier to fix a few things according to the most recent changelog than to wait until the developers of the external library have forgotten the details of a particular change.

Back when I was seriously paranoid about things like this, we used to build the libraries we used from source as a contingency. We would use the optimized vendor versions for production, but we would TEST against our own builds too. Often this was trivial: just point to different shared libraries at runtime. If you are extra paranoid, you can test different levels of library optimization.

Then tuck the source for the libraries somewhere safe and wait until something changes. Building LAPACK and running its testsuite certainly exercises your compiler and system. I got quite proficient at reducing LAPACK testsuite failures to gfortran bug reports.

GCC bug 19551, “[3.4/4.0 Regression] pure (complex types) function call removed as dead (LAPACK routine claic1.f bug)”, brings back memories.

I write two types of computer programs:

(1) Lots of tiny non-commercial programs written purely for fun or for experimentation with some programming language. These applications typically do not depend on external libraries because of their simplicity.

(2) Fewer programs which I take seriously. As they must be maintainable until my death or longer, zero dependency is a must. And as I care about them and love them, I want every single line to be written by me personally–this is the only way I can fix any bug.

I’ve always been amazed when reading other people’s Rust code at how often they rely on external libraries for such simple things as colorizing terminal output or parsing arguments, even though doing it from scratch often seems simpler and more straightforward to me. And there are multiple competing libraries available out there for such tasks!
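
For what it’s worth, ANSI color escape codes are simple enough that a zero-dependency helper fits in a few lines of Fortran (a minimal sketch; it assumes a terminal that honors ANSI escapes):

```fortran
module term_color
   ! Zero-dependency terminal colors: just the standard ANSI escape codes.
   implicit none
   private
   public :: colorize
   character(len=*), parameter :: esc   = achar(27)
   character(len=*), parameter :: reset = esc//'[0m'

contains

   function colorize(text, code) result(s)
      ! code is an ANSI SGR color number, e.g. 31 = red, 32 = green.
      character(len=*), intent(in)  :: text
      integer,          intent(in)  :: code
      character(len=:), allocatable :: s
      character(len=3) :: c
      write(c, '(i0)') code
      s = esc//'['//trim(c)//'m'//text//reset
   end function colorize

end module term_color

program demo
   use term_color
   implicit none
   print '(a)', colorize('error: something went wrong', 31)
   print '(a)', colorize('all tests passed', 32)
end program demo
```

Argument parsing is similarly manageable with the command_argument_count and get_command_argument intrinsics, at least until the interface grows genuinely complicated.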

2 Likes

As long as the source code is available, you can fix a bug, even if you are not the original author. I have often used the statistics codes of Alan Miller, as did others. When he was alive he did respond to bug reports. Today the code in the Fortran stdlib is reviewed before acceptance. Code that is public and widely used is more likely to be debugged. Maybe your rule could be amended to “don’t use code that you don’t understand”.

1 Like

Probably. But this also matters!

It goes both ways.

A friend of mine worked for a company which made a hardware device that was deployed worldwide. He wrote the OS, the C compiler, and all the application code, including remote diagnostics, himself. An extreme example, but it happens.

On a more practical note, an argument for conditional compilation schemes is that you can have a built-in ‘reference’ implementation of an external library. For example, in ESMF there is a ‘mpiuni’ module which contains a single-processor stub subset of the MPI library. To get full multi-processor code, one sets environment variables to point to a ‘real’ MPI library, and mpiuni is not compiled as part of the build. We did something similar with pieces of other libraries, where we just included the public-domain version of the code, with a build option to use more optimized versions. In other cases, like NetCDF I/O or JSON output, if one doesn’t enable the capability and point to the library, the feature simply doesn’t work.
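
A bare-bones sketch of that pattern (illustrative only, not ESMF’s actual mpiuni code): the application calls a small comm module, and the preprocessor decides whether the body is the real MPI library or a single-process stub.

```fortran
module comm
   ! Build-time switch between a real MPI library and single-process stubs.
#ifdef USE_MPI
   use mpi
#endif
   implicit none
   private
   public :: comm_init, comm_size, comm_finalize

contains

   subroutine comm_init()
#ifdef USE_MPI
      integer :: ierr
      call mpi_init(ierr)
#endif
   end subroutine comm_init

   integer function comm_size()
#ifdef USE_MPI
      integer :: ierr
      call mpi_comm_size(MPI_COMM_WORLD, comm_size, ierr)
#else
      comm_size = 1          ! serial stub: behave as a single rank
#endif
   end function comm_size

   subroutine comm_finalize()
#ifdef USE_MPI
      integer :: ierr
      call mpi_finalize(ierr)
#endif
   end subroutine comm_finalize

end module comm
```

The build system then only has to add -DUSE_MPI and the MPI link line when a real library is available.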

Funny thing is that I wrote a pre-processor to do conditional compilation back in the 1970s. I used it to isolate machine dependencies and provide different versions of machine-dependent code for each machine/OS we supported. When I arrived, that same code made great use of the Tektronix PLOT-10 library for doing graphics on Tek 4010-series scopes. Minicomputers of the era had pretty small memories (we were developing on a Data General Eclipse), and we were only using a subset of PLOT-10’s capabilities. And anyone using our code had to obtain PLOT-10 for their environment. So the lead developer wrote his own subset library with just the capabilities we needed, and eliminated the dependency. Of course, I then had to figure out how to do transparent TTY I/O on all the machine/OS combinations we supported, rather than let PLOT-10 deal with it. Easier said than done on some of them…

It is a shame that coco (the conditional compilation facility once standardized as part 3 of the Fortran standard) was never supported by any compiler vendor. But most folks now seem to just use a Fortran-aware variant of the C pre-processor, or fypp, which I’ve started using a lot lately.
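
As a small taste of fypp: its directives live on lines starting with #:, and a short template like the sketch below (run through fypp before compilation) expands into one routine per listed type.

```fortran
#! fypp comment: the loop below generates swap_real and swap_integer
#! before the file ever reaches the Fortran compiler.
#:set KINDS = ['real', 'integer']
module swap_mod
   implicit none
contains
#:for T in KINDS
   subroutine swap_${T}$(a, b)
      ${T}$, intent(inout) :: a, b
      ${T}$ :: tmp
      tmp = a
      a = b
      b = tmp
   end subroutine swap_${T}$
#:endfor
end module swap_mod
```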

1 Like