What goes in stdlib and what is separate?

With fftpack being moved into the fortran-lang GitHub it’s making me think about where the line should be drawn between what is put in stdlib and what is kept as a separate lib. I started this root-finding library with the intent of merging it into stdlib, but now I’m wondering if it should just be separate? Maybe very high-level and domain-specific things should just be separate (optimization, ode solvers, root finding solvers maybe). What do people think?

When FPM becomes ubiquitous, and it becomes trivially easy to specify dependencies and compile everything, having a bunch of smaller libs won’t be the headache it (sometimes) is now. I think it won’t matter where the code is coming from. Very exciting!!

5 Likes

Our goal with fpm is indeed to make it super easy to write a Fortran library and make it friction-less to use it. So indeed, all new functionality should start as a separate library to get usage and nail out the API, performance, etc.

My vision for stdlib: it should contain well designed API and reference implementation for SciPy / Matlab like functionality, plus low level utilities for Fortran such as strings and other stuff. So that stdlib + standard Fortran make Fortran with batteries included, no need to depend on 100+ smaller libraries. This assumes that we can all agree on an API. I believe we can and on things that we cannot, we just keep them separate. I suggest to be very conservative and only include an API that we agree on.
There is a lot of prior art how to design API for basic numerical functionalities: SciPy, Julia, Matlab, R and various libraries in C++ and C.

4 Likes

My “2 cents’ worth”:

  • Put in stdlib if that something should be seen as part of Fortran even if not included currently in the ISO/IEC standard per se. A derived type container for a string type is a good example in my mind for this.
  • Keep as separate library if it makes little sense to be seen as part of base Fortran itself or if there are likely to be multiple incarnations of that something regardless - an ODE solver seems like a good example for this.

For anything else, start as separate first and then depending on Community consensus, decide to merge with stdlib or not.

7 Likes

If you put @certik’s and @FortranFan’s answers together, I’m aligned with that.

I think that deciding what belongs in base Fortran and what doesn’t is difficult and varies a lot with whom you ask. If most (say, 80%) participants can agree on the API and implementation, that may be a good fit for stdlib. But if there can be many flavors, whether API, algorithm, or implementation-wise, indeed a separate library seems a better fit (@FortranFan’s second point).

And consider the positive social aspect of early seeding the fpm-registry with a variety of smaller but high quality libraries. Over time, if any of them should be in stdlib, people will ask for it.

For your root finding and optimization libraries, I think they’re very much in scope. So the question is more, can they be widely accepted or are they opinionated?

1 Like

Let’s look at the Brent’s method: https://github.com/jacobwilliams/opt-lib/blob/0ce2825f87d511bc33764e0aadd3f3061c9c8ca6/src/stdlib_root_core.F90#L1300

The function signature is:

    subroutine brentq(me,ax,bx,fax,fbx,xzero,fzero,iflag)
    class(brentq_solver),intent(inout) :: me
    real(wp),intent(in)  :: ax      !! left endpoint of initial interval
    real(wp),intent(in)  :: bx      !! right endpoint of initial interval
    real(wp),intent(in)  :: fax     !! `f(ax)`
    real(wp),intent(in)  :: fbx     !! `f(ax)`
    real(wp),intent(out) :: xzero   !! abscissa approximating a zero of `f` in the interval `ax`,`bx`
    real(wp),intent(out) :: fzero   !! value of `f` at the root (`f(xzero)`)
    integer,intent(out)  :: iflag   !! status flag (`0`=root found, `-2`=max iterations reached)

Which to me looks very good except the first me argument which is of type:

    type,abstract,public :: root_solver
    !! abstract class for the root solver methods
    private
    procedure(func),pointer :: f => null()  !! user function to find the root of
    real(wp) :: ftol = 0.0_wp       !! absolute tolerance for `f=0`
    real(wp) :: rtol = 1.0e-6_wp    !! relative tol for x
    real(wp) :: atol = 1.0e-12_wp   !! absolute tol for x [not used by all methods]
    integer  :: maxiter = 2000      !! maximum number of iterations [not used by all methods]
    contains
    private
    procedure,public :: initialize => initialize_root_solver !! initialize the class [must be called first]
    procedure,public :: solve !! main routine for finding the root
    procedure(root_f),deferred :: find_root !! root solver function. Meant to be
                                            !! called from [[solve]], which handles some common
                                            !! startup tasks. All these routines assume that
                                            !! \( f(a_x) \) and \( f(b_x) \) have opposite signs.
    procedure :: get_fa_fb
    procedure :: converged
    end type root_solver

    type,extends(root_solver),public :: brentq_solver
    !! brentq root solver
    private
    contains
    private
    procedure,public :: find_root => brentq
    end type brentq_solver

So I personally would not like this API, because one has to first setup the derived type before calling the function. However, we can do both. We can refactor the brent function as:

    subroutine brentq(f,ftol,rtol,atol,maxiter,ax,bx,fax,fbx,xzero,fzero,iflag)
    procedure(func),pointer :: f    !! user function to find the root of
    real(wp), intent(in) :: ftol    !! absolute tolerance for `f=0`
    real(wp), intent(in) :: rtol    !! relative tol for x
    real(wp), intent(in) :: atol    !! absolute tol for x [not used by all methods]
    integer, intent(in)  :: maxiter = 2000      !! maximum number of iterations [not used by all methods]
    real(wp),intent(in)  :: ax      !! left endpoint of initial interval
    real(wp),intent(in)  :: bx      !! right endpoint of initial interval
    real(wp),intent(in)  :: fax     !! `f(ax)`
    real(wp),intent(in)  :: fbx     !! `f(ax)`
    real(wp),intent(out) :: xzero   !! abscissa approximating a zero of `f` in the interval `ax`,`bx`
    real(wp),intent(out) :: fzero   !! value of `f` at the root (`f(xzero)`)
    integer,intent(out)  :: iflag   !! status flag (`0`=root found, `-2`=max iterations reached)

And then this is standalone and it will work. And then we can create a simple wrapper in the brentq_solver class for those who prefer the OO interface.

1 Like

OK so maybe what I did is somewhat opinionated. :slight_smile:

If you look more closely, you will see that brentq is not designed to be called directly. It’s only called though the higher-level class method solve (part of the root_solver class). opt-lib/stdlib_root_core.F90 at 0ce2825f87d511bc33764e0aadd3f3061c9c8ca6 · jacobwilliams/opt-lib · GitHub
(the intent being not to duplicate code that is common to all methods).

What you are looking for is already there: opt-lib/stdlib_root_core.F90 at 0ce2825f87d511bc33764e0aadd3f3061c9c8ca6 · jacobwilliams/opt-lib · GitHub

    subroutine root_scalar(method,fun,ax,bx,xzero,fzero,iflag,&
                           ftol,rtol,atol,maxiter,fax,fbx,bisect_on_failure)

    implicit none

    character(len=*),intent(in)   :: method   !! the method to use
    procedure(func2)              :: fun      !! user function to find the root of
    real(wp),intent(in)           :: ax       !! left endpoint of initial interval
    real(wp),intent(in)           :: bx       !! right endpoint of initial interval
    real(wp),intent(out)          :: xzero    !! abscissa approximating a zero of `f` in the interval `ax`,`bx`
    real(wp),intent(out)          :: fzero    !! value of `f` at the root (`f(xzero)`)
    integer,intent(out)           :: iflag    !! status flag (`-1`=error, `0`=root found, `-999`=invalid method)
    real(wp),intent(in),optional  :: ftol     !! absolute tolerance for `f=0`
    real(wp),intent(in),optional  :: rtol     !! relative tol for x
    real(wp),intent(in),optional  :: atol     !! absolute tol for x
    integer,intent(in),optional   :: maxiter  !! maximum number of iterations
    real(wp),intent(in),optional  :: fax      !! if `f(ax)` is already known, it can be input here
    real(wp),intent(in),optional  :: fbx      !! if `f(ax)` is already known, it can be input here
    logical,intent(in),optional   :: bisect_on_failure  !! if true, then if the specified method fails,
                                                        !! it will be retried using the bisection method.
                                                        !! (default is False). Note that this can use up
                                                        !! to `maxiter` additional function evaluations.

Which is called like so:

call root_scalar('brentq', func, ...)

So, this is the interface you want, I think. But notice that it is simply a wrapper to the object-oriented interface (and not vice versa). To me, the object-oriented interface is the real one, and this one is just a simple wrapper (that has some compromises under the hood since there is a class allocation that is happening, as well as a string comparison and case statement).

My intent is to add a speed test to the unit tests to see what the runtime overhead of this is and we can decide if it is OK or not. Nothing is set in stone right now.

I will also say that this library is a more comprehensive collection of 1d derivative-free bracketing root finding algorithms than I have ever seen anywhere else (including Julia’s Roots.jl, SciPy’s scalar_roots, and GSL’s gsl_root). It’s also way easier to follow the code than any of those libraries (in my humble opinion). This is the kind of thing we need to resurrect Fortran. Modern, clean code, that is the standard and state of the art, and second to no other implementation.

5 Likes

Yes, the root_scalar is exactly the interface I want.

Instead of string comparisons, you can also simply supply an integer and define constants like:

integer, parameter :: method_bisection = 1
integer, parameter :: method_brent = 2
...

So it would be called like:

call root_scalar(method_brent, func, ...)

How it is implemented underneath is secondary to me, so using OO is fine I think. If there is a performance overhead of the OO, then we can rework the internals not to use OO.

In terms of functionality, does the OO API allow anything that the root_scalar API does not?

1 Like

Good point, yep, I can use the integer approach.

Nope, both interfaces have the same capabilities.

1 Like

@certik, if LFortran can leapfrog and try to get enum’s in from Fortran 202X, situations like the one with root_scalar here will be ideal to illustrate the use case of enum’s in programming. The use of integer named constants is not a type-safe way to handle key decision variables when it comes to consuming any API.

I find the Fortran community to not appreciate enum’s as much, something that I would sincerely like to see change soon. This unfortunate reality made me struggle a lot, rather fail mostly, when it came to influencing the standard bearers with a better native enum facility in Fortran 202X.

2 Likes

When I was writing my comment using integers, I expected no less than a message from you @FortranFan. :slight_smile: Yes, it is indeed not type safe in a sense that if you pass in some other set of integers, it would happily compile. After we release MVP of LFortran, hopefully in a month or so, I would be happy to prototype enums.

2 Likes

I agree with @FortranFan.

What I do here and now with the current standard is a rather contrived but decent compromise along these lines:

  type, private :: cat1_cat2_etc
    integer :: cat1    ! = 1000
    integer :: cat2    ! = 1001
    integer :: cat3    ! = 1002
    integer :: cat4    ! = 1003
  end type cat1_cat2_etc

  type(cat1_cat2_etc), parameter, public :: select_categ = &
                                          cat1_cat2_etc(1000, 1001, 1002, 1003)

It still has “magic numbers” but it is somewhat safer.

Getting back on-topic, another (IMO) very useful addition to stdlib from the excellent arsenal of @jacobwilliams repositories, is a CSV handler (stdlib issue and Jacob’s repository) that is a low hanging fruit. Now, with the great work on lists of strings by @Aman it can reach its full potential. And the next step could be the addition of dataframes (example here).

If only I could clone myself… :frowning_face:

2 Likes

Hi.

I would strongly recommend that everything which is supposed to go into stdlib must undergo heavy performance benchmarking against competing implementation in other languages.

As an example, given the @epagone comments I tested the read_csv functionalities from @jacobwilliams. (@jacobwilliams … sorry for the outcome below).

All what I did is adding a call cpu_time around the read command in the “csv_read_test.f90” and compiled everything with gfortran -O3.

Reading a file which on disk has 2.8G on disk with 40Mio records and 20 columns, took 94 seconds and needed 40GB of RAM.

For comparison, R’s fread needed about 9 seconds and 7.8GB RAM.

The full file (80Mio records) was infeasible with FORTRAN but R read it like a breath.

If we sell stdlib with such functionalities it might be just another nail in FORTARN’s coffin, especially when trying to persuade newbies to use the language.

However, the above performance cliff might be just a function of the community size with not enough people reviewing/enhancing/optimizing code.

Cheers

3 Likes

@rcs Regarding the speed of csv reading – I was wondering if you could also report on the performance of read.table in R?

I understand that fread (from the data.table package in R) was written to improve performance over R’s base function read.csv, and that fread is also more limited.

It might be that the fortran read_csv is just making a different flexibility trade-off – or maybe not, but certainly something to consider.

1 Like

Hi.

Unfortunately(for FORTRAN) that is not the case. fread is a versatile as read.table … in addition, you don’t even have to provide header or separator information.

I have looked at the code several times wondering whether I can isolate it from the R environment and interface it to my stuff, or repeat it in FORTRAN. Replicating it would require os-calls via C-interface … again highlighting the limitation of F.

My home-brew csv-parser reads the above data-set in 34sec using 12GB of RAM … much better than read_csv, but still 5x slower than fread. It is not that easy to out-perform a whole group of developers who worked on that package for a decade.

I think stdlib will likely go wrong if it is an assembly of un-reviewed non-optimized code from the shelfs of the community members … then stdlib++ will always be far ahead making C++ the HPC language of the future.

just my 5 cents

BTW, the C code for fread is here: data.table/fread.c at master · Rdatatable/data.table · GitHub

1 Like

Exactly. stdlib it’s barely 1 year and a half old. It is well known that writing fast code is Fortran is relatively easy, but it’s unreasonable to expect to outperform specialised, mature libraries in such extremely short time with such disparity of resources. All routines in stdlib are marked as experimental, so also the user knows what to expect :wink:

I agree that performance benchmarking should be done and that it is important but I disagree that performance should be a show-stopper for the growth and development of stdlib at this point in time. We need to get the ball rolling. A well-designed, much-needed functionality like handling of CSV files, with a nice UI it’s exactly what we need now in stdlib (and Jacob’s implementation delivers exactly that). We can improve later: we’re in experimental mode. “Perfect is the enemy of good”. A practical example of a success story in the same vein is still @jacobwilliamsJSON-Fortran library. It offered since the beginning a very nice and modern UI (which IMO contributed strongly to its success) and then performance was improved when it matured a bit more (obviously, Jacob could better comment on this). I think that’s the way to go.

Speed is not everything (although it’s undeniably one of the current strong points of Fortran by far). However, to my dismay, I often see in the queue of the HPC facilities at my University jobs running in python, matlab or R (not exactly lightning fast languages). This proves IMO that there is an opportunity and we need to nurture a feature-rich (in its area of specialisation) and user-friendly environment for Fortran users because its simple, yet effective characteristics can be the best choice for science and engineering (not only HPC). Compare the (verbose) simplicity of Fortran with the complexity of C++. I applaud all the efforts of the Fortran-Lang community with fpm, stdlib, LFortran, etc… that all go in that direction.

I know that some people in the Fortran community might disagree with the above point to target Fortran also outside HPC applications but I believe that there is a demand for that and I even planned on presenting such thoughts at FortranCon but, unfortunately, I had to shelve the idea due to my current workload.

6 Likes

Could you post it somewhere, even if it is “raw”? My programs read large CSV files, and I’d like to do so more quickly.

1 Like

For the record, I never claimed my csv parser was the fastest thing in the world. Undoubtedly it could be improved.

1 Like