Summarizing data in stdlib

Beliavsky · March 9, 2021, 1:54pm

The Fortran standard library project has functions to compute statistics. Should subroutines to print summary statistics be added? I have a subroutine print_stats that I have used for many years. Here is a program that demonstrates its use.

    program xstats_table
    ! 03/09/2021 07:10 AM calls print_stats for random variates
    use kind_mod      , only: dp
    use statistics_mod, only: print_stats
    implicit none
    integer, parameter :: n = 1000, ncol = 3
    integer, parameter :: nacf = 2 ! # of autocorrelations to print
    real(kind=dp)      :: xx(n,ncol)
    integer            :: icol
    character (len=4), parameter :: cstats(4) = ["mean","sd  ","min ","max "]
    call random_number(xx)
    xx = xx - 0.5_dp
    ! scale the columns by 1, 10, 100 (plain do loop; forall is obsolescent)
    do icol = 1, ncol
       xx(:,icol) = xx(:,icol) * 10**(icol-1)
    end do
    call print_stats(cstats,xx) ! default usage
    call print_stats(cstats,xx,labels=["v1","v2","v3"],fmt_stat="(a10,1000f10.2)", &
                      fmt_stat_labels="(1000a10)",print_num_obs=.true.,nacf=nacf,fmt_header="(/,'random stats')", &
                      fmt_trailer="()",print_corr=.true.)
    call print_stats(cstats,xx,labels=["v1","v2","v3"],stats_by_rows=.true.,fmt_header="('transposed output')")
    call print_stats(cstats,xx,csv=.true.,fmt_header="()")
    call print_stats(cstats,xx,good=xx>0.0_dp,fmt_header="(/,'stats for xx > 0')")
    end program xstats_table

gives output

 var       mean         sd        min        max
   x1     0.0116     0.2916    -0.4968     0.4989
   x2    -0.0519     2.9453    -4.9985     4.9732
   x3     1.8740    28.2193   -49.9257    49.9813

random stats
#obs: 1000
var      mean        sd       min       max     ACF_1     ACF_2
v1      0.01      0.29     -0.50      0.50      0.02     -0.02
v2     -0.05      2.95     -5.00      4.97      0.03     -0.00
v3      1.87     28.22    -49.93     49.98     -0.02      0.03

CORRELATIONS
              v1          v2          v3
  v1       1.000      -0.032       0.016
  v2      -0.032       1.000       0.052
  v3       0.016       0.052       1.000

transposed output
                  v1         v2         v3
     mean     0.0116    -0.0519     1.8740
       sd     0.2916     2.9453    28.2193
      min    -0.4968    -4.9985   -49.9257
      max     0.4989     4.9732    49.9813

var,mean,sd,min,max
x1,.011569,.291650,-.496765,.498875
x2,-.051871,2.945266,-4.998517,4.973199
x3,1.873969,28.219333,-49.925673,49.981273

stats for xx > 0
         var       mean         sd        min        max
           x1     0.2684     0.1373     0.0003     0.4989
           x2     2.5457     1.3881     0.0178     4.9732
           x3    24.7026    14.3358     0.0030    49.9813

I can’t show the print_stats subroutine since it was written for an employer, but maybe something similar could be added to stdlib. It should work for at least 1D and 2D arrays. In the module that contains print_stats, functions have been defined to compute mean, sd, etc., and there is a SELECT CASE block that processes the string argument to determine what statistic to compute.
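The SELECT CASE design described above could look roughly like the following. This is only a minimal sketch, not the original print_stats and not anything that exists in stdlib: the names stats_sketch_mod, stat_value and print_stats_sketch are hypothetical, the n-1 denominator for "sd" is an assumption, and none of the optional arguments (labels, csv, good, nacf, ...) are handled.

```fortran
module stats_sketch_mod
   ! Hypothetical sketch: not the original print_stats, not part of stdlib.
   implicit none
   private
   public :: dp, stat_value, print_stats_sketch
   integer, parameter :: dp = kind(1.0d0)
contains

   pure function stat_value(name, x) result(val)
      ! Map a statistic name to its value for a 1D sample.
      character(len=*), intent(in) :: name
      real(dp),         intent(in) :: x(:)
      real(dp) :: val
      integer  :: n
      n = size(x)
      select case (trim(adjustl(name)))
      case ("mean")
         val = sum(x) / n
      case ("sd")   ! assumption: sample standard deviation (n-1 denominator)
         val = sqrt(sum((x - sum(x)/n)**2) / (n - 1))
      case ("min")
         val = minval(x)
      case ("max")
         val = maxval(x)
      case default
         val = -huge(val)   ! sentinel for an unrecognized statistic name
      end select
   end function stat_value

   subroutine print_stats_sketch(stats, xx)
      ! Print one row per column of xx and one column per requested statistic.
      character(len=*), intent(in) :: stats(:)
      real(dp),         intent(in) :: xx(:,:)
      integer :: icol, istat
      character(len=20) :: label
      write (*, "(a5,*(a11))") "var", (trim(stats(istat)), istat = 1, size(stats))
      do icol = 1, size(xx, 2)
         write (label, "('x',i0)") icol   ! default labels x1, x2, ...
         write (*, "(a5,*(f11.4))") trim(label), &
            (stat_value(stats(istat), xx(:,icol)), istat = 1, size(stats))
      end do
   end subroutine print_stats_sketch

end module stats_sketch_mod
```

With this module, call print_stats_sketch(cstats,xx) from the program above would print a table laid out like the default output. Supporting a new statistic means adding one case branch, and a 1D wrapper could reshape its argument to an (n,1) array and call the 2D routine.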

Has someone written something similar they can release?

Arjen · March 11, 2021, 8:44am

I like the idea of such a routine, but I have never tried it. Writing one is not all that hard once the statistical routines themselves are available, but your example has quite a few options. Specifying these will be the main issue, I think.
