Type definition for collecting data

MichaelRedenti · May 26, 2022, 6:31pm

Hello everyone,

I have defined the following type to store the parameters of a single multidimensional Gaussian

type, public :: gaussian_param
    ! mean
    real, dimension(dim) :: x
    ! covariance matrix
    real, dimension(dim, dim) :: C
end type gaussian_param

How best to extend this for the case of a set of Gaussians with different mean and covariance matrices? One approach would be to declare an array of the new type(gaussian_param).
But I would like to instead keep the means and covariances in the same type, along the lines of

type, public :: gaussian_param
    ! mean
    real, dimension(dim, :) :: x
    ! covariance matrix
    real, dimension(dim, dim, :) :: C
end type gaussian_param

where the number of Gaussians is known at runtime

Beliavsky · May 26, 2022, 6:44pm

You could use a parameterized derived type, for example

module m
implicit none
type, public :: gaussian_param(dim,ngauss)
    integer, len :: dim    ! # of dimensions
    integer, len :: ngauss ! # of Gaussians
    ! mean
    real, dimension(dim, ngauss) :: x
    ! covariance matrix
    real, dimension(dim, dim, ngauss) :: C
end type gaussian_param
end module m

MichaelRedenti · May 26, 2022, 7:04pm

@Beliavsky Can ngauss be given at runtime?
What does it do under the hood? Will it allocate memory on the heap at runtime?

In other words, would it be equivalent to

module m
implicit none
type, public :: gaussian_param
    ! mean
    real, dimension(:,:) :: x
    ! covariance matrix
    real, dimension(:,:,:) :: C
end type gaussian_param
end module m

and then allocate memory at runtime?

awvwgk · May 26, 2022, 7:12pm

You can declare the parametrized derived type as allocatable, however I think you can only have all dynamic or all static dimensions for a PDT:

module m
implicit none
type, public :: gaussian_param(dim,ngauss)
    integer, len :: dim    ! # of dimensions
    integer, len :: ngauss ! # of Gaussians
    ! mean
    real, dimension(dim, ngauss) :: x
    ! covariance matrix
    real, dimension(dim, dim, ngauss) :: C
end type gaussian_param
end module m

program app
  use m
  type(gaussian_param(:, :)), allocatable :: param

  allocate(gaussian_param(10, 2) :: param)

  call random_number(param%x)

  print *, param%x
end program

ivanpribec · May 26, 2022, 7:19pm

Since covariance matrices are symmetric, if you want to be frugal, you could also keep only the upper (or lower) part of the matrix. It does however make indexing a bit more complex, and might not be compatible with the other parts of your code.

If the number of gaussian is only available at runtime, you will need to use allocatable arrays; so yes, it will be on the heap.

The parameterized derived type (PDT) @Beliavsky showed, could be used like this:

type(gaussian_param(:,:)), allocatable :: gaussians
integer :: d, n

read(*,*) d, n
allocate(gaussian_param(d,n) :: gaussians)

Depending on the compilers you plan to use, you may face problems trying to do something more complex with the PDT. But as long as it’s just a vanilla PDT container, I think it should work fine.

The alternative is to ditch the PDT, and just use allocatable arrays internally:

type :: gaussian_collection
  integer :: d, n
  real, allocatable :: x(:,:)    ! shape [d, n]
  real, allocatable :: c(:,:,:)  ! shape [d, d, n]
end type

In this case you’ll probably want to define a helper subroutine to allocate the collection:

subroutine allocate_gaussian_collection(d,n,collection)
  integer, intent(in) :: d, n
  type(gaussian_collection), intent(out) :: collection
  collection%d = d
  collection%n = n
  allocate(collection%x(d,n), collection%c(d,d,n))
  ! 
  ! ... any other initialization logic
  !
end subroutine

Beliavsky · May 26, 2022, 8:21pm

ivanpribec:

The alternative is to ditch the PDT, and just use allocatable arrays internally:
type :: gaussian_collection
  integer :: d, n
  real, allocatable :: x(:,:)    ! shape [d, n]
  real, allocatable :: c(:,:,:)  ! shape [d, d, n]
end type

I would not store the dimensions of the allocatable components of a derived type as ordinary components, because they may not be synchronized with the rest of the type. For example, what should d be if size(c,1) /= size(c,2)? Instead, I would write functions to return the dimensions and return negative numbers to signal errors, for example

module m
implicit none
type, public :: gaussian_param
    ! mean
    real, dimension(:,:), allocatable :: x ! (d,n) mean
    real, dimension(:,:,:), allocatable :: C ! (d,d,n) covariance matrix
end type gaussian_param
contains
elemental function ndim(g) result(d)
type(gaussian_param), intent(in) :: g
integer                          :: d
if (.not. allocated(x)) then
   d = -1
else if (.not. allocated(C)) then
   d = -2
else if (size(C,1) /= size(C,2)) then
   d = -3
else if (size(C,1) /= size(x,1)) then
   d = -4
else
   d = size(x,1)
end if
end function ndim
end module m

ivanpribec · May 26, 2022, 10:30pm

I agree that’s a risk, hence the subroutine for allocation. Any manual fiddling with the values of d or n or the allocation of the arrays should be seen as wrongdoing. I look at it as a variation of the “We are all responsible users” guideline in Python. I should have named the dimension attributes d_ and n_ to highlight this.

In some usage cases, a getter function to access a “private” property is totally fine. The thing I dislike about it in this particular case is you need to consider all the possible edge cases: do I infer the dimensions from x or c? Hmm, I should add a check if they are synchronized; oops, forgot to check if they are allocated first… Suddenly you end up with a complicated getter function with a bunch of branches just to return an array dimension. What next, should I add an extra if branch every time I call ndim() to check it’s positive? Am I sure I implemented the logic correctly and covered all the possible edge cases? No problem, we can write some unit tests; we need one for each branch. By this point, we’ve written probably >50 LOC (remember, there might be bugs), just to retrieve a dimension.

What happens if we add more arrays? What if they have higher rank? The combinatorial growth makes this impractical for derived types with many invariants. In practice, if any of the dimensions were out of sync you’d learn about it sooner than later with a segfault.

The best thing you can do is to perform allocation and initialization in a dedicated “constructor” subroutine. If the type is out of sync at some point, i.e. the derived type invariants have been broken, it can only mean some wrongdoing has occurred.

Observe, that none of these issues exist with the PDT version you suggested initially . The parameters represent the invariants. The built-in allocate command is used to establish them. Nice and clean, with minimum effort.

We also don’t need a getter function for the number of dimensions:

type :: foo(n)
   integer, len :: n
   real :: a(n)
end type

type(foo(:)), allocatable :: bar
allocate(foo(5) :: bar)

print *, bar%n   ! Allowed

bar%n = 6        ! Error: The assignment to a KIND or LEN component of a 
                 !        parameterized type is not allowed
end

Parameterized derived types are one honking great idea – let’s do more of those!

rwmsu · May 26, 2022, 11:34pm

Unfortunately, I consider the current implementations of PDTs in most (maybe all) recent compilers to be to fragile to “bet the farm” on basing my code on them. This seems to be one of those things that appear to be a great idea that the compiler developers can’t get right.

Topic		Replies	Views
Working with parameterised derived type containing array of parameterised derived type	35	1615	May 29, 2023
Interesting output from gfortran when using parameterized derived types	10	1960	April 1, 2021
Newbie question on parameterized-declaration-list Help	1	351	January 8, 2023
A gfortran issue with parameterized derived types Help	20	4103	July 12, 2024
Parameterized derived types Help	4	485	April 19, 2022

Type definition for collecting data

Related topics