Type definition for collecting data

Hello everyone,

I have defined the following type to store the parameters of a single multidimensional Gaussian

type, public :: gaussian_param
    ! mean
    real, dimension(dim) :: x
    ! covariance matrix
    real, dimension(dim, dim) :: C
end type gaussian_param

How best to extend this for the case of a set of Gaussians with different mean and covariance matrices? One approach would be to declare an array of the new type(gaussian_param).
But I would like to instead keep the means and covariances in the same type, along the lines of

type, public :: gaussian_param
    ! mean
    real, dimension(dim, :) :: x
    ! covariance matrix
    real, dimension(dim, dim, :) :: C
end type gaussian_param

where the number of Gaussians is known at runtime

1 Like

You could use a parameterized derived type, for example

module m
implicit none
type, public :: gaussian_param(dim,ngauss)
    integer, len :: dim    ! # of dimensions
    integer, len :: ngauss ! # of Gaussians
    ! mean
    real, dimension(dim, ngauss) :: x
    ! covariance matrix
    real, dimension(dim, dim, ngauss) :: C
end type gaussian_param
end module m
2 Likes

@Beliavsky Can ngauss be given at runtime?
What does it do under the hood? Will it allocate memory on the heap at runtime?

In other words, would it be equivalent to

module m
implicit none
type, public :: gaussian_param
    ! mean
    real, dimension(:,:) :: x
    ! covariance matrix
    real, dimension(:,:,:) :: C
end type gaussian_param
end module m

and then allocate memory at runtime?

1 Like

You can declare the parametrized derived type as allocatable, however I think you can only have all dynamic or all static dimensions for a PDT:

module m
implicit none
type, public :: gaussian_param(dim,ngauss)
    integer, len :: dim    ! # of dimensions
    integer, len :: ngauss ! # of Gaussians
    ! mean
    real, dimension(dim, ngauss) :: x
    ! covariance matrix
    real, dimension(dim, dim, ngauss) :: C
end type gaussian_param
end module m

program app
  use m
  type(gaussian_param(:, :)), allocatable :: param

  allocate(gaussian_param(10, 2) :: param)

  call random_number(param%x)

  print *, param%x
end program
2 Likes

Since covariance matrices are symmetric, if you want to be frugal, you could also keep only the upper (or lower) part of the matrix. It does however make indexing a bit more complex, and might not be compatible with the other parts of your code.

If the number of gaussian is only available at runtime, you will need to use allocatable arrays; so yes, it will be on the heap.

The parameterized derived type (PDT) @Beliavsky showed, could be used like this:

type(gaussian_param(:,:)), allocatable :: gaussians
integer :: d, n

read(*,*) d, n
allocate(gaussian_param(d,n) :: gaussians)

Depending on the compilers you plan to use, you may face problems trying to do something more complex with the PDT. But as long as it’s just a vanilla PDT container, I think it should work fine.

The alternative is to ditch the PDT, and just use allocatable arrays internally:

type :: gaussian_collection
  integer :: d, n
  real, allocatable :: x(:,:)    ! shape [d, n]
  real, allocatable :: c(:,:,:)  ! shape [d, d, n]
end type

In this case you’ll probably want to define a helper subroutine to allocate the collection:

subroutine allocate_gaussian_collection(d,n,collection)
  integer, intent(in) :: d, n
  type(gaussian_collection), intent(out) :: collection
  collection%d = d
  collection%n = n
  allocate(collection%x(d,n), collection%c(d,d,n))
  ! 
  ! ... any other initialization logic
  !
end subroutine
2 Likes

I would not store the dimensions of the allocatable components of a derived type as ordinary components, because they may not be synchronized with the rest of the type. For example, what should d be if size(c,1) /= size(c,2)? Instead, I would write functions to return the dimensions and return negative numbers to signal errors, for example

module m
implicit none
type, public :: gaussian_param
    ! mean
    real, dimension(:,:), allocatable :: x ! (d,n) mean
    real, dimension(:,:,:), allocatable :: C ! (d,d,n) covariance matrix
end type gaussian_param
contains
elemental function ndim(g) result(d)
type(gaussian_param), intent(in) :: g
integer                          :: d
if (.not. allocated(x)) then
   d = -1
else if (.not. allocated(C)) then
   d = -2
else if (size(C,1) /= size(C,2)) then
   d = -3
else if (size(C,1) /= size(x,1)) then
   d = -4
else
   d = size(x,1)
end if
end function ndim
end module m
1 Like

I agree that’s a risk, hence the subroutine for allocation. Any manual fiddling with the values of d or n or the allocation of the arrays should be seen as wrongdoing. I look at it as a variation of the “We are all responsible users” guideline in Python. I should have named the dimension attributes d_ and n_ to highlight this.

In some usage cases, a getter function to access a “private” property is totally fine. The thing I dislike about it in this particular case is you need to consider all the possible edge cases: do I infer the dimensions from x or c? Hmm, I should add a check if they are synchronized; oops, forgot to check if they are allocated first… Suddenly you end up with a complicated getter function with a bunch of branches just to return an array dimension. What next, should I add an extra if branch every time I call ndim() to check it’s positive? Am I sure I implemented the logic correctly and covered all the possible edge cases? No problem, we can write some unit tests; we need one for each branch. By this point, we’ve written probably >50 LOC (remember, there might be bugs), just to retrieve a dimension.

What happens if we add more arrays? What if they have higher rank? The combinatorial growth makes this impractical for derived types with many invariants. In practice, if any of the dimensions were out of sync you’d learn about it sooner than later with a segfault.

The best thing you can do is to perform allocation and initialization in a dedicated “constructor” subroutine. If the type is out of sync at some point, i.e. the derived type invariants have been broken, it can only mean some wrongdoing has occurred.

Observe, that none of these issues exist with the PDT version you suggested initially :slight_smile:. The parameters represent the invariants. The built-in allocate command is used to establish them. Nice and clean, with minimum effort.

We also don’t need a getter function for the number of dimensions:

type :: foo(n)
   integer, len :: n
   real :: a(n)
end type

type(foo(:)), allocatable :: bar
allocate(foo(5) :: bar)

print *, bar%n   ! Allowed

bar%n = 6        ! Error: The assignment to a KIND or LEN component of a 
                 !        parameterized type is not allowed
end

Parameterized derived types are one honking great idea – let’s do more of those!

Unfortunately, I consider the current implementations of PDTs in most (maybe all) recent compilers to be to fragile to “bet the farm” on basing my code on them. This seems to be one of those things that appear to be a great idea that the compiler developers can’t get right.