Array broadcasting in expressions

I have placed a similar issue in fortran proposal (Array broadcasting in expressions · Issue #252 · j3-fortran/fortran_proposals · GitHub) before reading that it is better to place it here in advance to have more comments.

Array broadcasting is very common in Python with numpy and in Matlab.

Broadcasting means that in an array expression if an array has a unitary dimension along an axis that dimension is automatically expanded to match the size of the other array. For example if an array has size [3,1] and another has size [1,4] both array are expanded as they have size [3,4]. In the actual Fortran one has to use spread to achieve the same result.
Another nice feature of numpy is the possibility to add unitary dimensions on the fly with None . Python example:

a = np.ones(2)
b= np.ones(3)
a[:,None] * b[None,:]

Moreover some functions that reduce the dimensionality like sum can have an extra arguments keepdims that doesn’t reduce the dimensionality but make that dimension unitary. This is useful as one can write (in Python) somenthing like:
a/np.mean(a, axis=0,keepdims=True)
So I suggest the following feature:
Use the symbol “+” (or whatever other symbol) to add a dimension to a section of an array. The function sum and other that reduce the dimensionality should have an extra keyword argument like keepdims that doesn’t eliminate that dimension.
As an example:

integer :: a(3), b(4), c(3,4), d(4), i
a = [(i=1,3)]
b = [(i=1,4)]
c = a(:,+) + b(+,:)
d = c/sum(c, dim=1, keepdim=.true.)

Contrary what happen to Matlab and Python adding the extra dimension is obligatory (no implicit unary dimensions).
The point is that you may not know, in general, if an expression will involve broadcasting at compile time (except for those simple examples), so at run time the processor has to check the extent of all the arrays involved in the array expression, and that may affect the speed. On the other hand it may be safer to require the processor to report if in array expressions the dimensions are not compatible. It may make the program a little slower but safer (with an explicit loop the burden should be all on the shoulder of the programmer).

Or one can decide to make things more explicit adding an attribute to arrays, like broadcastable which indicates that any unitary dimension of that array can be expanded.
A section with an added dimension with “+” like in the previous example will be broadcastable , and the return of a function like the sum with keepdim will be broadcastable (or one uses a different function like sum_keepdim, which returns a broadcastable array, while the old one return a normal array).
All other array expressions can remain as before, without the overhead to check the extent of all the involved arrays.

integer, broadcastable :: a(3,1), b(1,4)
integer :: c(3,4) 
integer :: dd(3,1), ee(1,4)

c = a + b   !  broadcasting

!  c =  dd + ee  ! illegal in current and future Fortran

check out Fortran intrinsic spread,

http://www.lahey.com/docs/lfpro78help/F95ARSPREADFn.htm

https://www.cenapad.unicamp.br/parque/manuais/Xlf/lr366.HTM

1 Like

broadcasting in NumPy does not necessarily increase the memory footprint.

np.eye(3) + np.random(5,3,3) adds random fluctuations to the 3x3 identity matrix where copies of np.eye(3) are avoided.
The equivalent statement in Fortran would be
a = spread(eye(3),3,5) + fluct for fluct of shape (3,3,5) which would most likely mean that copies are created.
Of course, a loop over the 5 entries would avoid copies, but IMHO at the cost of readability.

For me, array broadcasting semantics is one of the hardest things to get right when using NumPy; it is a constant source of errors in my own codes. In little scripts where performance is not critical, I always opt for the explicit loop. As an exception, I do like NumPy’s a[:,None] or a[:,newaxis] for simple broadcasting expressions involving only 1- or 2-d arrays.

In Fortran, I find it freeing to just write the loop without sweating the performance. More often than not, the loop body closely resembles the mathematical expression I’m implementing anyway. I also choose not to use merge, spread, pack, or unpack for similar reasons. I tend to find loops to be easier to reason about than expressions involving these array functions.

Rather than introduce a new concept to the language, I wonder if there are a handful of common broadcasting expressions which, implemented as utility functions, would cover most needs. For instance, the proposed Kronecker sum expression c = a(:,+) + b(+,:) could be c = kron_sum(a, b).

5 Likes

I know the function spread, it’s like the repmat function in Matlab.
They resisted to introduce array broadcasting, even if one of its clone, Octave has already introduced it, but at the end they ceded and introduced array broadcasting.

Array Broadcasting is present in Matlab, Python, and Julia.
You can add additional dimension in Python with “None” or “numpy.neaxis”, and in Julia with “[CartesianIdex()]”.

And no, it doesn’t increase the memory footprint as the arrays are not expanded (you don’t need). For example the numpy function broadcast_arrays create broadcasted views of the original arrays.

>>> a = np.array([[1],[2],[3],[4]])
>>> a.shape
  (4, 1)
>>> b = np.array([5,6,7])
>>> b.shape
 (3,)
>>> aa, bb = np.broadcast_arrays(a,b)
>>> cc = aa + bb

The shapes of the arrays is the same for all of them

>>> aa.shape
 (4, 3)
>>> bb.shape
 (4, 3)
>>> cc.shape
 (4, 3)

But the strides are different as cc is a regular matrix, while aa and bb are just views (note the zero in the strides)

>>> cc.strides
 (24, 8)
>>> aa.strides
 (8, 0)
>>> bb.strides
 (0, 8)

I don’t think it would be a good idea to have broadcasted views directly accessible in Fortran as they are not definable, but I suppose that a compiler could use them inside an expression if no other way to speed up the calculation is present. The function spread in theory will create a copy of the original array even though I suppose that compilers will try to convert it in something like a broadcasted view or use some other more intelligent trick.

1 Like

one option is you don’t like broadcasting is Einstein notation. it’s slightly more expensive, and possible to generate fast code for (for example, see tullio+LoopVectorization)

2 Likes

Some links to Einstein notation for arrays:

1 Like

@nshaffer as a physicist, I always found the Einstein explicit tensor notation much easier to understand what is going on and what is contracted with what than other notations such as differential forms. But I know other people who find differential forms much nicer. In the same way, I always prefer explicit indexing into multidimensional arrays and do loops and just like you, avoid merge, spread, pack. I avoid the broadcasting in NumPy also. And I think there are probably others who find the “index free” array notation nicer.

1 Like

I think merge and pack are very useful. It’s easier to read and write

y = merge(0.0,1.0,condition)

than

if (condition) then
   y = 0.0
else
   y = 1.0
end if

With pack you can get results similar to those of data base queries. Here is an example.

module m
implicit none
type :: date_t
   integer :: year, month, day
end type date_t
end module m
!
program main
use m
implicit none
integer, parameter :: n = 1000
real :: x(n)
type(date_t) :: dates(n)
! read dates(n) and x(n)
print*,pack(x,dates%year==2021)  ! get 2021 values
print*,pack(x,dates%month==2)    ! get February values
print*,pack(x,dates%day==1)      ! get values for first day of month
print*,pack(x,dates%year==2021 .and. dates%month==3) ! get values for March 2021
end program main
2 Likes

I am glad you gave this example. For me it is 100% easier to read and write the if statement. But as I said above, I know opinions on this differ.

6 Likes

But if y were an array of several million members, it would be quite a waste of resources in case condition is false

probably compilers are smart enough to detect this and avoid two subsequent assignments.

As already discussed, in Fortran it is only a matter of taste. I prefer the short, math-like notation and would use broadcasting.
In Python/NumPy one has no choice: Loops are terribly slow, for reasonable performance one needs to use broadcasting and np.einsum.

We have an issue open for einsum in stdlib: Einstein summation notation · Issue #376 · fortran-lang/stdlib · GitHub

Personally, I could never really wrap my head around broadcasting in Python. The use of None or np.newaxis has always felt a bit odd to me. I’ve used the implicit expansion behavior in MATLAB on a few occasions. It does lead to very succinct code and makes some operations very easy. However, I’m always worried if I’ll be able to recognize it months or years later when I get back to the code. Kind of like implicit allocation upon assignment in Fortran.

I’ve found the following two blog posts quite enlightening:

1 Like

There is an old, not so used programming language called yorick, which allowed a sort of index summation: Yorick: Matrix Multiply

And, of course, broadcastibility and insertion of unitary dimensions.

Just for fun I wrote a small C routine that uses ISO_Fortran_binding that sums array broadcasting their unitary dimension, not very efficient, but it didn’t allocate any additional memory.
It uses internally descriptors with zero stride views.

#include <stdio.h>
#include <string.h>
#include "ISO_Fortran_binding.h"

int linear_to_c_index(CFI_dim_t dim[], int rank, CFI_index_t index[], size_t i_seq) {
    for (int i = 0; i < rank; i++) {
        index[i] = i_seq % dim[i].extent + dim[i].lower_bound;
        i_seq /= dim[i].extent;
    }
    return i_seq;
}

size_t fortran_size(CFI_cdesc_t *v) {
    size_t n_element = 1;
    for (int i = 0; i < v->rank; i++)   n_element *= v->dim[i].extent;
    return n_element;
}

void broadcast_dim(CFI_cdesc_t *v, CFI_cdesc_t *out, CFI_dim_t dim[]) {
    for (int i = 0; i < out->rank; i++) {
        if (v->dim[i].extent == out->dim[i].extent) {
            dim[i].extent = v->dim[i].extent;
            dim[i].sm = v->dim[i].sm;
            dim[i].lower_bound = v->dim[i].lower_bound;

            continue;
        }

        // no error check !!!
        dim[i].extent = out->dim[i].extent;
        dim[i].sm = 0;
        dim[i].lower_bound = 0; // not really used
    }
}

void *address(void *base, int rank, CFI_dim_t dim[], CFI_index_t subscripts[])
{
    char *elem_address = (char *)base;
    for (int i = 0; i < rank; i++)
        elem_address = elem_address + dim[i].sm * subscripts[i];

    return (void *)elem_address;
}

void broadcast_add(CFI_cdesc_t *left, CFI_cdesc_t *right, CFI_cdesc_t *out) {
    // no checks at all!!!

    int rank = out->rank;

    CFI_dim_t dim_left[CFI_MAX_RANK] = {0};
    CFI_dim_t dim_right[CFI_MAX_RANK] = {0};

    // broadcast array to zero stride views
    broadcast_dim(left, out, dim_left);
    broadcast_dim(right, out, dim_right);
    size_t n_element = fortran_size(out);

    CFI_index_t ind_out[CFI_MAX_RANK] = {0};

    for (size_t i = 0; i < n_element; i++)   {
        linear_to_c_index(out->dim, rank, ind_out, i);
        void *ptr_out = CFI_address(out, ind_out);
        void *ptr_left = address(left->base_addr, rank, dim_left, ind_out);
        void *ptr_right = address(right->base_addr, rank, dim_right, ind_out);

        *(int *)ptr_out = *(int *)ptr_left + *(int *)ptr_right;
    }
}

with a Fortran interface like

interface
   subroutine broadcast_add(left, right, out) bind(C)
        use iso_c_binding
        implicit none
        integer(c_int), intent(in) :: left(..), right(..)
        integer(c_int), intent(out) :: out(..)
    end subroutine
end interface
1 Like

There are some points:

  • Would it be worth to have this feature in Fortran, as it wouldn’t break any current rule and it will complete the array expression facility implementing something that is already implemented in other languages? Of course it is always demanded to each programmer to use the clearer syntax in any specific problem (even though what is clear to one may not be so clear to somebody else).

  • Is it enough easy to be implemented in the compilers?

  • Could there be other more complete features that would allow this as a by-product?

Well, I’m saying that it wouldn’t make a conforming program into a non conforming one.

Every language enhancement may make a non conforming program into a conforming one, I would say, by definition.

The problem is the first one, for example, when one deletes a feature. Or if an added feature prevents a more useful feature later on.

I am wary of changing the language as suggested because of the increase in complexity and because some Fortranners, including me, would use the feature unintentionally when we would prefer to get a compiler error message about nonconforming arrays. Maybe a .plus. operator can be defined by the user so that one can write

z = x .plus. y

where x, y, and z must have the same rank and where a dimension of x that is 1 is broadcast to that of y, and a dimension of y that is 1 is broadcast to that of x. Then I think the Kronecker sum could be written

z = reshape(x,[size(x),1]) .plus. reshape(y,[1,size(y)])

1 Like

When a scalar x is added to a 1D array y, the result is the same as if size(y) copies of x are added to y. R generalizes this by allowing two vectors of unequal length to be added by recycling the shorter vector, for example

1:4 + 1:2
[1] 2 4 4 6

A user could define an operator .addrecycle. to get such behavior in Fortran. Calling it just .add. would create confusion.

1 Like

I’ll try to write some possible definition that could be included into the standard. First I will define a new kind of conformability. That means that the old way to define conformability shouldn’t really change. Morevere while the standard conformability is almost a transitivity property. This other one wouldn’t be a transitive property.

"Conformability by broadcasting. A group of array are conformable by broadcasting if they are of the same rank (or scalar) and there exists a broadcasted shape such that if the size of a dimension of all the arrays is equal then the broadcasted size will be that value. if that is not the case, all the sizes greater then one should be equal and that size would be the broadcasted size (of a given dimension). A full shape array is an array in that group which has a shape equal to the broadcasted shape.
Every actual arguments to an elemental procedure should be conformable by broadcasting. Every actual argument associated to a dummy argument of an elemental procedure with intent out, inout or undefined should be a full shape array. The result of an elemental function has the shape of the broadcasted shape of the actual arguments of the function.
‘The same will happen for intrinsic operators.’ "

If you are concerned about possible errors a broadcastable attribute can be added to the array definition, adding the following to the previous definition: “every array without the broadcastable attribute should be a full shape array”. And: “the actual result of an elemental function or intrinsic operator is broadcastable if at least one of its actual arguments is broadcastable”.

Then for sections one can add that only sections with the “+” (as in one of my previous post) is broadcastable.

So a programmer should known what is he doing as without the broadcastable attribute everything will be as before.

The standard definition of broadcastability should apply in all other places like in intrinsic assignment, etc.
I don’t like much the repeatability of elements of arrays like in R language.

Could you also write this up as an issue here. This is would make it more likely to be actually considered by the committee.