Mathematics of arrays

Just finished reading their preprint on arXiv (referenced in their GitHub README); it looks like it implements some of the same ideas, but in a different way.

Reminder: the third lecture in the series starts soon!

Edit: Here are the slides and the recording link:

Passcode: !HjJz86e

Lecture today: all welcome.
Reminder:
Lenore Mullin is inviting you to a scheduled Zoom meeting.

Topic: MoA Lecture 3
Time: Oct 1, 2021 02:30 PM Eastern Time (US and Canada)
Every 2 weeks on Fri, until Nov 12, 2021, 4 occurrence(s)
Oct 1, 2021 02:30 PM
Oct 15, 2021 02:30 PM
Oct 29, 2021 02:30 PM
Nov 12, 2021 02:30 PM
Please download and import the following iCalendar (.ics) files to your calendar system.
Weekly: https://albany.zoom.us/meeting/tJwkfumuqT8sGdVCPn9GavN8j88zWjS399rF/ics?icsToken=98tyKuCgpzIqHNORthqGRow-HYr4c-7wtildjadrvy_rWgdSdC2uPLoaKIV1I4uJ

Join Zoom Meeting

Meeting ID: 989 9067 4631
Passcode: 122039
One tap mobile
+13126266799,98990674631# US (Chicago)
+16465588656,98990674631# US (New York)

Dial by your location
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
+1 301 715 8592 US (Washington DC)
+1 346 248 7799 US (Houston)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
Meeting ID: 989 9067 4631
Find your local number: Zoom International Dial-in Numbers - Zoom

Join by SIP
98990674631@zoomcrc.com

Join by H.323
162.255.37.11 (US West)
162.255.36.11 (US East)
221.122.88.195 (China)
115.114.131.7 (India Mumbai)
115.114.115.7 (India Hyderabad)
213.19.144.110 (Amsterdam Netherlands)
213.244.140.110 (Germany)
103.122.166.55 (Australia Sydney)
103.122.167.55 (Australia Melbourne)
209.9.211.110 (Hong Kong SAR)
149.137.40.110 (Singapore)
64.211.144.160 (Brazil)
149.137.68.253 (Mexico)
69.174.57.160 (Canada Toronto)
65.39.152.160 (Canada Vancouver)
207.226.132.110 (Japan Tokyo)
149.137.24.110 (Japan Osaka)
Meeting ID: 989 9067 4631
Passcode: 122039

Sorry I couldn’t make it to the lecture yesterday.
The slides can be accessed here:

And here’s a link for the recording:

Passcode: m@n4=2AL

Which file are you having trouble with? I’ll grant you access, or create one that doesn’t need it.

I have been studying what I can define in Fortran and conclude that I can define any array operation. In one of the lectures I showed that, as long as there is a shape, I can associate indices to define the operation; the “stride” function was defined there and is not in my dissertation.
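A minimal sketch of that index bookkeeping (not the code from the lectures or the dissertation): row-major strides can be derived from the shape alone, and a multi-index then maps to a flat offset as a dot product with those strides.

! Minimal sketch (not the lecture code): derive row-major strides from a
! shape and map a multi-index to a flat, 0-based offset.
program stride_demo
  implicit none
  integer :: shape3(3), strides(3), idx(3)
  shape3 = [2, 3, 4]
  strides = row_major_strides(shape3)
  idx = [1, 2, 3]                          ! 0-based multi-index
  print *, 'strides =', strides            ! 12 4 1
  print *, 'offset  =', sum(idx*strides)   ! 1*12 + 2*4 + 3*1 = 23
contains
  pure function row_major_strides(shp) result(s)
    integer, intent(in) :: shp(:)
    integer :: s(size(shp))
    integer :: k
    s(size(shp)) = 1
    do k = size(shp) - 1, 1, -1
      s(k) = s(k + 1)*shp(k + 1)
    end do
  end function row_major_strides
end program stride_demo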
A number of us are working to define MoA in Fortran using pointers and so far have catenation done. As we add more functionality, we will write a white paper showing how to extend this to all of Fortran that is relevant. It would be ideal if some folks would work with us to build Fortran programs akin to what we can optimize, to indicate the need for this research. Re new standard enhancements, we are fortunate to be working with @gak and @certik, so they will help us identify such things. Thus, I’d love to have anyone who is interested in this endeavor work with us to bring these ideas, research, and research results to fruition.

I pose a challenge to all of you. I’d like someone, or some people, to write a Fortran program that does the catenation of n vectors. Do it any way you wish, but don’t use MoA concepts or techniques. Do your best job and do some timings for small and large vectors.
Please communicate with me as necessary.

@Lenore,

Can you please point again to some place where it is explained/defined what you mean by catenation? Also, by vectors one presumes rank-1 arrays; is that the case?

Those who might have read some posts here by @wyphan and the follow-up by @Arjen et al. will have an idea of what is meant by catenation, but it is always possible the understanding is inadequate (or even incorrect). Thus it will be useful to know for sure.

Thanks,

Suppose we have a number of vectors, i.e., one-dimensional arrays; the shape has one component, and anything can be inside; say integers to start.
So if we had 3 (could be n) vectors:
Av= < 8 23 47>
Bv= < 2 4 9 32>
Cv= < 1 90>
We want Av cat Bv cat Cv, or cat(Av, Bv, Cv), whatever syntax you want, to produce a vector of length 9:
< 8 23 47 2 4 9 32 1 90>

This can be done “on-the-fly” with
[Av, Bv, Cv]
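For reference, a minimal self-contained version using the example vectors above (any recent Fortran compiler should accept it):

program cat_demo
  implicit none
  integer :: Av(3) = [8, 23, 47]
  integer :: Bv(4) = [2, 4, 9, 32]
  integer :: Cv(2) = [1, 90]
  integer, allocatable :: X(:)
  X = [Av, Bv, Cv]      ! catenation via the array constructor
  print *, size(X)      ! 9
  print *, X            ! 8 23 47 2 4 9 32 1 90
end program cat_demo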

Good, can you time that for various sizes and numbers of catenations?
Maybe start with two catenations of 2 components, then 3, then up to some n; then vary each length, then plot. We’ll then compare the MoA approach with the same thing.
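One possible shape for such a timing run is sketched below; the sizes, the repetition count, and the checksum used to keep the compiler from discarding the copies are all arbitrary choices, not part of the challenge.

program time_cat
  ! Sketch of a timing harness: catenate two vectors of length n with the
  ! array constructor, for a range of n, and report seconds per catenation.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: rep = 100
  integer :: n, r
  real(dp), allocatable :: a(:), b(:), x(:)
  real(dp) :: chk
  integer(8) :: t0, t1, rate
  n = 2
  do while (n <= 10**7)
    allocate (a(n), b(n))
    call random_number(a)
    call random_number(b)
    chk = 0
    call system_clock(t0, rate)
    do r = 1, rep
      x = [a, b]                       ! the catenation being timed
      chk = chk + x(1) + x(2*n)        ! keep the result "live"
    end do
    call system_clock(t1)
    print '(i10, 2es12.4)', n, real(t1 - t0, dp)/rate/rep, chk
    deallocate (a, b, x)
    n = n*10
  end do
end program time_cat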

@lenore, one can get widely varying results with such a comparison depending on the processor (the Fortran standard’s term for the compiler; with Fortran there are quite a few!), plus everything related to the hardware (CPU, RAM, L1/L2 cache, etc.) and software (OS, runtime, etc.), and also the problem size.

Please see this thread over at the Intel Fortran forum. Even the concept of expanding an array dynamically is fraught if the programmer is not careful. The same challenges will apply with catenation, especially if one is going by the syntactically convenient [Av, Bv, Cv] option.
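To make the pitfall concrete, here is an illustrative sketch (not code from the linked thread): growing an array one element at a time with the constructor reallocates and copies everything accumulated so far on every step, so building a length-m result that way does roughly m*m/2 element copies, versus m with a single pre-allocation.

program growth_demo
  ! Illustration: grow-by-one via x = [x, i] versus a single pre-allocation.
  implicit none
  integer, parameter :: m = 20000
  integer, allocatable :: x(:)
  integer :: i
  integer(8) :: t0, t1, rate

  allocate (x(0))
  call system_clock(t0, rate)
  do i = 1, m
    x = [x, i]             ! reallocates and copies i elements each time
  end do
  call system_clock(t1)
  print *, 'grow-by-one :', real(t1 - t0)/rate, 's'

  deallocate (x)
  allocate (x(m))          ! one allocation, each element written once
  call system_clock(t0)
  do i = 1, m
    x(i) = i
  end do
  call system_clock(t1)
  print *, 'preallocated:', real(t1 - t0)/rate, 's'
end program growth_demo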

The Fortran standard does not stipulate any move semantics, elision, etc. targeted toward performance with such operations, and leaves a lot to the “processor”.

If one seeks to do such an analysis at all, it may be better to first clearly establish some reasonable parameters, with a good deal of thought, so as to arrive as close to an apples-to-apples comparison as viable; it’s not easy.

Yes, we need apples-to-apples comparisons with the same flags and the same platform.

The way to do this benchmark is to use any compiler and any platform, but compare against a theoretical performance peak. We will assume that we want to copy Av, Bv and Cv into a new array. Let’s assume they are all the same length n, so we have 3*n memory reads and 3*n memory writes. The operation X = [Av, Bv, Cv] seems equivalent to three successive (independent) array copies. So the optimal performance of this, for example on my Intel-based MacBook Pro, is limited by L1 memory write, which takes 0.25 clock cycles per array element (double precision). Of course, if the arrays do not fit into L1 cache, it will be limited by L2, L3 and eventually main memory speed. On the Apple M1 the peak is also 0.25 clock cycles per array element.

One can then measure the actual code. One can usually achieve the peak for this simple benchmark. I have done that.
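For anyone who wants to reproduce that kind of check, a minimal sketch follows. The 0.25 cycles-per-element figure and the clock frequency constant are machine-specific assumptions and must be adjusted for the machine being measured.

program peak_check
  ! Sketch: measured time of X = [Av, Bv, Cv] versus a theoretical peak of
  ! 0.25 clock cycles per element written (L1-resident, double precision).
  ! clock_ghz is an assumed value; set it for the machine being measured.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: clock_ghz = 3.2_dp
  integer, parameter :: n = 1000, rep = 100000
  real(dp) :: Av(n), Bv(n), Cv(n), chk, elapsed, peak
  real(dp), allocatable :: X(:)
  integer(8) :: t0, t1, rate
  integer :: r

  call random_number(Av); call random_number(Bv); call random_number(Cv)
  chk = 0
  call system_clock(t0, rate)
  do r = 1, rep
    X = [Av, Bv, Cv]
    chk = chk + X(mod(r, 3*n) + 1)          ! keep the copies from being elided
  end do
  call system_clock(t1)
  elapsed = real(t1 - t0, dp)/rate/rep
  peak = 0.25_dp*3*n/(clock_ghz*1.0e9_dp)   ! 0.25 cycles per element written
  print *, 'measured s/catenation:', elapsed
  print *, 'peak     s/catenation:', peak
  print *, 'ratio                :', elapsed/peak, ' (checksum', chk, ')'
end program peak_check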

It’s more interesting to measure something more complicated where there is some actual computation.

We will. Right now all we have is catenation, thanks to Arjen. It’s a preliminary reality check. Yes, let’s use Ondřej’s compiler and take his lead on experiments.
We are slowly building up functionality to catenate N-d arrays and will add arithmetic and other data types as we go along. This will all be evidence for the proposal/paper(s).
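As a plain-Fortran hint of what catenating higher-rank arrays involves (a hypothetical sketch, not the group’s pointer-based implementation): joining two rank-2 arrays along the first dimension only requires the trailing extents to agree.

program cat2d_demo
  ! Hypothetical sketch: catenate two rank-2 arrays along the first dimension.
  implicit none
  integer :: A(2, 3), B(1, 3)
  integer, allocatable :: C(:, :)
  A = reshape([1, 2, 3, 4, 5, 6], [2, 3])
  B = reshape([7, 8, 9], [1, 3])
  C = cat_dim1(A, B)
  print *, shape(C)          ! 3 3
contains
  function cat_dim1(x, y) result(z)
    integer, intent(in) :: x(:, :), y(:, :)
    integer, allocatable :: z(:, :)
    if (size(x, 2) /= size(y, 2)) error stop 'trailing extents must agree'
    allocate (z(size(x, 1) + size(y, 1), size(x, 2)))
    z(:size(x, 1), :) = x
    z(size(x, 1) + 1:, :) = y
  end function cat_dim1
end program cat2d_demo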


@certik I was thinking about what you said about a more complex problem. I am hoping that, even for the trivial vector catenation, we will be able to catenate ANY number of vectors without constraint. Thus, if Fortran fails, what is the upper limit? If the vectors use up space, what is the upper limit, given of course that we know all the stats/timings/etc. of the platform machine?
We must also be sure the compiler is the same version, the same flags are used, the OS is the same, and any fixes to both are the same; I mean really strictly apples to apples, and ideally on a dedicated, single-user system with no multiprocessing. Otherwise, the epsilons get in the way of predicting performance, and I want to do that.
So, let’s assume the MoA version is faster (I certainly hope so); then we’ve made a case to the standards committee, compiler writers, etc. And with that, the MoA definition trivially extends to n-d arrays. :slight_smile:

More comments (more ideas come to mind). Another reason I want to start small, justify, and add something to the standard: I have realized, through teaching MoA and its use over the years, that if I move too fast, the simplicity and beauty will get lost in a drink from the fire hose versus a lovely sip from a mountain fountain.

Here are the links for the slides and the recording for today’s lecture.

Passcode: $stjPS6#

No lecture this week. The series has changed to next Friday, then every two weeks. All welcome. All previous lectures are on this site as well, along with the Zoom videos. Also, I’m here for any questions.
Lenore
Lenore Mullin is inviting you to a scheduled Zoom meeting (note the different dates for your calendar).

Topic: MoA Lectures
Time: Nov 5, 2021 02:30 PM Eastern Time (US and Canada)
Every 2 weeks on Fri, until Dec 31, 2021, 5 occurrence(s)
Nov 5, 2021 02:30 PM
Nov 19, 2021 02:30 PM
Dec 3, 2021 02:30 PM
Dec 17, 2021 02:30 PM
Dec 31, 2021 02:30 PM
Please download and import the following iCalendar (.ics) files to your calendar system.
Weekly: https://albany.zoom.us/meeting/tJwkfumuqT8sGdVCPn9GavN8j88zWjS399rF/ics?icsToken=98tyKuCgpzIqHNORthqGRow-HYr4c-7wtildjadrvy_rWgdSdC2uPLoaKIV1I4uJ

Join Zoom Meeting

Meeting ID: 989 9067 4631
Passcode: 122039
(One tap mobile, dial-by-location, SIP, H.323, and Skype for Business details are identical to the earlier invitation above.)


Maybe this recent talk is related to the subject of this thread:

Can Tensor Programming Be Liberated from the Fortran Data Paradigm?

by Conal Elliott

Abstract

Classic Fortran programs used "GO TO" for control and (multi-dimensional) arrays for data. While these unstructured building blocks suffice to encode implementations, they fail to yield clear understanding of the essential nature of algorithms and their correctness. Although "GO TO" has largely disappeared from modern programming, arrays are still widely embraced for parallel programming and machine learning. The resulting programming style suffers in safety and compositionality, leading to code that is unnecessarily difficult to write, read, prove, and reuse.

The main message of this talk is that “array algorithms” are often not naturally array algorithms at all, but rather an error-prone, composition-resistant enmeshing of a safe, simple, and illuminating algorithm on a natural (non-array) data type, together with details of decoding from and encoding to arrays. By disentangling these natural data types and corresponding algorithms from their array encodings, we can gain deeper understanding of known algorithms and easily discover correct new algorithms, many of which are well-suited for the modern age of parallel hardware. Where desired, these clear, correct, and compositional algorithms can then be safely, correctly, and systematically (even automatically) converted to operate on arrays by applying a few simple principles in the form of well-known type isomorphisms.