`target` attribute seemingly affecting performance

While working on a complete refactoring of a Modern Fortran code, I stumbled, quite by accident but fortunately, onto a finding that looks very bizarre from my POV.

Previously, within a very computationally intensive procedure, some local variables were declared with the target attribute, since they were pointed to by other local pointer variables.

Now, in the refactoring, those pointer variables are gone. So far so good. While trimming away everything unnecessary, I noticed that the target attribute was no longer needed, so I removed it (also as a safeguard against forgotten leftovers: if something still pointed at those variables, I’d get a compile-time error). And boom: more than 10% speedup just from removing the attribute.

To double-check that this was not a fluke, I recompiled and ran several times with and without the attribute, which confirmed the finding.

The compiler I am seeing this with is GFortran 12.3.0 on Ubuntu 22.04.

Now, the question: is this something to expect?

I would like to test with some other compilers, but I won’t be able to do so in the short term.

This is not surprising, as the target attribute basically tells the compiler that the variable is possibly aliased to another variable, thus preventing some optimizations or making them more difficult.

Yes, you can expect that.

My reasoning goes as follows: The TARGET attribute specifies that the array could be the target of an association. Implementers, guided by the Standard, have determined that such associations are tracked with a memory address, index bounds for each dimension, and an address offset for each dimension telling you how much to add to the memory address for each index increment. That means you can’t place elements of the array in any faster resources, like SIMD registers, without updating the in-memory copy every time you update an element. You can read about gfortran’s array descriptors here. You could implement a more complicated array descriptor that kept track of which elements are in which registers, but as far as I know, nobody has bothered, probably with good cause.

It occurs to me that BLOCK could allow for a TARGET variable to lose that attribute for the duration of the BLOCK. The programmer would be responsible for ensuring that no associated variable is referenced for that duration, and in exchange the compiler would place the variable where it liked, restoring the original placement/association at the end of the BLOCK. An inlined copy-in/copy-out, if you like.

A procedure argument without the target or pointer attribute does this.

I haven’t looked at Compiler Explorer, but that’s insane. It seems like a massive downside to using Fortran pointers if anything with the target attribute will never go into SIMD registers.

If I read @themos’s and @everythingfunctional’s replies correctly, it just means that you should not mix pointer semantics and computationally intensive tasks in the same scope. So, deal with pointers in “management” procedures, and use restricted scopes (blocks/subroutines/functions) that internally have no pointer/target attribute to get the performance back. Is that it?
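
A minimal sketch of that separation, under stated assumptions: the module name `kernels`, the routine `scale_and_shift`, and the array sizes are all hypothetical and not taken from the original code. The caller keeps the TARGET/POINTER bookkeeping, while the kernel's dummy argument carries neither attribute, so the language rules allow the compiler to treat it as not aliased through any pointer inside the kernel. Whether this recovers any speed depends on the compiler and the code.

```fortran
module kernels
   implicit none
contains
   ! Hot loop: the dummy argument has neither TARGET nor POINTER,
   ! so no pointer can be associated with x inside this scope.
   subroutine scale_and_shift(x, a, b)
      real, intent(inout) :: x(:)
      real, intent(in)    :: a, b
      integer :: i
      do i = 1, size(x)
         x(i) = a * x(i) + b
      end do
   end subroutine scale_and_shift
end module kernels

program manage
   use kernels, only: scale_and_shift
   implicit none
   real, target  :: work(8)     ! TARGET lives only in this "management" scope
   real, pointer :: view(:)

   work = 1.0
   view => work(1:4)            ! pointer bookkeeping stays out here
   call scale_and_shift(work, 2.0, 1.0)

   ! 2.0 * 1.0 + 1.0 is exactly 3.0 for every element
   if (any(work /= 3.0)) error stop "unexpected result"
   print *, view
end program manage
```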

Every association is an opportunity to lose performance.

You are handing the compiler the job of doing the book-keeping, so you don’t run the risk of messing it up yourself. The language/compiler have knowledge of a few flavours of association and hopefully get the book-keeping correct. You could always do just as well or better yourself, at the price of a lot more source code, or dropping down closer to the metal.

Indeed, the procedure boundary is the oldest way we have of fencing-in the effects of associations so that we only need to worry about maintaining them at the beginning and end.

Original FORTRAN won the performance war by having no associations at all (by at least some reckoning).

Indeed, while C was penalized by the “everything is a pointer” paradigm. That’s basically why the restrict keyword was introduced in C99: it tells the compiler “there’s no aliasing to the memory locations that are referenced by this pointer, you can optimize aggressively”.

Thanks @themos @PierU for the clarifications. I had not actually thought about aliasing.
The “weird” thing though was that the target variable was never used in a pointer association (only bare memory copies). I would have expected the compiler to “see” that, and act as if the target attribute were not there.

@mEm can you create a minimal reproducible example (MRE) of the above behavior with target? Create the simplest benchmark that shows the slowdown by just adding target to a declaration and no other change.

Then we can discuss a specific example, look at the generated assembly, reason about what is happening, and see if there is a way to implement it better inside the compiler. Without an MRE we can only reason in an abstract way, like @PierU and @themos did above, which I agree with. However, it might very well be that for a particular example the compiler can actually speed it up.
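
As a starting point, such an MRE could be shaped like the sketch below. Everything in it is an assumption made for illustration: the sizes `n`, `m`, `reps`, the `[1, n]` shape, and the `matmul` workload are hypothetical, and this is not the original code. The two functions differ only in the TARGET attribute on the local array `v`. It may well show no difference at all on a given compiler or optimization level; the point is only the structure of the comparison.

```fortran
program target_mre
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   integer, parameter :: n = 64, m = 8, reps = 200000
   integer(int64) :: t0, t1, rate
   real(real64) :: a(n, m), s_plain, s_target

   a = 1.0_real64

   call system_clock(t0, rate)
   s_plain = run_plain()
   call system_clock(t1)
   print '(a, f8.3, a)', 'without target: ', real(t1 - t0, real64) / real(rate, real64), ' s'

   call system_clock(t0)
   s_target = run_target()
   call system_clock(t1)
   print '(a, f8.3, a)', 'with target:    ', real(t1 - t0, real64) / real(rate, real64), ' s'

   ! All values are small integers stored in real64, so both sums are exact.
   ! Using the results also keeps the compiler from discarding the loops.
   if (s_plain /= s_target) error stop 'variants disagree'
   print *, s_plain, s_target

contains

   function run_plain() result(s)
      real(real64) :: s
      real(real64) :: v(1, n), r(1, m)          ! no TARGET
      integer :: i, k
      s = 0.0_real64
      do k = 1, reps
         do i = 1, n
            v(1, i) = real(i + k, real64)
         end do
         r = matmul(v, a)
         s = s + sum(r)
      end do
   end function run_plain

   function run_target() result(s)
      real(real64) :: s
      real(real64), target :: v(1, n)           ! the only difference
      real(real64) :: r(1, m)
      integer :: i, k
      s = 0.0_real64
      do k = 1, reps
         do i = 1, n
            v(1, i) = real(i + k, real64)
         end do
         r = matmul(v, a)
         s = s + sum(r)
      end do
   end function run_target

end program target_mre
```

For a fair comparison, build both variants with the same flags and repeat the run several times, as was done for the original finding.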

It looks like the compiler decided not to spend compile-time on checking through the code for absence of associations. I, similarly, doubt that a rank-2 local allocatable is checked that all allocates are of the form ALLOCATE(x(N,1)) or ALLOCATE(x(1,N)) and simplified to rank-1 objects.

I will try to put one together without making it too complex, and as quickly as I can.

I don’t know if that’s also part of the issue, but funnily enough, although I had not mentioned it, the local target variable did have a shape like [1, N], since it was then used in a matmul operation. My reasoning, which may be wrong, was: “I can avoid a reshape at the time of the matmul if I declare it with shape [1, N], since it can easily be treated as a 1D array”.

How?

Maybe. But the right way to me is to not use the target attribute when it’s not needed.

There would be new syntax, maybe SUSPEND(TARGET :: object-name-list), a variant of other-specification-stmt, asserting that all references to each named object are through that name only, for the duration of the BLOCK.

OK, I thought that you were referring to an already existing feature.

Yes, you never want to do a reshape in a hot loop body. People used to do rank-shenanigans in F77 with sequence association and assumed-size arrays. Now we are encouraged to do it with rank-remapping/bounds-remapping of pointers, and possibly leave a bit of performance on the table.
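
For reference, a small sketch of the bounds-remapping route, with hypothetical names and sizes (`v`, `row`, `a`, `n`, `m` are not from the original code). It also shows that `matmul` accepts a rank-1 first argument directly, which in a case like this would avoid both the remapping and the TARGET attribute. No performance claim is made either way.

```fortran
program remap_demo
   implicit none
   integer, parameter :: n = 4, m = 2
   real, target  :: v(n)        ! TARGET is required by the remapping below
   real, pointer :: row(:,:)
   real :: a(n, m), r2(1, m), r1(m)

   v = [1.0, 2.0, 3.0, 4.0]
   a(:, 1) = 1.0
   a(:, 2) = 2.0

   ! Bounds-remapping pointer assignment: view the rank-1 array as 1 x n.
   row(1:1, 1:n) => v
   r2 = matmul(row, a)          ! (1 x n) times (n x m) gives (1 x m)

   ! A rank-1 vector times a rank-2 matrix gives a rank-1 result of size m,
   ! with no pointer involved.
   r1 = matmul(v, a)

   if (any(r2(1, :) /= r1)) error stop "results differ"
   print *, r1
end program remap_demo
```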

Sometimes pointers are used to access a deeply nested variable.

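```fortran
subroutine sub(var)
   type(t1), target :: var   ! t1: a derived type with nested components
   real, pointer :: p

   p => var%a%b(2)%c%d       ! short alias for a deeply nested component
end subroutine sub
```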

Should it be considered bad practice from a performance point of view?
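
One pointer-free variant that could be compared against this is the ASSOCIATE construct, which gives the same kind of short alias without requiring TARGET on the argument. The sketch below is only an illustration: the type definitions `t1` to `t4` are hypothetical stand-ins invented to make the component path `var%a%b(2)%c%d` compile, and nothing here measures which form is faster.

```fortran
module nested_types
   implicit none
   ! Hypothetical types, chosen only to match the path var%a%b(2)%c%d
   type :: t4
      real :: d = 0.0
   end type t4
   type :: t3
      type(t4) :: c
   end type t3
   type :: t2
      type(t3) :: b(3)
   end type t2
   type :: t1
      type(t2) :: a
   end type t1
contains
   subroutine sub(var)
      type(t1), intent(inout) :: var      ! no TARGET attribute needed
      associate (d => var%a%b(2)%c%d)     ! alias valid inside the construct
         d = d + 1.0
      end associate
   end subroutine sub
end module nested_types

program demo
   use nested_types
   implicit none
   type(t1) :: x

   call sub(x)                            ! d starts at its default 0.0
   if (x%a%b(2)%c%d /= 1.0) error stop "unexpected value"
   print *, x%a%b(2)%c%d
end program demo
```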