Size of an unallocated array

Many of these inquiry functions (such as storage_size, etc.) are evaluated at compile time, and may not even have a presence in the compiler’s run time library. In other words, there is no entry point called storage_size_ in the a.out, EXE or DLL that you build from Fortran source code that references such an inquiry function. In such circumstances, the cost to evaluate at run time is exactly zero.
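To make the compile-time point concrete, here is a minimal sketch. The module name `compile_time_inquiries` and the parameter names are made up for illustration. Each inquiry appears in a constant expression, so the compiler has to evaluate it during compilation. Note that this applies to inquiries on kinds and types; `size` of an allocatable array is a different matter and is only known at run time.

```fortran
module compile_time_inquiries
   use, intrinsic :: iso_fortran_env, only: int32, real64
   implicit none
   private
   public :: bits_int32, bits_real64, max_int32

   ! Each right-hand side is a constant expression, so the compiler
   ! evaluates it during compilation; no run-time call is needed.
   integer, parameter :: bits_int32  = storage_size(0_int32)
   integer, parameter :: bits_real64 = storage_size(0.0_real64)
   integer(int32), parameter :: max_int32 = huge(0_int32)

end module compile_time_inquiries
```

Because the values are named constants, they can in turn be used in declarations, for example as array bounds or kind selectors.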

@kargl
You are saying the committee has provided a solution.
For many years, I have been saying their solution is not good enough.

SIZE should not provide a negative value from an intrinsic function for an array that is defined.
The compiler should recognise the memory size environment and reply accordingly, not tell programmers that they should solve this problem themselves. The failure should be obvious to the compiler developer (similar to the LOC function when operating in 3 GByte Win32; and why was that not an intrinsic function in the standard? No vision! I don’t think Tom Lahey would have accepted this solution.)

What I object to is the attitude that a negative answer is OK because it complies with the standard, especially when it is clear that this is not what was requested.

I guess we will never agree on this.

This is an endless race if one wants to catch all possible errors. In a library I am catching some possible errors from the callee, for legal code. Catching all possible errors from the callee for legal code can already be a huge task, and I don’t know many libraries that are doing that. So, no, I don’t want to consider errors from illegal code: I consider that they should be caught either by the compiler, or at runtime provided some appropriate checking options have been turned on during the development phase.

But this of course depends on the kind of library you are writing, for whom, and for which purpose…

Anyway, to me the optional solution is more a trick than a proper solution.

Because it would change the meaning of existing, standards conforming programs. And not just mean that they no longer compile, but that they silently behave differently than they used to. That is (for all its advantages and disadvantages) the kind of change the committee simply will not consider.

As I posted above, the solution the standard offers coders, starting with Fortran 2023, for the kind (bad pun intended) of issues raised in this thread will be along the following lines:

s = ( allocated(x) ? size(x, kind=kind(s)) : xx )

where the coder picks a suitable integer KIND for the size information (say in object s) being sought and specifies a default value xx in case the object in question x is unallocated.
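For compilers that do not yet accept the Fortran 2023 conditional expression, the same idea can be written with a plain `if` block. This is only a sketch: `size_or` and the module name are hypothetical helpers, not intrinsics, and the sketch is limited to rank-1 `real` arrays with an `int64` result.

```fortran
module size_with_default
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   private
   public :: size_or

contains

   ! Hypothetical helper (not an intrinsic): returns size(x) when x is
   ! allocated, otherwise the caller-supplied fallback value.
   pure function size_or(x, fallback) result(s)
      real, allocatable, intent(in) :: x(:)
      integer(int64), intent(in) :: fallback
      integer(int64) :: s

      if (allocated(x)) then
         s = size(x, kind=int64)
      else
         s = fallback
      end if
   end function size_or

end module size_with_default
```

A call such as `s = size_or(x, -1_int64)` then plays the role of the one-line conditional expression, with the fallback still chosen by the caller.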

The above can be viewed and accepted as a “balancing act” between existing semantics and the further needs of coders whilst also giving them considerable control and a semblance of compactness amid other verbosity and also trying to avoid undue burden on compiler implementations.

@FortranFan

s = ( allocated(x) ? size(x) : expr )

IMHO, that’s a poor way, since it becomes verbose when it has to be repeated for each allocatable array. In addition, it would not be easy to use with other array-related functions such as lbound, ubound, etc. As you have already mentioned, I also believe that it is really important to define expr correctly to avoid any misleading outcome. From what I have seen, that would not be easy. Unfortunately, Fortran does not provide any generic escape value such as undefined, NaN, etc., if I am not wrong.

@FortranFan

Here is the basic truth.

As regards RTFM, let me ask everyone: How many of you have read the manual of your new car before driving it from the showroom to your home? How many of you have read the manual of the coffee machine at the company you work for? We all use common sense, and we expect to get reasonable “outcomes/results” from that object/device/apparatus/etc.

As regards Practitioners, all of us, our families, our friends, etc we enjoy a safe environment to work, play, etc. But when we ask that from a programming language we become annoying persons.

@PierU

  • consistency: …

Any function related to array which is unallocated should not be authorised. We do agree on that in different ways. I say “an error should be raised”, others say “Don’t do that because it is not legal as per spec”. In addition, size() and any other function’s result should not be used for checking if an array is allocated. I just get a memory (last) number (which is the worst of all) which confuses.

  • efficiency: …

SAFETY FIRST! Then the rest. Especially in what is published. Otherwise, we keep it in the drawer, laptop, PC, office, laboratory, etc. and do not publish it or advertise it.

@everythingfunctional

There is no such thing as the size of an unallocated array … if the code is invalid, it’s allowed to return whatever it wants (or set your computer on fire). You really are asking “How big is this thing I don’t have?”

If the code is invalid language “tools” should alert AND NEXT return whatever it wants!

A quick example of how “wrong” you are. You buy an electronic scale. You switch it on and it displays “0”. You “allocate” an apple on it; it displays “50” grams. Next, you remove/deallocate the apple. What do you expect to see on the display? If you continue seeing “50” you will start wondering whether something is wrong. There are thousands of such examples. Feedback is important!

@certik

… Debug build and Release build …

Tfti. I use fpm, which works by default in the “debug” profile since I have not set any --flag. Would it be possible to highlight the importance of debug by upgrading it to a sub-command with a level option (e.g. 1-6) ranging from the slackest (least verbose) to the strictest (most verbose)? I will open an issue requesting that on GitHub.

@kargl

This is definitive. Nonconforming code can produce anything.

So, the scale should display anything when nothing is placed/“allocated” on it?

@certik

And GFortran does, in this case. So there is no problem, as long as we have good compilers.

That’s great! Please do your best to properly adjust/regulate the fpm debug options, because YOU (as instigator/abettor :wink: :grinning:) and I (as inexperienced practitioner :wink: :innocent:) have caused a stir without reason (50 messages posted so far)!

What about using other array-related functions? Does the compiler give an error when array is not allocated yet?

@certik

(I actually just use -fcheck=all)

Does fpm debug profile use -fcheck=all? Because I have never noticed such an error message on my screen so far.

@all Thanks a lot for your participation and time in our discussion!

I actually agree with this sentiment. It would be nice if compilers enforced standards conformance by default, and included run-time checks for invalid operations by default (at least when no optimisations have been requested explicitly). I agree that the state of our Fortran tools is not necessarily optimal, and I hope to be able to improve that over time.

I don’t think this analogy quite fits, because a scale doesn’t have an “unallocated” status. If we’re going to properly try and construct an analogy, let’s try something like the following.

Declaring an allocatable array is like specifying a spot where we’ll keep an apple basket. Allocating an array is then like putting an apple basket in that spot. One can then store values in the array, analogous to putting apples in the basket. Deallocating the array is like taking the basket away from that spot. Asking for the size of an array is then like asking how big the basket currently in that spot is. There is no meaningful answer if the array is not allocated, just like there is no meaningful answer if there is no basket in that spot. Just as you’d say “Um dude, there is no basket there” if somebody asked you that question, it should be a runtime error if you ask for the size of an unallocated array. My point has simply been that when someone subsequently asks “well, what should it return if you turn off the runtime checks?”, it’s still a meaningless question with no meaningful answer, and so it’s not worth worrying about other than to say “it’s undefined behavior”.

Maybe (?) I’m old-fashioned, but I do read manuals (of the car, of the coffee machine…) whenever it is useful to do so. No later than yesterday I had to read the manual of my car to recalibrate the tire pressure sensors, because the pressure warning LED had lit up after the change of tires. A few months ago I read the manual of the new heating system that had been installed in my house, to improve the temperature regulation (the installer guy did not bother to fine-tune it, and he probably didn’t read the manual anyway). When I get some new equipment, most of the time I quickly read the manual just to make the best use of it, and also because I like understanding how the equipment works.

Needless to say that this is even more the case in my work.

Sorry, but NO: this cannot be a universal rule in programming; it all depends on the objectives. If you are writing some code for the on-board computer of a spaceship that goes to Mars or of a nuclear plant, yes, safety first. If you are writing a simulation code to solve the wave equation, well, you might be interested first in performance. As a matter of fact nobody will be hurt if some bugs are experienced, and you can just correct them. Checking everything possible at runtime is possible, but at some point it hurts performance. I don’t know about today, but Ada was designed especially for safety and reliability, and nobody ever considered Ada for high performance computing. The right tool for one job is not necessarily the right tool for another job.

To some extent, this is a matter of what should the compiler control and what should the programmer control. @JohnCampbell is arguing for the compiler to be in control of the kind, while the standards committee decided to leave control to the programmer.

I personally like for the programmer to be in control, the same way that the KIND that is returned for sqrt(), sin(), cos(), and so on is under programmer control. However, in cases such as size(), lbound(), ubound(), and so on, I think it might be useful to also add an IERR optional argument to allow the programmer to catch those cases where the returned value does not fit into the returned KIND.

In the case of size(), it is quite typical to work with array size values that fit easily in INT32. There is no need to force the programmer to use INT64 when it is not necessary. And as stated above, changing the default KIND would break backwards compatibility, which I would oppose (and I’m not on the standards committee). There are a few things that might be done to improve the current situation when working with large arrays. One is the IERR argument above, which would allow the programmer to detect when the wrong KIND is being used, and possibly also when the argument is unallocated, or a null pointer, or an optional argument that is not present, and so on. If the IERR argument is not present, then maybe size() should abort execution in those cases. After all, an error has occurred, and the compiler can detect it, so why not? Another is a query function that returns an integer KIND that is large enough to hold any possible size() return value. That query function should be usable in initialization expressions, so that it can appear in declarations and parameter definitions, not just at run time.
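As a rough illustration of the IERR idea, here is what a user-level wrapper could look like today. Everything in it is hypothetical: the name `checked_size`, the status codes, and the restriction to rank-1 `real` arrays are choices made only for this sketch, and it has to be a subroutine because of the extra output argument.

```fortran
module checked_size_mod
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   private
   public :: checked_size, size_ok, size_unallocated, size_overflow

   ! Hypothetical status codes, chosen only for this sketch.
   integer, parameter :: size_ok = 0
   integer, parameter :: size_unallocated = 1
   integer, parameter :: size_overflow = 2

contains

   subroutine checked_size(x, n, ierr)
      real, allocatable, intent(in) :: x(:)
      integer(int32), intent(out) :: n
      integer, intent(out), optional :: ierr
      integer(int64) :: wide
      integer :: status

      n = 0
      status = size_ok
      if (.not. allocated(x)) then
         status = size_unallocated
      else
         ! Ask for the size in a wide kind first, then check the range.
         wide = size(x, kind=int64)
         if (wide > int(huge(n), int64)) then
            status = size_overflow
         else
            n = int(wide, int32)
         end if
      end if

      if (present(ierr)) then
         ierr = status
      else if (status /= size_ok) then
         ! No IERR supplied: abort, as suggested in the post above.
         error stop "checked_size: array unallocated or size too large"
      end if
   end subroutine checked_size

end module checked_size_mod
```

With `ierr` present the caller decides what to do about the failure; without it the wrapper stops the program, which mirrors the "abort if IERR is absent" suggestion.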

Regarding size() and c_size_t, remember that integers of that kind are only required to hold the bits; they are not intended to be used by the calling program. That is because Fortran does not have an unsigned integer kind. In practice, the only errors that could occur when using the integers are when the array is a 1-byte integer, logical, or character. In all other cases, the byte size will be divided by at least 2, and the value stored will be the same as the value of the integer.

Well it looks like the discussion is already out of control enough so I’ll give my two cents lol!

I understand the logic behind the standard and the historical reasons, but I disagree that allowing size on an unallocated variable would break anything. If anything, it would break non-conforming code that calls size(x) on unallocated x, but clearly from this discussion no one has ever done that.

Also the discussion on empty vs. missing baskets is misleading IMHO. If we look at gcc, for example:

/* CFI_cdesc_t, C descriptors are cast to this structure as follows:
   CFI_CDESC_T(CFI_MAX_RANK) foo;
   CFI_cdesc_t * bar = (CFI_cdesc_t *) &foo;
 */
typedef struct CFI_cdesc_t
 {
    void *base_addr;
    size_t elem_len;
    int version;
    CFI_rank_t rank;
    CFI_attribute_t attribute;
    CFI_type_t type;
    CFI_dim_t dim[];
 }
CFI_cdesc_t;

an allocatable variable very much looks like a struct, with a null pointer when not allocated. Compiler experts can give us more information, but I would bet the array descriptors are always there whenever the variable is declared (let’s not consider optimizations).

That is correct. The introduction of reallocation semantics did essentially the same. See the Backwards compatibility in programming languages thread for a discussion on this.

EDIT: Found it - Backwards compatibility in different programming languages - #24 by PierU

You’re right. But this argument is somewhat misleading because, according to section 7.4.3.1 of the Standard:

2 The processor shall provide at least one representation method with a decimal exponent range greater than or equal to 18.

If that were the (only) default integer kind, it would be more than enough to return the size of any conceivable array (exabytes!)
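In practice that guaranteed kind can be requested with `selected_int_kind(18)` and passed to `size` through its `kind` argument. A minimal sketch, where the names `sk` and `wide_size` are made up for illustration and only rank-1 `real` arrays are handled:

```fortran
module wide_size_kind
   implicit none
   private
   public :: sk, wide_size

   ! Integer kind with a decimal range of at least 18 digits, which the
   ! standard text quoted above requires every processor to provide.
   integer, parameter :: sk = selected_int_kind(18)

contains

   ! Returns the size of x in the wide kind, whatever the default
   ! integer kind happens to be.
   pure function wide_size(x) result(n)
      real, intent(in) :: x(:)
      integer(sk) :: n
      n = size(x, kind=sk)
   end function wide_size

end module wide_size_kind
```

Since `sk` is a named constant, it can be used in declarations such as `integer(sk) :: n`, so the choice of kind stays with the programmer.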

I guess it doesn’t!

Can you please submit a PR against fpm to fix that?

@certik
Please give me more instructions since I have never done a PR before.

I have downloaded fpm git. I have found the file but I have no idea what to fix exactly and how.

Tia

PS: Are you asking me to delete -fcheck=array-temps and -fcheck=bounds and add -fcheck=all in lines 761 and 762?

@FLNewbiee follow the “Basic Setup” section here: Contributing — LFortran, that is, “Fork LFortran” and “Send a New Merge Request” (you can ignore the rest of the page). Just use fpm instead of lfortran. The steps are exactly the same for any project.

@certik

Step 1. Create a new branch

Done! It’s ‘FLNewbiee’.

Step 2. Make changes in relevant file(s)

I have made the change:

-                & flags = ' -Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=bounds&
-                          & -fcheck=array-temps -fbacktrace -fcoarray=single', &
+                & flags = ' -Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=all&
+                          & -fbacktrace -fcoarray=single', &

Step 3. Commit the changes:

Ready with message: Set fcheck=all

Step 5. Send the merge request

Is it OK to send it?

Tia

@FLNewbiee Awesome, yes please submit it!

I would also prefer to get rid of undefined behavior, so let’s look at ways to improve the behavior of size. There are at least 3 options for handling the case that the first argument to size is an unallocated array:

  1. Make it mandatory to stop with an error.
    This is allowed but not enforced by the standard, probably because a mandatory check can increase the run time. However, nothing prevents a compiler vendor and its users from favoring robustness over performance.
  2. Add an optional return value that indicates whether the operation was successful (like the stat argument of the allocate statement).
    Size is a (pure) function, so having intent(out) arguments is not an option.
  3. Return a magic number, e.g. -1 or any negative integer.
    Since negative values have no meaning for size, this seems to be a good option. However, negative values are valid outcome of the lbound and ubound functions. Using this method for size would therefore introduce inconsistencies among these three related functions: one could handle unallocated arrays, two could not.

In summary, I would say the current behavior is not perfect, but it is the best that can be achieved. If there were no tradeoff between performance and safety, compiler vendors could implement the check independently of any (optimization) options. However, current practice indicates that there is a tradeoff and that developers need to make a conscious decision.

Good point. There could be a subroutine interface added that has the extra argument(s). The current size() function could be used in initialization expressions, while the subroutine interface could be available also in other contexts. The subroutine interface could return all of the useful information, the size, allocation status, rank, lower and upper bounds, increments between elements, and so on. All of that information is of interest to a programmer.
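A user-level version of such a subroutine interface can already be written. The sketch below is only an illustration: `array_info` and its argument list are hypothetical, it covers rank-1 `real` arrays only, and it leaves out items such as rank and element stride.

```fortran
module array_info_mod
   implicit none
   private
   public :: array_info

contains

   ! Hypothetical subroutine; name and argument list are illustrative.
   pure subroutine array_info(x, is_allocated, n, lower, upper)
      real, allocatable, intent(in) :: x(:)
      logical, intent(out) :: is_allocated
      integer, intent(out) :: n, lower, upper

      is_allocated = allocated(x)
      if (is_allocated) then
         n = size(x)
         lower = lbound(x, dim=1)
         upper = ubound(x, dim=1)
      else
         ! Placeholders only; they carry no meaning, so callers should
         ! test is_allocated before using n, lower or upper.
         n = 0
         lower = 0
         upper = 0
      end if
   end subroutine array_info

end module array_info_mod
```

The allocation status comes back as its own argument, so no return value has to double as an error flag.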

My suggestion would be to return exactly the same numbers as those of an array with size 0:

  • size(x)=0
  • lbound(x,dim)=1
  • ubound(x,dim)=0

After all, they don’t necessarily have to return information about the allocation status.
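The intrinsics cannot legally be called this way today, but the proposed behaviour can be mimicked with wrappers. A minimal sketch, assuming rank-1 `real` arrays and the made-up names `size0`, `lbound0` and `ubound0`:

```fortran
module zero_size_fallback
   implicit none
   private
   public :: size0, lbound0, ubound0

contains

   ! Each wrapper reports an unallocated array exactly like a
   ! zero-size array: size 0, lower bound 1, upper bound 0.
   pure integer function size0(x) result(n)
      real, allocatable, intent(in) :: x(:)
      n = 0
      if (allocated(x)) n = size(x)
   end function size0

   pure integer function lbound0(x) result(lb)
      real, allocatable, intent(in) :: x(:)
      lb = 1
      if (allocated(x)) lb = lbound(x, dim=1)
   end function lbound0

   pure integer function ubound0(x) result(ub)
      real, allocatable, intent(in) :: x(:)
      ub = 0
      if (allocated(x)) ub = ubound(x, dim=1)
   end function ubound0

end module zero_size_fallback
```

A loop such as `do i = lbound0(x), ubound0(x)` then simply executes zero times for an unallocated array, the same as for an empty one. The price is that the two cases can no longer be told apart without calling `allocated`.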