How much memory is my logical array using?

I’m running fortran code that has an array of 193710244 logicals (Ising spins in physics). I’m wondering how much RAM it is using? Does a single logical take a bit or a byte? This is of concern because I wonder if I need to invest in more RAM to make the code efficient.

ps. My computer has 8 cores and I run six random realizations of the job in parallel (on different terminal windows), each of which requires its own array of logicals. I need thousands of realizations to get good statistics, so running six jobs at once cuts the time of the job by that fraction. It still takes two weeks to get my solid statistics. I’ve found that if I run seven realizations of the job, the computer bombs. This makes me think I’m either over heating the computer, or I’m over using the RAM.

1 Like

You can use the storage_size(3f) function to determine the storage your variables use.
Generally, a LOGICAL takes up the same storage as a default INTEGER but that can vary between compilers. Although common in the past, it is unlikely it is only using a bit. There
are procedures to access individual bits you can use to reduce the storage, but it may have
a performance impact.

On some systems instead of using a LOGICAL the smallest supported INTEGER might take less storage. Some systems support multiple LOGICAL kinds.

If you are on GNU/Linux or Unix there are many commands that let you see the memory you are
using, such as top(1). If you are statically declaring the array the size(1) command might be useful, but because of the indicated size I doubt you are.

Most shells include a time(1) command that shows the memory a process used, but that varies
from system to system as to whether it is accurate or not.

For starters, in addition to top(1) and time(1) see vmstat(1) and ps(1).

Including what OS and compiler you are using would help in answering better. How much memory does your platform have? Do you have a process memory limit set (ulimit -a in bash, or limit in tcsh)?

The time(1) command in tcsh(1) has the best options for showing memory (“man csh” or “man tcsh” if you have it on your system has a section on the “time” command).

A good tool for general memory inspection is valgrind(1) if you have it. It is generally used for tracking down memory leaks but can give you a lot of information about memory usage.

storage_size(3)                                                                                                  storage_size(3)

NAME
  storage_size(3) - [BIT:INQUIRY] Storage size in bits

SYNTAX
    result = storage_size(a, kind)

DESCRIPTION
  Returns the storage size of argument a in bits.

ARGUMENTS
  • a : Shall be a scalar or array of any type.

  • kind : (Optional) shall be a scalar integer constant expression.

RETURNS
  The result is a scalar integer with the kind type parameter specified by kind (or default integer type if kind is missing).
  The result value is the size expressed in bits for an element of an array that has the dynamic type and type parameters of a.

EXAMPLES
  Sample program

    program demo_storage_size
    implicit none
       write(*,*)'size of integer       ',storage_size(0)
       write(*,*)'size of real          ',storage_size(0.0)
       write(*,*)'size of logical       ',storage_size(.true.)
       write(*,*)'size of complex       ',storage_size((0.0,0.0))
       write(*,*)'size of integer array ',storage_size([0,1,2,3,4,5,6,7,8,9])
    end program demo_storage_size

  Results:

        size of integer                 32
        size of real                    32
        size of logical                 32
        size of complex                 64
        size of integer array           32

STANDARD
  Fortran 2008 and later

SEE ALSO
  c_sizeof(3) (C_SIZEOF)

   fortran-lang intrinsic descriptions
                                                        January 15, 2022                                         storage_size(3)
1 Like

So you can create a little program to query the available LOGICAL kinds on your machine with your compiler and write a second program to query the available kinds and show their size.
This shows on my machine that gfortran, ifort, and nvfortran all have an 8-bit LOGICAL, and that the default is a 32-bit LOGICAL. So if you are constrained by the available amount of memory,
you can use 8-bit LOGICAL values instead of 32-bit LOGICALs in this case, by using
“LOGICAL(KIND=1) :: VARIABLE_NAME” instead of the default. Note that the default is probably the fastest, so you might see performance degrade. How much (if any) really depends
on your programming environment.

#!/bin/bash
cat >xx.f90 <<\EOF
program testit
use ISO_FORTRAN_ENV, only : logical_kinds
implicit none
character(len=*),parameter :: g='(*(g0))'
character(len=*),parameter :: gg='(*(g0,1x))'
integer :: i
write(*,g)'program sizes'
write(*,gg)'! logical kinds are ',LOGICAL_KINDS
do i=1,size(logical_kinds)
   write(*,g)'logical(kind=',logical_kinds(i),") :: i",logical_kinds(i)
enddo
do i=1,size(logical_kinds)
   write(*,g)'write(*,"(*(g0))")"kind=',logical_kinds(i),',bits=",storage_size(i',logical_kinds(i),')'
enddo
write(*,g)'write(*,"(*(g0))")"kind=default,bits=",storage_size(0=-)'
write(*,g)'end program sizes'
end program testit
EOF
gfortran xx.f90 -o gl
ifort xx.f90 -o il
nvfortran xx.f90 -o nl
./gl >gll.f90
./il >ill.f90
./nl >nll.f90
gfortran gll.f90 -o gll
ifort ill.f90 -o ill
nvfortran nll.f90 -o nll
(
echo gfortran
./gll
echo ifort
./ill
echo nvfortran
./nll
) >> $0
exit
gfortran
kind=1,bits=8
kind=2,bits=16
kind=4,bits=32
kind=8,bits=64
kind=16,bits=128
kind=default,bits=32
ifort
kind=1,bits=8
kind=2,bits=16
kind=4,bits=32
kind=8,bits=64
kind=default,bits=32
nvfortran
kind=1,bits=8
kind=2,bits=16
kind=4,bits=32
kind=8,bits=64
kind=default,bits=32
5 Likes

The default is likely 4 bytes. 193710244 * 4 ~ 775Mb per core → 5Gb for seven cores. So that should not be a problem for your computer unless there is < 5Gb RAM available. You could reduce the usage by specifying a lower bits kind via logical_kinds(), but I doubt if that would be more efficient. Also, check if all eight system cores are physical (not four cores with hyper-threading).

2 Likes

If you are very memory constrained you might want to consider mapping your booleans into the bits of an integer, by flipping individual bits. Depending on your default byte size it should decrease consumption by a factor of ~1-1.5, see btest and ibset. Just keep in mind you will be sacrificing performance for decreased memory consumption by packing and unpacking your booleans into bit arrays.

3 Likes

Going from byte storage to bit storage should give a factor of 8 reduction in memory. Going from default logical to bits should give a factor of 32 reduction. (On most machines, of course)

Regarding efficiency of the bit storage approach, it depends on the operation. If you want to, say IEOR() or IAND() two raster images, then you can do those kinds of operations without unpacking the individual bits, so there should be no loss of efficiency at all, and maybe even an improvement because off the lower memory bandwidth required (compared to byte or word storage). If you want to work with individual random bits, then yes, there is the addressing and masking steps required, so there will be some extra overhead due to that.

Many fortran programmers have wanted bit arrays to be included as an intrinsic data type (or kind) for some four decades now, it was one of the often requested features in fortran 8x during the 1980s. They are useful in many contexts, from raster image storage to signal processing to data compression.

6 Likes

If I use LOGICAL(KIND=1) :: VARIABLE_NAME that will reduce the size of each logical from 32 to 8 bits. My guess is that this would be much more efficient. Why do you doubt this? Is it because the default is inherently the most efficient?

1 Like

It will occupy less space in memory but that says nothing about if the processor will handle an 8bit array more efficiently than a 32bit, thus getting a smaller walltime. I am sure that someone in fortran-lang has a better clue as to why the latter might have a shorter walltime.

I would recommend just giving it a shot in your code and comparing the timings! But I would be careful not to spend too much time worrying about optimizations. You can always do more, but it’s more important to simply get the job done.

2 Likes

Don’t forget about the potential performance gain from reduced memory bandwidth use when going from a (likely) default logical size of 4 bytes to 1 byte logicals. If your application is memory bandwidth limited, which I suspect it probably is, you’ll be getting 4 times more real data per memory transfer and some of that should be realized in reduced runtime.

4 Likes

I agree with the advice to just try it. I was going to describe some of the mechanisms that could make the access slower, but as mentioned there are reasons it could increase in speed also. Running small tests with gfortran gave a slight but significant speed up to the smaller storage size.
Since you have a large long-running program if you change it it would be interesting to post the change in run time.

If your programs are paging it would be dramatic if the change lets them stay memory-resident. You never mentioned what OS or compiler you are using, and what memory it has.

We don’t know what other storage you are using, so the storage used for the large LOGICAL array may not be near the size of your overall program size. If you are on MSWindows bring up the Task Manager and you can see the memory usage of your processes; on Linux top(1) is probably the most user-friendly tool and shows you remaining memory, etc.

Does the run-time of a single instance equal the run-time when you are running six of them? When you say things crash when you run seven, are you getting error messages or is the machine shutting off? There are logs and other ways to tell what is happening.

A week run-time is worth at least taking a look at where your code is spending it’s time. Do you use any developer tools that let you profile the code? Even if you are not using a compiler that comes with profiling tools you can probably use gprof to get a general idea of where you are spending your time.

2 Likes

Thanks for your interest. Some details: Linux, gfortran, 16 MGb RAM.

When I run seven jobs, I do not get error messages. The computer just becomes non-responsive. It happens after a few or several days of running. I like to leave my jobs running for a couple weeks, assuming we don’t have a power outage.

ps. I will be running tests in the coming days. It appears that lack of RAM is not the problem, but nonetheless the computer guy convinced me to upgrade to 32MGb RAM. Oh, well.

1 Like

A heavily paging machine (one out of memory) can appear very unresponsive. On linux it can also cause the oom killer to kick in and cause havoc (essentially a process that kills other processes when the machine runs Out Of Memory – it can be useful for unattended web servers and other utility nodes but usually requires customizing to be anything other than useless on HPC machines).

You will probably see messages in /var/log/* files when this occurred that you can look at after you boot. If your programs slowly increase their memory footprint as they run, and combined they ultimately run you out of memory that would very much describe that behavior.

A screenshot of “top -u $USER” when all your programs are running would answer most of those questions, at least to some extent; looking at resident memory when all started, and then again a few days later when you would expect there to be problems (running five instead of six so you hopefully do not lock up).

2 Likes

If there is a memory leak in the app, the system may eventually start swapping jobs between RAM and hard disk which usually makes the computer veeery sloooow responding. If this is true, upgrading RAM should help (in the sense that te system will work longer w/o swapping) but otherwise you should fix the code.
You can check that observing ‘top’ output - it will show the amount of memory (virtual, resident) for every app instance but also the amount of ‘swap’ used (global, in one of the top lines)

If the memory is not the problem, maybe its overheating.

2 Likes

Sorry for a perhaps irrelevant question, do you really need such a big array to store spin?

In quantum Monte Carlo, spins are stored as binary number.
For example, 4 particles (label them 0th, 1st, 2nd, 3rd particles), take integer 0 as spin down, 1 as spin up.
You can use binary number 1011, which convert to decimal is 11 you know.
This 1011, you read it from the right to the left. It means the 0th and 1st particles are spin up because they are 1, the 2nd is spin down because it is 0, the 3rd particle is spin up because it is 1.

Perhaps do such binary <-> decimal mapping (this is 1 to 1 mapping, so no ambiguity) you do not need such a big logical array. I mean in the above example, you can just use number 11 to represent “up down up up”, so you do not need a logical array to represent up down up up.


EDIT. One more thing.
The advantage of using binary representation is that you can do easy bit operations to flip and exchange spins. In the above notation, like, if want to flip the spin of particle label i and find the newspin of the system, what you do is

spin = 11 ! mean 1011
newspin = xor(spin,shiftl(1,i)) 

Say, you want to flip the spin of the label 2 particle, so i=2, that is from the right in 1011 the third number 0 (because our particles label from 0, so i=2 means the third particle from the rightmost). So you want to change 1011 to 1111.
so

shiftl(1,i)

left shift binary number 1 by i=2 digits, so 1 becomes 0100.
Then

 xor(spin,shiftl(1,i)) 

means

1011    XOR    0100

answer is 1111, which is exactly what you want.
With bit operations you can handle all kind of spin operations.

1 Like

As others on this thread have pointed out, a binary <-> decimal mapping would save space, but it would require extra operations to extract individual spins and then put them back in. This would make it more inefficient, according to the consensus here.

Which thread?

Usually the mapping is done at initialization stage and do it only once so you have a mapping table, you do not need to repeatedly do mapping in your code thereafter. The binary <-> decimal mapping is not something expensive.

Besides, can you briefly explain why you need array of 193710244 logicals?
Despite the number 193710244, I barely see people use logicals to represent spin up and down. Is there any advantages of using logicals to represent spin up and downs?

I am just suggesting that you may think of more efficient ways to do the calculations.
I can be wrong, but according to your description, you will not be able to do bigger problem simply because you do not have enough memory? Is Ising model’s bottleneck memory size? This sounds a little strange :rofl:

About the code crashing you described, sounds like there are some bugs in the code.

The bottleneck is a combination of speed and memory. As you point out, I could use the memory more efficiently by storing a spin as a bit rather than a logical, but most people on this thread have said this would be less efficient.

Because of the nature of my research topic, I need a much, much larger array than you need in a classical Ising model. My model is still an “Ising” model in the sense that it is composed of spins (zeros or ones), but there is additional structure imposed on my system. The point is that the system size must be very large in order for the behaviors to clearly emerge.

And there are no bugs in the code. The code can run for days no problem, and bomb on the sixth day, for example. It also bombs when I run more than six random realizations of the job at the same time. This doesn’t seem like a bug.

With the caveat that short test programs (see below) are notoriously easy to read
too much into, it empirically looks likely that using gfortran on Linux
using KIND=1 will not degrade performance for logical operations.

Since you indicated you are using Linux, a cheap way to get your
memory high water mark from Fortran in a batch mode is to read it from
/proc/self/status if you want to report it from your program.

I generally use a call to the C routine getrusage instead, but that
requires using the ISO_C_BINDING and a procedure longer than the
example code!

There are all kinds of goodies in /proc; so you might want to look at
the link referenced in the mstat() procedure below when you have some spare time
, which I added just
as an example.

I still feel top(1) is easy to use and lets you watch things dynamically, but if
you just want a high water mark and are only using Linux, just look
for the line with HWM in it in /proc/self/status from your program.

I’m basing that KIND=1 does not change things from gfortran much on a simple program so again, as mentioned
by several posters earlier try it with your own program. Note that
KIND=4 appears to be faster with optimized ifort, for example.

#!/bin/bash
(
exec 2>&1
set -x
ulimit -a
grep -i total /proc/meminfo
time fpm run --profile debug --compiler gfortran --flag  -DLKIND=1
time fpm run --profile debug --compiler ifort --flag     -DLKIND=1
time fpm run --profile debug --compiler nvfortran --flag -DLKIND=1
time fpm run --profile debug --compiler gfortran --flag  -DLKIND=4
time fpm run --profile debug --compiler ifort --flag     -DLKIND=4
time fpm run --profile debug --compiler nvfortran --flag -DLKIND=4
time fpm run --profile release --compiler gfortran --flag  -DLKIND=1
time fpm run --profile release --compiler ifort --flag     -DLKIND=1
time fpm run --profile release --compiler nvfortran --flag -DLKIND=1
time fpm run --profile release --compiler gfortran --flag  -DLKIND=4
time fpm run --profile release --compiler ifort --flag     -DLKIND=4
time fpm run --profile release --compiler nvfortran --flag -DLKIND=4
)|tee $0.log
exit

Summary of timings

DEBUG MODE

   COMPILER   KIND    REAL_TIME
   gfortran   KIND=1  0m    8.047s
   ifort      KIND=1  0m   21.896s
   nvfortran  KIND=1  0m   18.945s

   gfortran   KIND=4  0m    8.686s
   ifort      KIND=4  0m   21.561s
   nvfortran  KIND=4  0m   19.711s

OPTIMIZED

   COMPILER   KIND    REAL_TIME
   gfortran   KIND=1  0m    5.194s
   ifort      KIND=1  0m    4.687s
   nvfortran  KIND=1  0m   14.377s
   
   gfortran   KIND=4  0m    5.974s
   ifort      KIND=4  0m    3.416s
   nvfortran  KIND=4  0m   14.964s

Again, do not read too much into these numbers except that using KIND=1 with gfortran did not drastically slow things down (I have seen using a non-standard LOGICAL kind slow things down by a factor of 3, but it depends on the chip, the compiler, the compiler flags, …).

trivial test program

Look at the mstat() routine for a Linux-only easy way to get HWM from Fortran, otherwise not really anything to see here …

program testit
implicit none
logical(kind=LKIND),allocatable :: vals(:)
integer :: i, icount
real :: r
   if(allocated(vals))deallocate(vals)
   allocate(vals(450000000))
   ! do some things with a big logical array, doing odd things to
   ! reduce chance the compiler does not optimize everything away
   do i=1,size(vals)
      call random_number(r)
      vals(i)=merge(.true.,.false.,r>0.5)
   enddo
   icount=0
   do i=1,size(vals)
      if(vals(i))icount=icount+1
   enddo
   write(*,*)icount,size(vals),storage_size(vals),storage_size(.true.)
   call mstat()
contains
subroutine mstat()
! /proc rocks!
! reference: https://www.kernel.org/doc/html/latest/filesystems/proc.html
! assume on Linux and show /proc/self/status memory, especially High Water Mark
! also see system_getrusage()
integer :: lun, iostat
character(len=80) :: line
   open(newunit=lun,file='/proc/self/status')
   do 
      read(lun,'(a)',iostat=iostat)line
      if(iostat.ne.0)exit
      if(line(1:2).eq.'Vm')write(*,'(a)')trim(line)
   enddo 
!  open(newunit=lun,file='/proc/self/limits')
end subroutine mstat
end program testit

urbanjs@venus:~/githhub/fun/lsize/

Regarding the bug stuff, if you do not expect your memory usage to increase while running, and the code seems to crash with the same wallclock time you do want to check on it periodically and see if the resident memory usage is increasing inexplicably over time. If it is, and should not be you might want to look at valgrind for a tool for finding memory leaks; although aside from eating up your memory that does not mean results are being affected if there is one.

Although I saw something really similar a long time ago where a code always crashed on Friday night, and was restarted just about every Monday morning (that ended up being because when everyone left on Friday they turned off the air conditioning and the node overheated). When someone asked me to look at it an I told them it was overheating and what the cause was they admitted that had been happening for two years; and that when they had paid for someone to come in and watch what was happening it never happened, so they were really frustrated (the guy coming in was turning on the air conditioner)! So I am not excluding overheating by any means, but typically that happens in far less than a week and (usually) is not so repeatable.

5 Likes

FYI, if you are down to use bits of an integer (which is a good way to reduce the RAM usage) to represent your spins, and you are able to use stdlib in your project, there is a bitset module that does exactly that:

4 Likes

14ng, What do you think? Will this bits method be faster?