I have recently been struggling to make my program faster because it has been spending a long time to get a result. So for speed-up, I have parallelized the most time-consuming part using OpenMP. This worked very well and gave a nice speedup. Now I am considering the use of single precision (32-bit) for the time-consuming part, because my program is written entirely in double precision (64-bit). So, I am wondering whether there are some remarks that I should keep in mind for this kind of calculations.
For example, I guess it is typical to change the precision parameter from “dp” to “sp” throughout the program, but are there some code patterns for which it is definitely better to keep double precision? (I imagine that this may be the case for result variables for summing data, but not very sure.) Alternatively, is it also common to use 32-bit only for the time-consuming part, while keeping the entire program as 64-bit?
More generally, although I have used only 64-bit up to now, is it rather common to perform an entire calculation in 32-bit? I think neural-net calculations use lower-precisions for speed-up, but wondering what kind of other applications typically use 32-bit (partly or entirely). I think this depends greatly on the nature of calculations (whether the calculation is deterministic and needs high precision, or it is stochastic and inherently subject to random noise).
So I would really appreciate it if you share any insight about such calculations or point to any related pages / explanations that I should read beforehand.
Thanks very much in advance!