Using single-precision for faster calculation

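Even if a single multiplication takes the same time in single and double precision, a vector (SIMD) register holds twice as many single-precision values, so each vectorized instruction processes twice as many numbers. Memory bandwidth is also used better, because you move half as much data (4 bytes per value instead of 8).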
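
Here is a quick back-of-the-envelope sketch of both effects. It assumes a 256-bit vector register (as on x86 AVX; other instruction sets use other widths). The helper names `lanes_per_register` and `bytes_to_move` are made up for illustration. It only does the arithmetic and does not benchmark anything.

```python
from array import array

# Assumption: a 256-bit SIMD register (e.g. x86 AVX); other ISAs differ.
REGISTER_BITS = 256


def lanes_per_register(itemsize_bytes: int, register_bits: int = REGISTER_BITS) -> int:
    """Hypothetical helper: how many values of this size fit side by side in one register."""
    return register_bits // (8 * itemsize_bytes)


def bytes_to_move(n: int, itemsize_bytes: int) -> int:
    """Hypothetical helper: bytes read from memory to touch n values once."""
    return n * itemsize_bytes


single = array("f").itemsize  # C float: 4 bytes on IEEE 754 platforms
double = array("d").itemsize  # C double: 8 bytes

n = 1_000_000
for name, size in (("float32", single), ("float64", double)):
    print(f"{name}: {lanes_per_register(size)} lanes per register, "
          f"{bytes_to_move(n, size):,} bytes for {n:,} values")
```

With these assumptions, single precision gets 8 lanes per instruction versus 4, and streams 4 MB instead of 8 MB for a million values. Actual speedups depend on whether the compiler vectorizes the loop and whether the code is compute- or memory-bound.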
