Checking whether the compiler does this is pretty easy with GCC's -fopt-info-loop option, which reports the loop optimizations it applied:
```
$> gfortran -O2 -fopt-info-loop test.f90
<no output: no loop re-ordering>
$> gfortran -O3 -fopt-info-loop test.f90
test.f90:22:12: optimized: loops interchanged in loop nest
test.f90:17:15: optimized: loop vectorized using 16 byte vectors
test.f90:23:16: optimized: loop vectorized using 16 byte vectors
test.f90:12:24: optimized: basic block part vectorized using 16 byte vectors
test.f90:12:24: optimized: basic block part vectorized using 16 byte vectors
test.f90:13:24: optimized: basic block part vectorized using 16 byte vectors
test.f90:13:24: optimized: basic block part vectorized using 16 byte vectors
test.f90:34:70: optimized: basic block part vectorized using 16 byte vectors
```
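It should be pretty clear what happens and when. At -O2 nothing is reported. At -O3 GCC interchanges the loop nest starting at line 22 (-floop-interchange is enabled by default at -O3 since GCC 8) and vectorizes several loops and basic blocks.
As has been noted, whether this optimization kicks in depends on the complexity of the inner loop body, so the result does not generalize easily.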
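As a hypothetical illustration (these kernels are not from the original test.f90), the module below writes the same naive matrix product in two loop orders. In `matmul_ijk` the innermost loop walks `a(i, k)` along a row, which is a strided access in Fortran's column-major storage. In `matmul_jki` the innermost loop walks down columns with unit stride. Compiling it with `-O3 -fopt-info-loop` is one way to see whether your gfortran version reorders the first nest on its own. Whether it does depends on the GCC version and on what the loop body contains, so treat this as a probe rather than a guaranteed result.

```fortran
! Hypothetical example kernels; not taken from the original test.f90.
module loop_order_demo
  implicit none
contains
  ! Naive matrix product, i outermost. The innermost k loop reads
  ! b(k, j) contiguously but a(i, k) with a stride of size(a, 1).
  subroutine matmul_ijk(a, b, c)
    real, intent(in)  :: a(:, :), b(:, :)
    real, intent(out) :: c(:, :)
    integer :: i, j, k
    c = 0.0
    do i = 1, size(a, 1)
      do j = 1, size(b, 2)
        do k = 1, size(a, 2)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine matmul_ijk

  ! Same arithmetic, manually reordered so that i is innermost:
  ! c(i, j) and a(i, k) are traversed with unit stride and
  ! b(k, j) is invariant in the inner loop.
  subroutine matmul_jki(a, b, c)
    real, intent(in)  :: a(:, :), b(:, :)
    real, intent(out) :: c(:, :)
    integer :: i, j, k
    c = 0.0
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine matmul_jki
end module loop_order_demo
```

Both versions accumulate each `c(i, j)` over `k` in the same order, so they give the same result. Adding a branch or a procedure call inside the innermost loop of `matmul_ijk` and recompiling is a simple way to check how sensitive the interchange is to the loop body.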