Interesting example, but I'm not convinced that array syntax vectorizes well. I've found several cases where an explicit inner loop was faster than the equivalent array syntax. Have you measured the performance of your approach (2 nested loops plus array syntax) against 3 nested loops?