Perfect. In fact we can just start with the sum kernel and then go to more complicated ones.

@lkedward can you write Fortran code that you would like to translate to your sum kernel? Here is the sum kernel:

```
__kernel void sum(const int size, const __global float *vec1, __global float *vec2) {
    int ii = get_global_id(0);
    if (ii < size) vec2[ii] += vec1[ii];
}
```

So a direct Fortran counterpart would be:

```
kernel subroutine sum(size, vec1, vec2)
    integer, intent(in) :: size
    real, global, intent(in) :: vec1(:) ! or vec1(size)
    real, global, intent(inout) :: vec2(:) ! or vec2(size); updated in place
    integer :: ii
    ! assuming get_global_id stays 0-based as in OpenCL, shift to 1-based indexing
    ii = get_global_id(0) + 1
    if (ii <= size) vec2(ii) = vec2(ii) + vec1(ii)
end subroutine
```

However, why can't this kernel simply be generated from `do concurrent`? It seems that would also work for other similar kernels:

```
real, allocatable :: vec1(:), vec2(:)
integer :: ii
...
do concurrent (ii = 1:size(vec2))
    vec2(ii) = vec2(ii) + vec1(ii)
end do
```

Isn't it semantically equivalent? It seems `do concurrent` would be simpler for this case and needs no extensions to the Fortran language, since it is already part of it; LFortran would simply generate the correct kernel from it.
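To make the equivalence question concrete, here is a minimal host-side sketch in plain C. It is neither real OpenCL nor anything LFortran generates; the names `sum_work_item`, `sum_as_kernel` and `sum_as_loop` are made up for illustration, and `gid` stands in for `get_global_id(0)`. It assumes a 1-D launch whose global size may be padded beyond `size`, which is the reason the kernel carries the bounds check.

```c
/* Body of the `sum` kernel for a single work-item.
   `gid` plays the role of get_global_id(0) (0-based). */
static void sum_work_item(int gid, int size, const float *vec1, float *vec2) {
    if (gid < size) vec2[gid] += vec1[gid];
}

/* Emulates a 1-D kernel launch on the host: one call per work-item.
   global_size may be larger than size (e.g. padded to a work-group
   multiple); the extra work-items do nothing thanks to the check. */
void sum_as_kernel(int global_size, int size, const float *vec1, float *vec2) {
    for (int gid = 0; gid < global_size; ++gid)
        sum_work_item(gid, size, vec1, vec2);
}

/* What the `do concurrent` loop computes, written as a serial loop.
   Fortran index ii corresponds to i + 1 here. */
void sum_as_loop(int size, const float *vec1, float *vec2) {
    for (int i = 0; i < size; ++i)
        vec2[i] += vec1[i];
}
```

Each iteration (or work-item) reads and writes only its own element, so the order of execution does not matter and both versions give the same `vec2`. That independence is what `do concurrent` requires of the loop body, so in this case the kernel and the loop look interchangeable.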