Thanks @kargl, you nailed it. You highlighted the advantage of having the `sqrt` intrinsic function: it uses the (faster) `sqrtsd` instruction every time, while the power `x**(1._sp/2)` only uses it if you supply the `-ffast-math` flag.

I think that also resolves the dilemma above: without `-ffast-math` the expression `x**(1._sp/2)` behaves as `x**y`, but with `-ffast-math` it behaves as `sqrt(x)`.
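A likely reason the compiler needs `-ffast-math` before it makes this substitution is that the two forms are not interchangeable for every input under IEEE 754 rules. Here is a minimal C++ sketch of the difference, assuming the general power is lowered to a `pow`-style library call (an assumption on my part, not something verified for LFortran); the helper names are hypothetical:

```cpp
#include <cmath>

// Hypothetical helper: models the general power path, x**y with y = 1/2.
inline double pow_half(double x) { return std::pow(x, 0.5); }

// Hypothetical helper: models the dedicated square-root path.
inline double sqrt_direct(double x) { return std::sqrt(x); }

// True when both paths give the same result, treating NaN as equal to NaN
// and distinguishing -0.0 from +0.0.
inline bool same_result(double x) {
    const double a = pow_half(x);
    const double b = sqrt_direct(x);
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) && std::isnan(b);
    }
    return a == b && std::signbit(a) == std::signbit(b);
}
```

For ordinary positive inputs the two agree, but for `-0.0` the power gives `+0.0` while the square root gives `-0.0`, and for negative infinity the power gives `+Inf` while the square root gives NaN. Relaxed floating-point flags allow the compiler to ignore exactly these cases.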

Consequently, it seems you can indeed use the general `x**(1._sp/2)` to implement the intrinsic `sqrt(x)`; however, by default it would only use the fast `sqrtsd` instruction if `-ffast-math` (or an equivalent) is provided. So if `sqrt` is implemented using `x**(1._sp/2)`, we want to always apply `-ffast-math` to it, since that turns out to be what the user wants.

Regarding the cleanest implementation, one option is to have a dedicated ASR node for this operation like this:

```
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -238,6 +238,7 @@ expr
     | RealUnaryMinus(expr arg, ttype type, expr? value)
     | RealCompare(expr left, cmpop op, expr right, ttype type, expr? value)
     | RealBinOp(expr left, binop op, expr right, ttype type, expr? value)
+    | RealSqrt(expr arg, ttype type, expr? value)
     | ComplexConstant(float re, float im, ttype type)
     | ComplexUnaryMinus(expr arg, ttype type, expr? value)
     | ComplexCompare(expr left, cmpop op, expr right, ttype type, expr? value)
```

The frontend would simply use this node for the intrinsic `sqrt(x)`. In addition, the ASR->ASR optimization phase can transform `RealBinOp(x op=Pow, 1/2, ...)` into `RealSqrt(x, ...)` if the `-ffast-math` flag is provided.
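To make the ASR->ASR transformation concrete, here is a rough C++ sketch of such a rewrite on a toy expression tree. The `Expr`, `Kind`, `make_node` and `rewrite_pow_half` names are hypothetical and are not LFortran's actual ASR classes or pass API; the sketch also assumes the exponent `1/2` has already been constant-folded to `0.5`:

```cpp
#include <memory>
#include <utility>

// Toy expression kinds, loosely mirroring the ASR nodes discussed above.
enum class Kind { Var, RealConstant, RealPow, RealSqrt };

struct Expr {
    Kind kind = Kind::Var;
    double value = 0.0;            // only meaningful for RealConstant
    std::unique_ptr<Expr> left;    // RealPow base, or RealSqrt argument
    std::unique_ptr<Expr> right;   // RealPow exponent
};

using ExprPtr = std::unique_ptr<Expr>;

// Hypothetical constructor helper for the toy tree.
inline ExprPtr make_node(Kind k, double v = 0.0,
                         ExprPtr l = nullptr, ExprPtr r = nullptr) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->value = v;
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Rewrites RealPow(x, 0.5) into RealSqrt(x), bottom-up.
// Without fast_math the tree is returned unchanged.
inline ExprPtr rewrite_pow_half(ExprPtr e, bool fast_math) {
    if (!e) return e;
    e->left = rewrite_pow_half(std::move(e->left), fast_math);
    e->right = rewrite_pow_half(std::move(e->right), fast_math);
    const bool is_pow_half = e->kind == Kind::RealPow && e->right &&
                             e->right->kind == Kind::RealConstant &&
                             e->right->value == 0.5;
    if (fast_math && is_pow_half) {
        return make_node(Kind::RealSqrt, 0.0, std::move(e->left));
    }
    return e;
}
```

With this shape, the frontend would emit the square-root node directly for the intrinsic, and the pass would only produce it from a power expression when the flag is set.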

We are trying to keep the ASR design minimal and only add nodes if needed. I thought the `sqrt` function could be implemented using the general `RealBinOp` operator, but I can see now that for `RealBinOp` the backend should only generate the `sqrtsd` instruction if `-ffast-math` is provided, and some codes/users cannot use this flag. So it seems a dedicated `RealSqrt` node might be the way to go, to signal to the backend to use the `sqrtsd` instruction even without `-ffast-math`.

An alternative implementation is to recognize the intrinsic `sqrt` function in the backend and directly generate the `sqrtsd` instruction. The pro is that ASR is simpler; the con is that the backend has to have special logic for the intrinsic `sqrt`, so having an explicit ASR node might be cleaner. We struggle with these design choices. I’ll think about this.

Regarding inlining, LFortran inlines intrinsic functions (it has access to the source code at compile time). It currently doesn’t use `-ffast-math` because I have not figured out how to make LLVM do it from C++ yet, but that’s a different issue.

@msz59 going forward, as you can see, you might want to focus on another intrinsic function; `sqrt` is quite special, as this thread highlighted.