Thanks @kargl, you nailed it. You highlighted the advantage of having the sqrt intrinsic function: it uses the (faster) sqrtsd instruction every time, while the power expression x**(1._sp/2) only uses it if you supply the -ffast-math flag.
I think that also resolves the dilemma above: the expression x**(1._sp/2) without -ffast-math behaves as x**y, but with -ffast-math it behaves as sqrt(x).
Consequently, it seems you can indeed use the general x**(1._sp/2) to implement the intrinsic sqrt(x), but by default it would use the fast sqrtsd instruction only if -ffast-math (or an equivalent) is provided. If sqrt is implemented using x**(1._sp/2), we want it to always be compiled as if -ffast-math were on, because that is what the user wants.
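One reason a compiler cannot turn a power with exponent 1/2 into a square root by default is that the two disagree on a few IEEE special cases. Here is a minimal C++ sketch of that (C++ only because that is what LFortran is written in; it uses double precision, and `same_result` is a name I made up for illustration):

```cpp
#include <cmath>

// Returns true when pow(x, 0.5) and sqrt(x) agree for this x.
// Two NaNs count as equal; +0.0 and -0.0 count as different.
bool same_result(double x) {
    double p = std::pow(x, 0.5);
    double s = std::sqrt(x);
    if (std::isnan(p) || std::isnan(s)) {
        return std::isnan(p) && std::isnan(s);
    }
    return p == s && std::signbit(p) == std::signbit(s);
}
```

On an implementation that follows C Annex F, `same_result(4.0)` is true, but `same_result(-0.0)` is false (pow gives +0.0, sqrt gives -0.0) and so is `same_result(-INFINITY)` (pow gives +inf, sqrt gives NaN). Flags like -ffast-math tell the compiler it may ignore these cases, which is what makes the substitution legal.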
Regarding the cleanest implementation, one option is to have a dedicated ASR node for this operation like this:
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -238,6 +238,7 @@ expr
| RealUnaryMinus(expr arg, ttype type, expr? value)
| RealCompare(expr left, cmpop op, expr right, ttype type, expr? value)
| RealBinOp(expr left, binop op, expr right, ttype type, expr? value)
+ | RealSqrt(expr arg, ttype type, expr? value)
| ComplexConstant(float re, float im, ttype type)
| ComplexUnaryMinus(expr arg, ttype type, expr? value)
| ComplexCompare(expr left, cmpop op, expr right, ttype type, expr? value)
The frontend would simply use this node for the intrinsic sqrt(x). In addition, the ASR->ASR optimization phase can transform RealBinOp(x op=Pow, 1/2, ...) into RealSqrt(x, ...) if the -ffast-math flag is provided.
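To make the ASR->ASR rewrite concrete, here is a rough sketch of such a pass. Everything in it is a toy stand-in: `Expr`, `Kind`, the `make_*` helpers, and `rewrite_pow_half` are hypothetical names for illustration, not LFortran's actual classes or pass API.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for ASR nodes; these are NOT LFortran's real classes.
enum class Kind { Var, RealConstant, RealBinOpPow, RealSqrt };

struct Expr {
    Kind kind = Kind::Var;
    std::string name;             // used by Var
    double value = 0.0;           // used by RealConstant
    std::unique_ptr<Expr> left;   // base of Pow, or argument of RealSqrt
    std::unique_ptr<Expr> right;  // exponent of Pow
};

std::unique_ptr<Expr> make_var(const std::string &name) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::Var;
    e->name = name;
    return e;
}

std::unique_ptr<Expr> make_const(double value) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::RealConstant;
    e->value = value;
    return e;
}

std::unique_ptr<Expr> make_pow(std::unique_ptr<Expr> base,
                               std::unique_ptr<Expr> exponent) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::RealBinOpPow;
    e->left = std::move(base);
    e->right = std::move(exponent);
    return e;
}

// Rewrites Pow(x, 0.5) into RealSqrt(x), but only when fast_math is set.
// Children are visited first so nested powers are rewritten as well.
std::unique_ptr<Expr> rewrite_pow_half(std::unique_ptr<Expr> e, bool fast_math) {
    if (!e) return e;
    e->left = rewrite_pow_half(std::move(e->left), fast_math);
    e->right = rewrite_pow_half(std::move(e->right), fast_math);
    bool is_pow_half = e->kind == Kind::RealBinOpPow && e->right &&
                       e->right->kind == Kind::RealConstant &&
                       e->right->value == 0.5;
    if (fast_math && is_pow_half) {
        auto s = std::make_unique<Expr>();
        s->kind = Kind::RealSqrt;
        s->left = std::move(e->left);
        return s;
    }
    return e;
}
```

With `fast_math` set to false the tree comes back unchanged, so a power with exponent 1/2 stays a general power. A frontend that emits RealSqrt directly for the intrinsic sqrt(x) would not go through this pass at all, so it would not depend on the flag.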
We are trying to keep the ASR design minimal and only add nodes if needed. I thought the sqrt function could be implemented using the general RealBinOp operator. However, I can see now that with RealBinOp the backend should only generate sqrtsd if -ffast-math is provided, and some codes/users cannot use this flag. So it seems a dedicated RealSqrt node might be the way to go, to signal to the backend to use the sqrtsd instruction even without -ffast-math.
An alternative implementation is to recognize the intrinsic sqrt function in the backend and directly generate the sqrtsd instruction. The pro is that ASR is simpler; the con is that the backend needs special logic for the intrinsic sqrt, so having an explicit ASR node might be cleaner. We struggle with these design choices. I’ll think about this.
Regarding inlining, LFortran inlines intrinsic functions (it has access to the source code at compile time). It currently doesn’t use -ffast-math because I have not figured out how to make LLVM do it from C++ yet, but that’s a different issue.
@msz59 going forward, as you can see, you might want to focus on another intrinsic function. The sqrt is quite special, as this thread highlighted.