Thanks @kargl, you nailed it. You highlighted the advantage of having the sqrt intrinsic function: it uses the (faster) sqrtsd instruction every time, while the power x**(1._sp/2) only uses it if you supply the -ffast-math flag.
I think that also resolves the dilemma above: the expression x**(1._sp/2) without -ffast-math behaves as x**y, but with -ffast-math it behaves as sqrt(x).
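One reason a compiler cannot swap the two forms without a relaxed-math flag can be checked directly. This is a minimal sketch, assuming a C++ standard library whose math functions follow IEEE 754 / C Annex F; the helper names same_result and pow_half_matches_sqrt are made up for illustration. It compares pow(x, 0.5) against sqrt(x), treating +0 and -0 as different results:

```cpp
#include <cmath>

// True when a and b are the same result, including the sign of zero.
// Two NaNs count as the same result for this comparison.
bool same_result(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) && std::isnan(b);
    }
    return a == b && std::signbit(a) == std::signbit(b);
}

// Hypothetical helper: does pow(x, 0.5) agree with sqrt(x) for this x?
bool pow_half_matches_sqrt(double x) {
    return same_result(std::pow(x, 0.5), std::sqrt(x));
}
```

Under those assumptions the two agree for ordinary positive inputs, but they differ for x = -0.0 (sqrt keeps the negative zero, pow gives +0) and for x = -infinity (sqrt gives NaN, pow gives +infinity). These edge cases are the kind of difference that -ffast-math allows the compiler to ignore.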
Consequently, it seems you can indeed use the general x**(1._sp/2) to implement the intrinsic sqrt(x); however, by default it would only use the fast sqrtsd instruction if -ffast-math (or equivalent) is provided. When sqrt is implemented using x**(1._sp/2), we want it to always behave as if -ffast-math were on, since that is what the user wants.
Regarding the cleanest implementation, one option is to have a dedicated ASR node for this operation like this:
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -238,6 +238,7 @@ expr
     | RealUnaryMinus(expr arg, ttype type, expr? value)
     | RealCompare(expr left, cmpop op, expr right, ttype type, expr? value)
     | RealBinOp(expr left, binop op, expr right, ttype type, expr? value)
+    | RealSqrt(expr arg, ttype type, expr? value)
     | ComplexConstant(float re, float im, ttype type)
     | ComplexUnaryMinus(expr arg, ttype type, expr? value)
     | ComplexCompare(expr left, cmpop op, expr right, ttype type, expr? value)
The frontend would simply use this node for the intrinsic sqrt(x). In addition, the ASR->ASR optimization phase can transform RealBinOp(x, op=Pow, 1/2, ...) into RealSqrt(x, ...) if the -ffast-math flag is provided.
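To make the ASR->ASR rewrite concrete, here is a rough sketch using a toy expression tree. None of these types are LFortran's real classes: Kind, Expr, make_var, make_const, make_pow and rewrite_pow_half are hypothetical names, and the exponent is assumed to have already been folded to the constant 0.5.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for ASR expression nodes; NOT LFortran's actual data structures.
enum class Kind { Var, RealConstant, RealBinOpPow, RealSqrt };

struct Expr {
    Kind kind = Kind::Var;
    std::string name;             // used by Var
    double value = 0.0;           // used by RealConstant
    std::unique_ptr<Expr> left;   // base of Pow, or argument of RealSqrt
    std::unique_ptr<Expr> right;  // exponent of Pow
};

std::unique_ptr<Expr> make_var(const std::string &name) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::Var;
    e->name = name;
    return e;
}

std::unique_ptr<Expr> make_const(double value) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::RealConstant;
    e->value = value;
    return e;
}

std::unique_ptr<Expr> make_pow(std::unique_ptr<Expr> base, std::unique_ptr<Expr> exponent) {
    auto e = std::make_unique<Expr>();
    e->kind = Kind::RealBinOpPow;
    e->left = std::move(base);
    e->right = std::move(exponent);
    return e;
}

// Rewrite Pow(x, 0.5) into RealSqrt(x), but only when fast_math is set.
// Children are rewritten first so nested occurrences are handled too.
std::unique_ptr<Expr> rewrite_pow_half(std::unique_ptr<Expr> e, bool fast_math) {
    if (!e) return e;
    e->left = rewrite_pow_half(std::move(e->left), fast_math);
    e->right = rewrite_pow_half(std::move(e->right), fast_math);
    const bool is_pow_half = e->kind == Kind::RealBinOpPow && e->right &&
                             e->right->kind == Kind::RealConstant &&
                             e->right->value == 0.5;
    if (fast_math && is_pow_half) {
        auto s = std::make_unique<Expr>();
        s->kind = Kind::RealSqrt;
        s->left = std::move(e->left);  // the argument x is moved, not copied
        return s;
    }
    return e;
}
```

Calling rewrite_pow_half(make_pow(make_var("x"), make_const(0.5)), true) returns a RealSqrt node whose argument is x, while passing false leaves the Pow node untouched. The frontend would create RealSqrt directly for the sqrt(x) intrinsic, so that path never depends on the flag.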
We are trying to keep the ASR design minimal and only add nodes if needed. I thought the sqrt function could be implemented using the general RealBinOp operator, but I can see now that the backend would then only generate sqrtsd if -ffast-math is provided, and some codes/users cannot use this flag. So it seems a dedicated RealSqrt node might be the way to go, to signal to the backend to use the sqrtsd instruction even without -ffast-math.
An alternative implementation is to recognize the intrinsic sqrt function in the backend and directly generate the sqrtsd instruction. The pro is that ASR is simpler; the con is that the backend has to have special logic for the intrinsic sqrt, so having an explicit ASR node might be cleaner. We struggle with these design choices. I’ll think about this.
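For comparison, the backend-only alternative could look roughly like the sketch below. It is a toy: lower_call is a hypothetical function that returns a pseudo-instruction string, whereas a real backend would emit LLVM IR or machine code. The point is only to show where the special case would live.

```cpp
#include <string>

// Toy backend lowering for a call to a one-argument real function.
// The name check below is the special logic the backend would have to carry.
std::string lower_call(const std::string &function_name, const std::string &arg) {
    if (function_name == "sqrt") {
        // Intrinsic recognized by name: emit the hardware instruction directly,
        // independent of any fast-math setting.
        return "sqrtsd " + arg;
    }
    // Everything else becomes an ordinary function call.
    return "call " + function_name + "(" + arg + ")";
}
```

With a dedicated RealSqrt node the backend would instead dispatch on the node type, and the string comparison on the function name would not be needed.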
Regarding inlining, LFortran inlines intrinsic functions (it has access to their source code at compile time). It currently doesn’t use -ffast-math because I have not yet figured out how to make LLVM do it from C++, but that’s a different issue.
@msz59 going forward, as you can see, you might want to focus on another intrinsic function. The sqrt is quite special, as this thread highlighted.