[LLVMdev] FP Intrinsics

Thu Mar 17 08:27:44 PST 2005

On Thu, 17 Mar 2005, Morten Ofstad wrote:

> Chris Lattner wrote:
>> On Fri, 11 Mar 2005, Morten Ofstad wrote:
>>> I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added 
>>> work with the X86ISelPattern, but I'm having some difficulties 
>>> understanding what needs to be done.
>> 
>> Cool.  Here are a couple of requests:
>> 
>> 1. I don't think we need an "llvm.abs" intrinsic at the llvm level.  This
>>    can be modeled as a pair of setcc/select instructions.
>
> OK, I'm not exactly sure where and how to do this matching of setcc/select - 
> can you give me some pointers?

X = abs(Y)    is the same as:

cond = setlt Y, 0
tmp  = sub -0.0, Y
X    = select cond, tmp, Y
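
The sequence above can be sketched in C++ as a rough illustration. The function name abs_via_select is hypothetical, and each statement stands in for one of the three instructions.

```cpp
// Hypothetical helper mirroring the three-instruction sequence above.
double abs_via_select(double y) {
    bool cond = y < 0.0;     // cond = setlt Y, 0
    double tmp = -0.0 - y;   // tmp  = sub -0.0, Y
    return cond ? tmp : y;   // X    = select cond, tmp, Y
}
```

One detail worth keeping in mind: for Y = -0.0 the comparison is false, so this sequence returns -0.0, whereas C's fabs would return +0.0.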

>> 3. On X86 at least, sin and cos are not defined over the full numeric
>>    range.  These instructions are useful for applications like yours, and
>>    situations where a flag like "-ffast-math" has been provided.  Because
>>    of this, please name the intrinsics and nodes sin_approx and
>>    cos_approx.  I don't think that sqrt on the X86 has this limitation,
>>    so its intrinsic can be named just "llvm.sqrt".
>
> I think it makes more sense to keep the intrinsics as they are, but have 
> the X86 target generate different code depending on a command-line flag. 
> For the pattern ISel, this simply amounts to registering FP_SIN and 
> FP_COS as nodes which need to be expanded (to calls) if fast-math is not 
> set... What do you think about this approach?

This is fine with me at the code generator level.  If you choose to do 
this, then there is no reason to make sin/cos intrinsics.  The code 
generator might as well just recognize a call to an external function 
named sin/cos and generate the FP_SIN/FP_COS node when appropriate.
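
As a minimal sketch of the idea (not actual LLVM code), the decision could look like the following. The NodeKind enum, the selectCallee function, and the fastMath parameter are all hypothetical names; only FP_SIN and FP_COS come from the discussion.

```cpp
#include <string>

// Hypothetical node kinds; only FP_SIN and FP_COS are named in the thread.
enum class NodeKind { Call, FP_SIN, FP_COS };

// Decide how a call to an external function is selected.
// `fastMath` stands in for whatever command-line flag the target consults.
NodeKind selectCallee(const std::string &callee, bool fastMath) {
    if (fastMath) {
        if (callee == "sin") return NodeKind::FP_SIN;
        if (callee == "cos") return NodeKind::FP_COS;
    }
    return NodeKind::Call;  // otherwise keep the ordinary library call
}
```

With this shape, no new intrinsic is needed at the LLVM level: the program simply calls sin or cos, and the target decides whether the native instruction is acceptable.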

Note that we still need llvm.sqrt, because llvm.sqrt (in contrast to sqrt) 
does not set errno.
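
For context, the errno behaviour of the C library sqrt can be observed with a small C++ probe like the one below. The function name is hypothetical, and whether errno is actually set is implementation-defined (the C standard signals it through math_errhandling & MATH_ERRNO), so treat this as a sketch rather than a guarantee.

```cpp
#include <cerrno>
#include <cmath>

// Returns true if calling the C library sqrt on `x` left errno set to EDOM.
bool sqrtReportsDomainError(double x) {
    volatile double input = x;  // discourage compile-time folding of the call
    errno = 0;                  // clear any stale error first
    double result = std::sqrt(input);
    (void)result;               // only the side effect on errno matters here
    return errno == EDOM;
}
```

Because the library call may write errno, it cannot simply be replaced by a hardware square-root instruction; an operation without that side effect can.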

>> 4. Don't forget a doc patch to docs/LangRef.html :-)
>> 
>>> I assume I have to add new nodetypes for the FP instructions to 
>>> SelectionDAGNodes.h, and make nodes for these in 
>>> SelectionDAGLowering::visitCall when I find the intrinsic...
>
> OK, I will.
>
>>> -- for me it would make most sense to lower the intrinsic to a call if 
>>> it's not supported. However I notice that for other intrinsics (memcpy 
>>> etc.) this is done in LegalizeDAG where the node is expanded to a call if 
>>> it's not directly supported for the target.
>> 
>> Yup, this is what we want to do.  There are two places to implement this: 
>> LegalizeDAG for the SelectionDAG isels, and 
>> lib/CodeGen/IntrinsicLowering.cpp for other isels.
>
> Why not do it in SelectionDAGLowering::visitCall? The way I have implemented 
> it now, this calls TLI.hasNativeSupportForOperation to see if it (for example 
> FP_SIN) is a legal operation on the target, and if not it sets RenameFn to 
> "sin" and simply goes ahead with generating the call. This is a lot less code 
> than doing it in LegalizeDAG, and also more efficient since it's not first 
> lowered to an instruction and then expanded to a call.

I admit that it's not a huge difference, but the idea behind the legalize 
phase in general is basically to allow optimization on the unlegalized 
representation as well as on the legalized representation.  For example, 
it would make sense to implement constant folding for FP_SIN nodes.  If 
the lowering were done in visitCall, then only targets that natively 
support sin would perform the constant folding; targets that don't would 
miss it (unless all of the folding code was duplicated in the call 
lowering).
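
The ordering argument can be illustrated with a toy model in C++. Everything here (Op, Node, foldConstants, legalize, compile) is a hypothetical, heavily simplified stand-in, not the SelectionDAG API.

```cpp
#include <cmath>

// Hypothetical, heavily simplified stand-ins for DAG nodes.
enum class Op { Constant, FP_SIN, CallSin };

struct Node {
    Op op;                // what the node computes
    bool operandIsConst;  // true if the operand is a known constant
    double operand;       // operand value (for Constant: the value itself)
};

// Target-independent fold: runs before legalization on every target.
Node foldConstants(Node n) {
    if (n.op == Op::FP_SIN && n.operandIsConst)
        return {Op::Constant, true, std::sin(n.operand)};
    return n;
}

// Legalization: targets without a native sine expand the node to a call.
Node legalize(Node n, bool targetHasNativeSin) {
    if (n.op == Op::FP_SIN && !targetHasNativeSin)
        return {Op::CallSin, n.operandIsConst, n.operand};
    return n;
}

// Fold first, then legalize, so every target benefits from the fold.
Node compile(Node n, bool targetHasNativeSin) {
    return legalize(foldConstants(n), targetHasNativeSin);
}
```

In this model a sine of a constant is folded even on a target with no native sine, because the fold sees the FP_SIN node before it is expanded to a call.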

This all makes a much bigger difference when talking about breaking up 
64-bit operations for 32-bit hosts, or promoting 8-bit operations for 
32-bit RISC-like hosts, but the principle still applies to sin/cos 
lowering, I think.
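
As an illustration of what breaking up a 64-bit operation involves, here is a rough C++ sketch of a 64-bit add expressed with 32-bit pieces. Pair32 and expandAdd64 are hypothetical names, and a real legalizer would emit target nodes rather than C++ arithmetic.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as on a 32-bit host.
struct Pair32 {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Expand one 64-bit add into two 32-bit adds plus a carry.
Pair32 expandAdd64(Pair32 a, Pair32 b) {
    std::uint32_t lo = a.lo + b.lo;             // low half, wraps modulo 2^32
    std::uint32_t carry = lo < a.lo ? 1u : 0u;  // wrapped iff sum < an input
    std::uint32_t hi = a.hi + b.hi + carry;     // high half absorbs the carry
    return {lo, hi};
}
```

Optimizations written against the single 64-bit add apply before this expansion, and optimizations on the 32-bit pieces apply after it.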

-Chris

-- 
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/