[LLVMdev] FP emulation

Mon Oct 9 15:55:40 PDT 2006

On Mon, 9 Oct 2006, Roman Levenstein wrote:
> I'm now ready to implement the FP support for my embedded target.

Cool.

> My target supports only f64 at the moment.
> Question: How can I tell LLVM that float is the same as double on my
> target? Maybe by assigning the same register class to both MVT::f32
> and MVT::f64?

Just don't assign a register class for the f32 type.  This is what the X86 
backend does when it is in "floating point stack mode".  This will 
implicitly promote everything to f64.
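
To illustrate the promotion rule, here is a small self-contained C++ sketch.
It is a toy model, not LLVM code: ToyTarget, ValueType, the "DRegs" class
name and makeF64OnlyTarget are all hypothetical names made up for this
example. The only idea taken from the text above is that a type with no
register class is treated as illegal and promoted to f64.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy value types; only the two FP types discussed in this thread.
enum class ValueType { f32, f64 };

// Hypothetical stand-in for a target's "register class per value type" table.
struct ToyTarget {
    std::map<ValueType, std::string> regClassFor;

    void addRegisterClass(ValueType vt, std::string rc) {
        regClassFor[vt] = std::move(rc);
    }

    // A type is "legal" here only if some register class can hold it.
    bool isTypeLegal(ValueType vt) const {
        return regClassFor.count(vt) != 0;
    }

    // Legal types are kept; an illegal f32 is promoted to f64.
    ValueType legalize(ValueType vt) const {
        if (isTypeLegal(vt))
            return vt;
        // This toy model assumes f64 always has a register class.
        assert(isTypeLegal(ValueType::f64) && "no FP register class at all");
        return ValueType::f64;
    }
};

// A target that registers a class for f64 only, as suggested above.
ToyTarget makeF64OnlyTarget() {
    ToyTarget t;
    t.addRegisterClass(ValueType::f64, "DRegs");
    return t;
}
```

With makeF64OnlyTarget(), legalize(ValueType::f32) yields ValueType::f64, so
every f32 value ends up in the f64 register class.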

> But FP is supported only in the emulated mode, because the target does
> not have any hardware support for FP. Therefore each FP operation is
> supposed to be converted into a call of an assembler function
> implementing a corresponding operation.

Ok.

> All these FP operations
> implemented in assembler always expect parameters on concrete f64
> registers, i.e. %d0,%d1 and return their results in reg %d0. The value
> of %d1 is clobbered by such calls. (actually %dX are pseudo regs, see
> below).

Ok.

> 1. Since these FP emulation functions take operands on registers and
> produce operands on registers without any further side-effects, they
> look pretty much like real instructions. Thus I have the idea to
> represent them in the tblgen instruction descriptions like
> pseudo-instructions, where constraints define which concrete physical
> %dX registers are to be used. This would enforce correct register
> allocation.
>
> For example:
> def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
>           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1
>
> This seems to work, at least on simple test files.

That should be a robust solution.

> But I would also need a way to convert such a FSUB64 pseudo-instruction
> into the assembler function call, e.g. "call __fsub64". At the moment I
> don't quite understand at which stage and how I should do it (lowering,
> selection, combining??? ). What would be the easiest way to map it to
> such a call instruction?

Why not just make the asm string be "call __fsub64"?

> One issue with the described approach is a pretty inefficient code
> resulting after the register allocation. For example, there are a lot
> of instructions of the form "mov %d0, %d0", copying the register into
> itself. My guess is that the following happens:

Make sure to implement TargetInstrInfo::isMoveInstr.  This will allow the 
coalescer to eliminate these.

> before reg.alloc there are instructions of the form:
> mov %virtual_reg0, %d0
> mov %virtual_reg1, %d1
> fsub64
> which ensure that operand constraints of the operation are fulfilled
> and they are on the right registers. During allocation, the register
> allocator assigns the same physical register to the virtual register.
> Therefore the code becomes:
> mov %d0, %d0
> mov %d1, %d1
> fsub64
>
> But then there is no call to "useless copies elimination" pass or
> peephole pass that would basically remove such copies.

Yep.

> Question: Is there such a pass available in LLVM? Actually, it is also
> interesting to know, why the regalloc does not eliminate such coalesced
> moves itself? Wouldn't it make sense?

The coalescer does; please implement isMoveInstr.
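
As a rough picture of why the hook matters, here is a self-contained C++
sketch. It is a toy model and does not use LLVM's classes: Instr,
removeIdentityCopies and the string register names are hypothetical, and the
isMoveInstr below only imitates the general idea of the real hook (report
whether an instruction is a plain register copy, and between which
registers). A pass can only drop "mov %d0, %d0" if something tells it that
"mov" is a copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction: opcode plus one source and one destination.
struct Instr {
    std::string opcode;
    std::string src;
    std::string dst;
};

// Toy analogue of the isMoveInstr hook: returns true for a plain
// register-to-register copy and reports the registers involved.
bool isMoveInstr(const Instr &mi, std::string &srcReg, std::string &dstReg) {
    if (mi.opcode != "mov")
        return false;
    srcReg = mi.src;
    dstReg = mi.dst;
    return true;
}

// Drops copies whose source and destination are the same register.
// Instructions the hook does not recognize as copies are left alone.
std::vector<Instr> removeIdentityCopies(const std::vector<Instr> &code) {
    std::vector<Instr> out;
    for (const Instr &mi : code) {
        std::string src, dst;
        if (isMoveInstr(mi, src, dst) && src == dst)
            continue; // "mov %d0, %d0" does nothing, so skip it
        out.push_back(mi);
    }
    assert(out.size() <= code.size()); // the pass only ever removes
    return out;
}
```

Feeding it the three-instruction sequence from the question (two identity
moves followed by fsub64) leaves only the fsub64. If isMoveInstr always
returned false, nothing would be removed, which matches the symptom
described.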

> Does this idea of representing the emulated FP operation calls as
> instructions as described above make some sense? Or do you see easier
> or more useful ways to do it?

That is a reasonable way to do it.  Another reasonable way would be to 
lower them in the instruction selector itself through the use of custom 
expanders.  In practice, using instructions with "call foo" in them 
instead of lowering to calls may be simpler.  Also, if you *know* that 
these calls don't clobber the normal set of callee clobbered registers, 
using the asm string is the right way to go.

> 2. In reality, the processor has only 32bit regs. Therefore, any f64
> value should be mapped to two 32bit registers. What is the best way to
> achieve it? I guess this is a well-known kind of problem.

Ah, this is trickier. :)  We have a robust solution on the integer side, 
but don't allow the FP side to use it.

For the time being, I'd suggest defining an "fp register set" which just 
aliases the integer register set (i.e. say that d0 overlaps r0+r1).

> So far I was thinking about introducing some pseudo f64 registers, i.e.
> %dX used above, and working with them in the instruction descriptions.
> And then at the later stages, probably after lowering and selection,
> expand them into pairs of load or store operations.

If you tell the register allocator about the "aliases", it should do the 
right thing for you.  Take a look at how aliasing in the X86 register set 
is handled in X86RegisterInfo.td.
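
To make the aliasing idea concrete, here is a self-contained C++ sketch of
what "d0 overlaps r0+r1" means for an allocator. It is a toy model, not the
TableGen syntax and not LLVM's data structures: AliasTable, isFree and
makeToyAliases are hypothetical names, and the d1 = r2+r3 pairing is an
assumption added for the example (only d0 = r0+r1 is given above).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy alias table: records which registers share storage.
class AliasTable {
    std::map<std::string, std::set<std::string>> aliases;

public:
    // Aliasing is symmetric: if d0 overlaps r0, then r0 overlaps d0.
    void addAlias(const std::string &a, const std::string &b) {
        aliases[a].insert(b);
        aliases[b].insert(a);
    }

    // A register overlaps itself and every register listed as its alias.
    bool overlaps(const std::string &a, const std::string &b) const {
        if (a == b)
            return true;
        auto it = aliases.find(a);
        return it != aliases.end() && it->second.count(b) != 0;
    }

    // A register may be handed out only if nothing it overlaps is in use.
    bool isFree(const std::string &reg,
                const std::set<std::string> &inUse) const {
        for (const std::string &used : inUse)
            if (overlaps(reg, used))
                return false;
        return true;
    }
};

// d0 = r0+r1 comes from the text; d1 = r2+r3 is assumed for illustration.
AliasTable makeToyAliases() {
    AliasTable t;
    t.addAlias("d0", "r0");
    t.addAlias("d0", "r1");
    t.addAlias("d1", "r2");
    t.addAlias("d1", "r3");
    return t;
}
```

With r1 in use, isFree("d0", ...) is false while isFree("d1", ...) is true.
Note that r0 and r1 do not overlap each other; they only both overlap d0.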

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/