[LLVMdev] FP emulation

Mon Oct 9 15:18:35 PDT 2006

Hi,

I'm now ready to implement the FP support for my embedded target. 

My target supports only f64 at the moment.
Question: How can I tell LLVM that float is the same as double on my
target? Maybe by assigning the same register class to both MVT::f32
and MVT::f64?

However, FP is supported only in emulated mode, because the target has
no hardware support for FP. Therefore each FP operation has to be
converted into a call to an assembler function implementing the
corresponding operation. All these FP operations implemented in
assembler always expect their parameters in specific f64 registers,
i.e. %d0 and %d1, and return their results in %d0. The value of %d1 is
clobbered by such calls. (Actually, the %dX registers are pseudo
registers; see below.)
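To make this calling convention concrete, here is a small C++ model of it, assuming a hypothetical register file with just the two f64 pseudo-registers. `PseudoRegs`, `emulated_fsub64` and `call_fsub64` are illustrative names, not code from the target: the routine reads %d0 and %d1, writes its result to %d0, and leaves %d1 in an unspecified (clobbered) state, so a caller must not rely on %d1 afterwards.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the two f64 pseudo-registers used by the
// emulation routines (%d0 and %d1).
struct PseudoRegs {
    double d0 = 0.0;
    double d1 = 0.0;
};

// Stand-in for an assembler routine such as __fsub64: operands arrive
// in d0 and d1, the result is returned in d0, and d1 is clobbered.
void emulated_fsub64(PseudoRegs& r) {
    r.d0 = r.d0 - r.d1;
    r.d1 = std::nan("");  // model "clobbered": the caller may not rely on d1
}

// What a caller has to do: move its operands into the fixed registers,
// make the call, and read the result back from d0.
double call_fsub64(double a, double b) {
    PseudoRegs r;
    r.d0 = a;  // copy first operand into %d0
    r.d1 = b;  // copy second operand into %d1
    emulated_fsub64(r);
    return r.d0;
}
```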

1. Since these FP emulation functions take operands in registers and
produce results in registers without any further side effects, they
look pretty much like real instructions. So my idea is to represent
them in the tblgen instruction descriptions as pseudo-instructions,
where constraints define which concrete physical %dX registers are to
be used. This would enforce correct register allocation.

For example:
def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1 

This seems to work, at least on simple test files. 

But I would also need a way to convert such an FSUB64 pseudo-instruction
into the assembler function call, e.g. "call __fsub64". At the moment I
don't quite understand at which stage and how I should do it (lowering,
selection, combining?). What would be the easiest way to map it to
such a call instruction?
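One low-effort option, sketched here only as an assumption, is to keep FSUB64 and friends as ordinary instructions through selection and register allocation and turn them into calls only when the assembly text is printed, since the register constraints have already been satisfied by then. The sketch below is a standalone C++ model, not LLVM's assembly-printer interface: the `Opcode` enum, `emulation_routines`, `print_emulated`, and every routine name except __fsub64 are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical opcodes for the FP pseudo-instructions.
enum class Opcode { FADD64, FSUB64, FMUL64, FDIV64 };

// Hypothetical mapping from each pseudo-instruction to the assembler
// routine implementing it. Only __fsub64 comes from the message above;
// the other names are placeholders.
const std::map<Opcode, std::string>& emulation_routines() {
    static const std::map<Opcode, std::string> table = {
        {Opcode::FADD64, "__fadd64"},
        {Opcode::FSUB64, "__fsub64"},
        {Opcode::FMUL64, "__fmul64"},
        {Opcode::FDIV64, "__fdiv64"},
    };
    return table;
}

// Emit the assembly text for a pseudo-instruction. No operands are
// printed because the constraints already placed them in %d0/%d1,
// so the pseudo-instruction becomes a plain call.
std::string print_emulated(Opcode op) {
    return "call " + emulation_routines().at(op);
}
```

This only shows the mapping; whether printing time is the right stage for a real backend depends on what else the call has to account for beyond the registers already listed in the instruction's implicit uses and defs.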

One issue with the described approach is that the code produced by
register allocation is rather inefficient. For example, there are a lot
of instructions of the form "mov %d0, %d0", copying a register into
itself. My guess is that the following happens:
 before reg.alloc there are instructions of the form:
 mov %virtual_reg0, %d0 
 mov %virtual_reg1, %d1 
 fsub64
which ensure that the operand constraints of the operation are fulfilled
and the operands are in the right registers. During allocation, the
register allocator assigns each virtual register the same physical
register it is copied to. Therefore the code becomes:
 mov %d0, %d0 
 mov %d1, %d1 
 fsub64

But then no "useless copy elimination" or peephole pass is run that
would remove such copies.
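Such a cleanup is simple to express. The sketch below is a standalone C++ model, not an LLVM pass: `Instr` and `remove_self_copies` are hypothetical names, and a real pass would work on the backend's own machine-instruction representation. It deletes every `mov` whose source and destination ended up in the same physical register, which is exactly the "mov %d0, %d0" pattern above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified machine instruction: an opcode plus optional
// destination and source register names.
struct Instr {
    std::string opcode;
    std::string dst;
    std::string src;
};

// Remove copies whose source and destination were assigned the same
// physical register (e.g. "mov %d0, %d0"); they have no effect.
void remove_self_copies(std::vector<Instr>& code) {
    code.erase(std::remove_if(code.begin(), code.end(),
                              [](const Instr& i) {
                                  return i.opcode == "mov" && i.dst == i.src;
                              }),
               code.end());
}
```

Copies between different registers, such as "mov %d2, %d0", are kept because they still move data.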

Question: Is there such a pass available in LLVM? It would also be
interesting to know why the register allocator does not eliminate such
coalesced moves itself. Wouldn't that make sense?

Does this idea of representing the emulated FP operation calls as
instructions make sense? Or do you see easier or more useful ways to
do it?

2. In reality, the processor has only 32-bit registers. Therefore, any
f64 value has to be mapped to two 32-bit registers. What is the best way
to achieve this? I guess this is a well-known kind of problem.

So far I have been thinking about introducing some pseudo f64 registers,
i.e. the %dX used above, and working with them in the instruction
descriptions. Then, at a later stage, probably after lowering and
selection, they would be expanded into pairs of load or store operations.
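The pseudo-register scheme amounts to a fixed mapping plus an expansion step. The C++ below models it, assuming each %dN is backed by the pair %r(2N) (low word) and %r(2N+1) (high word) and that moves are written "mov dst, src"; the register names and the helpers `backing_regs` and `expand_f64_move` are hypothetical and not part of any LLVM interface.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical register naming: each f64 pseudo-register %dN is backed by
// %r(2N) for the low word and %r(2N+1) for the high word.
std::pair<std::string, std::string> backing_regs(unsigned n) {
    return {"%r" + std::to_string(2 * n), "%r" + std::to_string(2 * n + 1)};
}

// Expand a 64-bit pseudo-register copy "mov %dDst, %dSrc" into two
// 32-bit moves, one per half (assumed operand order: dst, src).
std::vector<std::string> expand_f64_move(unsigned dst, unsigned src) {
    const auto d = backing_regs(dst);
    const auto s = backing_regs(src);
    return {"mov " + d.first + ", " + s.first,
            "mov " + d.second + ", " + s.second};
}
```

A fixed pairing like this keeps the expansion trivial, at the cost of forcing both halves of a value into adjacent registers.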

But I'm not quite sure this is the right way to go. I suspect that
something can be done using some form of EXPAND operation in the
lowering pass. For example, I see that assignments of f64 immediates to
globals are expanded by LLVM automatically into two 32-bit stores, which
is very nice. Maybe it is possible to do the same for 64-bit registers?
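At the bit level, expanding an f64 immediate into two 32-bit stores just splits its IEEE-754 encoding into a low and a high word; the same split applies when an f64 value lives in a register pair. A minimal C++ sketch follows. `F64Words`, `split_f64` and `join_f64` are hypothetical names, and which word goes to the lower address depends on the target's endianness, which is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "assumes a 64-bit IEEE-754 double");

// The two 32-bit halves of a double, as two separate 32-bit stores
// (or a register pair) would hold them.
struct F64Words {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Split a double into its low and high 32-bit words, copying the bits
// with memcpy instead of type punning through a union.
F64Words split_f64(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return {static_cast<std::uint32_t>(bits),
            static_cast<std::uint32_t>(bits >> 32)};
}

// Reassemble the double from its two halves (the inverse of split_f64).
double join_f64(F64Words w) {
    std::uint64_t bits = (static_cast<std::uint64_t>(w.hi) << 32) | w.lo;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 1.0 is encoded as 0x3FF0000000000000, so its high word is 0x3FF00000 and its low word is 0.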

OK, enough questions for today ;)

Thanks for any feedback, 
 Roman
