[LLVMdev] FP emulation

Tue Oct 10 14:29:15 PDT 2006

On Tue, 10 Oct 2006, Roman Levenstein wrote:
>>> such a call instruction?
>>
>> Why not just make the asm string be "call __fsub64"?
>
> Well, of course it would be the best solution. But the interesting part
> is that I need to generate the machine code directly because for
> different reasons use of a system assembler is not an option. As a

ok.

> result, I need to do this conversion in the target backend and later
> generate object code directly. But when and how should this conversion
> ("fsub64" insn -> "call __fsub64" insn) be done? What is your advice?

I don't understand.  If you are writing out the .o file directly, you 
already know how to encode calls... can't you just encode it as the right 
sort of call?  What facilities are you using to emit the machine code, are 
you using the llvm machine code emitter generator stuff (like PPC)?

>>> Does this idea of representing the emulated FP operation calls as
>>> instructions as described above make some sense? Or do you see easier
>>> or more useful ways to do it?
>>
>> That is a reasonable way to do it.  Another reasonable way would be
>> to lower them in the instruction selector itself through the use of
>> custom expanders.  In practice, using instructions with "call foo"
>> in them instead of lowering to calls may be simpler.
>
> Hmm, let me see. Just to check that I understand your proposal
> correctly:
> You mean I don't need to define any FP operations as machine
> instructions at all. Instead, I basically tell that I will expand all
> FP operations myself and then I simply expand them into the following
> sequence of instructions:
>  mov arg1, %d0 // enforce register constraint
>  mov arg2, %d1 // enforce register constraint
>  call __fsub64
>
> Is this understanding correct?

Yes, if you tell the legalizer you want to custom expand everything, you 
can do this.  In practice, there may be ones the legalizer doesn't know 
how to custom expand yet, but that is an easy addition.

> If yes, how do I explain that arguments are to be passed on the concrete 
> physical registers like %d0 and %d1 and result comes on %d0? Do I need 
> to allocate virtual regs for them and pre-assign physical regs somehow?

As others have pointed out, you flag copy{to/from}reg nodes to the call.
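
A minimal sketch of the shape of that expansion, written as a toy model
rather than against the real LLVM API. The function name expandFSub64 and
the string representation of instructions are hypothetical; only the
registers %d0/%d1 and the routine __fsub64 come from the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model, NOT LLVM's API: expand "dst = fsub64 lhs, rhs"
// into copies to the fixed argument registers, the call, and a copy out
// of the fixed result register.
std::vector<std::string> expandFSub64(const std::string &lhs,
                                      const std::string &rhs,
                                      const std::string &dst) {
  return {
      "mov " + lhs + ", %d0", // first argument pinned to %d0
      "mov " + rhs + ", %d1", // second argument pinned to %d1
      "call __fsub64",        // emulation routine named in the thread
      "mov %d0, " + dst,      // result is returned in %d0
  };
}
```

Calling expandFSub64("%v1", "%v2", "%v3") yields four instructions. In the
real backend the copies would be copy{to/from}reg nodes flagged to the call
so that nothing can be scheduled between them.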

> Or maybe I have to define a new calling convention that would enforce
> it?
> Actually, how can this be done with LLVM? I mean, if I want to
> introduce a new calling convention, what do I need to do in backend to
> define and register it? Is it required to change the frontend to make
> it visible at the source program level?

You should be able to handle this in the lowering stuff, you don't need 
anything complex here.

>> For the time being, I'd suggest defining an "fp register set" which
>> just aliases the integer register set (i.e. say that d0 overlaps
>> r0+r1).
>
> OK. I almost did this way already. But I introduced two FP register
> sets. One for fp32 (for the future) and one for fp64. fp32 aliases the
> integer register set. fp64 aliases the fp32 register set, but not the
> integer register set explicitly. I thought that aliases are transitive?
> Or do I have to mention all aliases explicitly, e.g. for %d0 I need to
> say [%s0,%s1,%GR0,%GR1]?

Depending on how you defined the aliases, they aren't necessarily 
transitive.  I'd look at the <yourtarget>GenRegisterInfo.inc file, and see 
what it lists as the aliases for each reg.
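
To illustrate the difference between directly listed aliases and their
transitive closure, here is a small standalone sketch. It does not use any
LLVM data structures; the AliasTable type, the function names, and the
exact %s/%GR pairing in exampleTable are assumptions made for the example.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: register name -> aliases listed directly for it.
using AliasTable = std::map<std::string, std::set<std::string>>;

// Collect every register reachable from `reg` by following alias edges.
std::set<std::string> aliasClosure(const AliasTable &direct,
                                   const std::string &reg) {
  std::set<std::string> seen;
  std::vector<std::string> work{reg};
  while (!work.empty()) {
    std::string cur = work.back();
    work.pop_back();
    auto it = direct.find(cur);
    if (it == direct.end())
      continue; // no aliases listed for this register
    for (const auto &a : it->second)
      if (a != reg && seen.insert(a).second)
        work.push_back(a); // newly discovered alias, follow it later
  }
  return seen;
}

// Assumed layout: fp64 %d0 lists only fp32 regs; fp32 regs list integer regs.
AliasTable exampleTable() {
  return {{"%d0", {"%s0", "%s1"}}, {"%s0", {"%GR0"}}, {"%s1", {"%GR1"}}};
}
```

With this table, %d0 directly lists only %s0 and %s1, while the closure
also contains %GR0 and %GR1. If the generated .inc file shows only the
direct list for a register, the missing entries would have to be spelled
out explicitly.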

> But a more interesting question is this: The scheme above assumes that 
> there is a "hardwired" mapping between FP registers and concrete pairs 
> of integer registers. In many cases this is enough, since the emulated 
> operations indeed expect parameters in predefined pairs of 32bit integer 
> registers. But when it comes to other uses of FP registers (mainly for 
> storing some values) there is no such limitation that a concrete pair of 
> integer registers should be used. Actually, any combination of two 32bit 
> integer registers would do. How can this be modelled and represented to 
> regalloc, if at all? One guess is to define one FP reg for each possible 
> combination of two integer registers, which would lead to the definition 
> of N*(N-1) FP registers, where N is the number of integer registers (and I 
> have only 8 integer regs). But that seems not very elegant for my 
> taste, or?

The right way would be to expose the fact that these really are integer 
registers, and just use integer registers for it.  This would be no 
problem except that the legalizer doesn't know how to convert f64 -> 2 x 
i32 registers.  This could be added, but a simpler approach to get you 
running faster is to add the bogus register set.
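
For a sense of scale, the N*(N-1) scheme from the question can be counted
with a few lines of standalone code. The function name orderedPairs is made
up for this illustration, and registers are represented by plain indices.

```cpp
#include <utility>
#include <vector>

// Enumerate every ordered pair (hi, lo) of distinct integer register
// indices in [0, n). Each pair would be one synthetic FP register.
std::vector<std::pair<int, int>> orderedPairs(int n) {
  std::vector<std::pair<int, int>> pairs;
  for (int hi = 0; hi < n; ++hi)
    for (int lo = 0; lo < n; ++lo)
      if (hi != lo)
        pairs.emplace_back(hi, lo);
  return pairs;
}
```

For the 8 integer registers mentioned above, orderedPairs(8) has 8*7 = 56
entries, each of which would also need its own alias list.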

>>> So far I was thinking about introducing some pseudo f64 registers,
>>> i.e. %dX used above, and working with them in the instruction
>>> descriptions. And then at the later stages, probably after lowering
>>> and selection, expand them into pairs of load or store operations.
>>
>> If you tell the register allocator about the "aliases", it should do
>> the right thing for you.  Take a look at how aliasing in the X86
>> register set is handled in X86RegisterInfo.td.
>
> Can you elaborate a bit? Does it mean that I don't need to define fp64
> loads from memory or fp64 stores to memory and reg<-reg transfers for
> 64bit ops, because all that will be done automatically using pairs of
> 32bit instructions? So far, I had the impression I need to use fp64
> regs in the instruction descriptions explicitly. But in this case the
> selected instructions operate on these 64bit regs and there is the
> problem of how to expand them into pairs of 32bit instructions.

Oh, I'm sorry, I misunderstood.  Yes, you're right, you'll have to define 
"f64" loads and stores and copies.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/