[LLVMdev] FP emulation (continued)

Fri Nov 17 01:29:46 PST 2006

Hi,

I still have some questions about FP emulation for my embedded target.

To recap a bit:
My target only has integer registers and no hardware support for FP. FP
is supported only via emulation. Only f64 is supported. All FP
operations should be implemented to use i32 registers.

Based on the fruitful discussions on this list I was already able to
implement mapping of the FP operations to special library calls. 

I also implemented a simple version of the register mapping, where I
introduced a bogus f64 register set and used it during the code
selection and register allocation. After register allocation a special
post-RA pass just converts instructions using f64 operands into
multiple instructions using i32 operands. This seems to work, but has
one disadvantage. Since it is a post-RA pass, it uses a fixed mapping
between physical f64 registers and a pair of physical i32 registers,
e.g. D0:f64 -> i1:i32 x i2:i32. This leads to suboptimal register
allocation. Still, I now have an almost working compiler with integer
and FP support for my rather specific embedded target, which says a
lot about the quality of LLVM.
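
To make the fixed mapping concrete, here is a minimal standalone sketch.
It is not LLVM code: the register enum and the i32PairFor helper are
hypothetical names, and the rule "Dn -> (i(2n+1), i(2n+2))" is only an
assumption that generalizes the single D0 -> i1 x i2 example above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical numbering of the bogus f64 registers D0..D3.
// These names are illustrative only and do not come from LLVM.
enum F64Reg : unsigned { D0, D1, D2, D3, NumF64Regs };

// Fixed post-RA mapping (assumed rule): Dn always lives in the i32 pair
// (i(2n+1), i(2n+2)), so D0 maps to (i1, i2).
// The result holds the numbers of the two i32 registers.
inline std::pair<unsigned, unsigned> i32PairFor(F64Reg D) {
  assert(D < NumF64Regs && "not an f64 register");
  unsigned Lo = 2 * static_cast<unsigned>(D) + 1;
  return std::make_pair(Lo, Lo + 1);
}
```

Because the pair is a pure function of the f64 register, the register
allocator never gets to choose the two i32 registers independently,
which is where the suboptimal allocation comes from.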

Another option, as Chris indicated in his previous mails (see
below), would be to expose the fact that the f64 regs really are integer
registers.

> >> The right way would be to expose the fact that these really are
> >> integer  registers, and just use integer registers for it.
> >
> > How and where can this fact be exposed? In register set
> > descriptions?
> > Or may be telling to use i32 register class when we assign the
> > register class to f64 values?
> 
> It currently cannot be done without changes to the legalize pass.
> 
> >> This
> >> would be no problem except that the legalize doesn't know how to
> >> convert f64 -> 2 x  i32 registers.  This could be added,
> >
> > Can you elaborate a bit more about how this can be added? Do you
> > mean that legalize would always create two new virtual i32 registers
> > for each such f64 value, copy the parts of f64 into it and let the
> > register allocator later allocate some physical registers for it?
> 
> Yes.
> 
> > Would it require adaptations only in the target-specific legalize
> > or do you think that some changes in the common part (in Target
> > directory) of the legalize are required?
> 
> The target independent parts would need to know how to do this. 
> Specifically it would need to know how to "expand" f64 to 2x i32.

I tried to implement this, but I am still having trouble.
In my understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp has to be altered. For example, I tried modifying
computeRegisterProperties to tell it that f64 is actually represented
as 2 x i32. I also added some code to
FunctionLoweringInfo::CreateRegForValue to allocate this pair of i32
regs for f64 values. But it does not seem to help.
From what I can see, the problem is that emitNode() still looks at the
machine instruction descriptions. Since I still have some insns for
loads and stores of f64 values (do I still need them if I do the
mapping?), it allocates f64 registers without being affected in any
way by the modifications described above, because it does not use any
of the information prepared there.
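
To be clear about what "f64 represented as 2 x i32" means at the value
level, here is a small standalone sketch. It is not legalizer code; the
I32Pair, splitF64 and joinF64 names are hypothetical, and it assumes a
64-bit IEEE 754 double with the low word holding the least significant
bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical holder for the two i32 halves of one f64 value.
struct I32Pair {
  std::uint32_t Lo; // least significant 32 bits
  std::uint32_t Hi; // most significant 32 bits (sign, exponent, top of mantissa)
};

// Split the bit pattern of a double into two 32-bit integers.
inline I32Pair splitF64(double V) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch assumes a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &V, sizeof(Bits)); // bit copy, no numeric conversion
  I32Pair P;
  P.Lo = static_cast<std::uint32_t>(Bits);
  P.Hi = static_cast<std::uint32_t>(Bits >> 32);
  return P;
}

// Rebuild the double from its two halves (inverse of splitF64).
inline double joinF64(I32Pair P) {
  std::uint64_t Bits = (static_cast<std::uint64_t>(P.Hi) << 32) | P.Lo;
  double V;
  std::memcpy(&V, &Bits, sizeof(V));
  return V;
}
```

The expansion I am after would do the same thing with registers: every
f64 value becomes two independent i32 values, and only the emulation
library calls ever interpret the pair as a floating point number.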

So, I'm a bit lost now. I don't quite understand what should be done to
tell the CodeGen how to map virtual f64 regs to pairs of virtual i32
regs. Maybe I'm doing something wrong? Maybe I need to tell the codegen
that f64 is a packed type consisting of 2 x i32, or a vector of i32?
Chris, could you elaborate a bit more on this? What needs to be
explained to the codegen/legalizer, and where?

Another thing I have in mind:
It looks like the easiest way of all would be to have a special pass
after the assignment of virtual registers, but before the real register
allocation pass. This pass could define the mapping for each virtual
f64 register and then rewrite the machine insns to use the
corresponding i32 regs. The problem with this approach is that I don't
quite understand how to insert such a pass before the physical register
allocation pass, or whether it can be done at all. It also worries me a
bit that it would require modifying PHI nodes and introducing new ones
wherever f64 regs are used in PHI nodes, since each such PHI node would
have to become a pair of PHI nodes.
Since I have no experience with PHI node handling in LLVM, I'd like
to avoid this complexity, unless you say it is actually pretty easy to
do. What do you think of this approach? Does it make sense? Is it
easier than the previous one, which requires changes in the code
selector/legalizer?
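
To illustrate what such a pre-RA pass would have to do, including the
PHI node case, here is a toy sketch. Everything in it (RegClass, ToyPhi,
F64Splitter) is hypothetical and stands in for the real machine IR; it
only shows the bookkeeping, not how the pass would be hooked into LLVM.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy stand-ins for machine IR; none of these types are LLVM's.
enum class RegClass { I32, F64 };

struct ToyPhi {
  unsigned Dest;                  // virtual register defined by the PHI
  std::vector<unsigned> Incoming; // one virtual register per predecessor
};

class F64Splitter {
public:
  // Classes[i] is the register class of virtual register i.
  explicit F64Splitter(std::vector<RegClass> Classes)
      : Classes(std::move(Classes)) {}

  // Return the (lo, hi) pair of i32 vregs that replaces an f64 vreg,
  // creating two fresh i32 vregs the first time the f64 vreg is seen.
  std::pair<unsigned, unsigned> pairFor(unsigned VReg) {
    assert(VReg < Classes.size() && Classes[VReg] == RegClass::F64);
    auto It = Map.find(VReg);
    if (It != Map.end())
      return It->second;
    unsigned Lo = newI32();
    unsigned Hi = newI32();
    std::pair<unsigned, unsigned> P(Lo, Hi);
    Map.emplace(VReg, P);
    return P;
  }

  // One f64 PHI becomes two i32 PHIs: one over the low halves and one
  // over the high halves of the same incoming values.
  std::pair<ToyPhi, ToyPhi> splitPhi(const ToyPhi &Phi) {
    std::pair<unsigned, unsigned> D = pairFor(Phi.Dest);
    ToyPhi Lo{D.first, {}};
    ToyPhi Hi{D.second, {}};
    for (unsigned In : Phi.Incoming) {
      std::pair<unsigned, unsigned> P = pairFor(In);
      Lo.Incoming.push_back(P.first);
      Hi.Incoming.push_back(P.second);
    }
    return std::make_pair(Lo, Hi);
  }

  RegClass classOf(unsigned VReg) const { return Classes.at(VReg); }

private:
  // Allocate a fresh i32 virtual register and return its number.
  unsigned newI32() {
    Classes.push_back(RegClass::I32);
    return static_cast<unsigned>(Classes.size() - 1);
  }

  std::vector<RegClass> Classes;
  std::map<unsigned, std::pair<unsigned, unsigned>> Map;
};
```

In this toy version the PHI case is just the same mapping applied to
the destination and to each incoming register, so the two new PHI nodes
keep the same predecessor order as the original one.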

Thanks,
  Roman
