[LLVMdev] FP emulation (continued)

Mon Nov 20 15:06:49 PST 2006

Hi Chris,

Thank you very much for your answer! It helps me move in the right
direction. When you explain it, it sounds rather easy, but I still
have some tricky issues. This is either because I'm not so familiar
with LLVM or because it is a bit underestimated how much the LLVM
legalizer/expander relies on expandable types being integers (see my
explanations below).

--- Chris Lattner <sabre at nondot.org> wrote:
> > Another opportunity, as Chris indicated in his previous mails (see
> > below), would be to expose the fact that f64 regs really are
> > integer registers.
> 
> Right.
> 
> >> The target independent parts would need to know how to do this.
> >> Specifically it would need to know how to "expand" f64 to 2x i32.
> >
> > I tried to implement it, but I still have some troubles with that.
> > In my understanding, the code in TargetLowering.cpp and also in
> > SelectionDAGISel.cpp should be altered. I tried for example to
> > modify the computeRegisterProperties to tell that f64 is actually
> > represented as 2xi32.
> 
> Good, this is the first step.  Your goal is to get 
> TLI.getTypeAction(MVT::f64) to return 'expand' and to get 
> TLI.getTypeToTransformTo(f64) to return i32.

After I sent a mail to the mailing list, I figured out that I need to
do this, so I added exactly what you describe and it helped. 
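
To make the goal concrete, here is a toy model of the two queries
outside of LLVM. ToyVT, ToyAction and ToyTargetInfo are made-up names
for illustration and are not the real TargetLowering interface; the
only thing taken from the discussion is that f64 should report
"expand" and transform to i32 (and, as on my target, f32 is promoted
to f64).

```cpp
#include <cassert>

// Toy stand-ins for value types and legalize actions. These are NOT the
// real LLVM enums, only a model of the idea.
enum class ToyVT { i32, f32, f64 };
enum class ToyAction { Legal, Promote, Expand };

// Hypothetical target description: only i32 is natively held in registers.
struct ToyTargetInfo {
  ToyAction getTypeAction(ToyVT vt) const {
    switch (vt) {
    case ToyVT::i32: return ToyAction::Legal;
    case ToyVT::f32: return ToyAction::Promote; // f32 is widened to f64
    case ToyVT::f64: return ToyAction::Expand;  // f64 is split into halves
    }
    return ToyAction::Legal;
  }

  ToyVT getTypeToTransformTo(ToyVT vt) const {
    switch (vt) {
    case ToyVT::i32: return ToyVT::i32;
    case ToyVT::f32: return ToyVT::f64;
    case ToyVT::f64: return ToyVT::i32;
    }
    return vt;
  }

  // Number of legal registers needed to hold one value of type vt.
  unsigned getNumRegisters(ToyVT vt) const {
    switch (getTypeAction(vt)) {
    case ToyAction::Legal:   return 1;
    case ToyAction::Promote: return getNumRegisters(getTypeToTransformTo(vt));
    case ToyAction::Expand:  return 2 * getNumRegisters(getTypeToTransformTo(vt));
    }
    return 1;
  }
};

// Usage: the two properties discussed above, checked on the toy table.
inline void checkToyTarget() {
  ToyTargetInfo tli;
  assert(tli.getTypeAction(ToyVT::f64) == ToyAction::Expand);
  assert(tli.getTypeToTransformTo(ToyVT::f64) == ToyVT::i32);
}
```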

> > I also added some code into the function
> > FunctionLoweringInfo::CreateRegForValue for allocating this pair of
> > i32 regs for f64 values. But it does not seem to help.
> 
> Ok.
> 
> > From what I can see, the problem is that emitNode() still looks at
> > the machine instruction descriptions. And since I still have some
> > insns for load and stores of f64 values (do I still need to have
> > them, if I do the mapping?), it basically allocates f64 registers
> > without even being affected in any form by the modifications
> > described above, because it does not use any information prepared
> > there.

OK. After the changes mentioned above, pairs of virtual i32 regs are
used in most situations, exactly as intended.

> If you get here, something is wrong.  The code generator basically
> works like this:
> 
> 1. Convert LLVM to naive dag
> 2. Optimize dag
> 3. Legalize
> 4. Optimize
> 5. Select
> 6. Schedule and emit.
> 
> If you properly mark f64 as expand, f64 values should only exist in
> stages 1/2/3.  After legalization, they should be gone: only legal 
> types(i32) should exist in the dag.

> > So, I'm a bit lost now. I don't quite understand what should be
> > done to explain to the CodeGen how to map virtual f64 regs to the
> > pairs of virtual i32 regs? Maybe I'm doing something wrong? Maybe I
> > need to explain to the codegen that f64 is a packed type consisting
> > of 2xi32 or a vector of i32???  Chris, could you elaborate a bit
> > more about this? What needs to be explained to the
> > codegen/legalizer and where?
> 
> The first step is to get something simple like this working:
> 
> void %foo(double* %P) {
>    store double 0.0, double* %P
>    ret void
> }
> 
> This will require the legalizer to turn the double 0.0 into two
> integer zeros, and the store into two integer stores.

Sample code like this, i.e. simple stores, loads or even some
arithmetic operations, works fine now. No problems.
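
For reference, the effect of that expansion on the store example can
be modelled in plain C++ as follows. This is only a sketch of the
semantics, not legalizer code: I32Pair, splitF64 and storeF64AsTwoI32
are names made up for illustration, and placing the low half at
offset 0 is an assumption (little-endian layout).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The two i32 halves that replace one f64 value.
struct I32Pair {
  uint32_t lo;
  uint32_t hi;
};

// Reinterpret the 64-bit pattern of a double as two 32-bit halves.
inline I32Pair splitF64(double v) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return I32Pair{static_cast<uint32_t>(bits),
                 static_cast<uint32_t>(bits >> 32)};
}

// Model of "store double v, double* p" done as two i32 stores.
// Assumption: the low half goes to offset 0, the high half to offset 4.
inline void storeF64AsTwoI32(unsigned char *p, double v) {
  assert(p != nullptr);
  I32Pair h = splitF64(v);
  std::memcpy(p, &h.lo, sizeof h.lo);
  std::memcpy(p + 4, &h.hi, sizeof h.hi);
}
```

For the constant 0.0 both halves are simply zero, which is why the
example in the quoted mail needs nothing more than two integer zeros
and two integer stores.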

But there are big issues with correct legalization and expansion, i.e.
with ExpandOp() and LegalizeOp(). I don't know how to explain it
properly, but basically these functions assume in many places that
whenever an MVT requires more than one register, this MVT is an
integer type. There are some assertions checking for it, and there are
quite a few places where it is silently assumed. Moreover, since
getTypeAction(MVT::f64) now returns Expand, the legalizer tries to
expand too much, and BTW it does not check getOperationAction or
anything like that in this case. For example, it also tries to expand
operations like ADD, SUB, etc. into operations on the halves of the
f64 (probably because it thinks it is an integer ;-) even though I do
not need any expansion for such operations, since they are implemented
as library functions.
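
What I would expect instead is a per-operation decision, roughly like
the toy model below. ToyOp, ToyOpAction and the helper functions are
invented for this mail and are not legalizer code. The routine names
are the usual libgcc soft-float entry points; whether a given target
uses exactly these names is an assumption.

```cpp
#include <cassert>
#include <string>

// Toy operation kinds on an f64 value (not the ISD opcodes).
enum class ToyOp { FAdd, FSub, FMul, FDiv, Load, Store };

// What should happen to an f64 operation once f64 itself is "expand".
enum class ToyOpAction { SplitIntoHalves, LibCall };

// Hypothetical per-operation table: memory operations are split into two
// i32 operations, arithmetic goes to a library routine.
inline ToyOpAction getF64OperationAction(ToyOp op) {
  switch (op) {
  case ToyOp::Load:
  case ToyOp::Store:
    return ToyOpAction::SplitIntoHalves;
  default:
    return ToyOpAction::LibCall;
  }
}

// libgcc-style soft-float routine names for the arithmetic operations.
inline const char *getF64LibcallName(ToyOp op) {
  switch (op) {
  case ToyOp::FAdd: return "__adddf3";
  case ToyOp::FSub: return "__subdf3";
  case ToyOp::FMul: return "__muldf3";
  case ToyOp::FDiv: return "__divdf3";
  default:          return nullptr;
  }
}

// Describe how one f64 operation would be lowered in this model.
inline std::string lowerF64Op(ToyOp op) {
  if (getF64OperationAction(op) == ToyOpAction::LibCall) {
    const char *name = getF64LibcallName(op);
    assert(name && "arithmetic operation without a library routine");
    return std::string("call ") + name;
  }
  return "split into two i32 operations";
}
```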

For most of the places that assume an integer type is being expanded, I
inserted some code to explicitly check if MVT::f64 is being expanded.
This worked for most of the cases, but not for all. In particular, I
cannot solve the expansion of SELECT_CC on f64. It generates a
target-specific SELECT_CC node that correctly contains pairs of i32 for
the TrueValue and FalseValue. But when the value of this operation is
used later, the expander tries to expand its result. And it cannot do
so, since it seems to have a problem with EXTRACT_ELEMENT applied to
the SELECT_CC mentioned above. The problem is probably that it cannot
extract the corresponding halves from the target-specific SELECT_CC
node (whereas it can do so without problems for the usual integer-based
ISD::SELECT_CC nodes). At this point I got stuck, since I do not see
how I can overcome it.
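
The result I am after is simple when written down outside of the DAG:
one select on an f64 becomes two selects on the i32 halves that share
the same condition. A plain C++ model of just that semantics follows;
I32Pair, toHalves, fromHalves and selectF64Halves are names invented
for illustration, not anything from the legalizer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The two i32 halves that stand in for one f64 value.
struct I32Pair {
  uint32_t lo;
  uint32_t hi;
};

inline I32Pair toHalves(double v) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return I32Pair{static_cast<uint32_t>(bits),
                 static_cast<uint32_t>(bits >> 32)};
}

inline double fromHalves(I32Pair p) {
  uint64_t bits = (static_cast<uint64_t>(p.hi) << 32) | p.lo;
  double v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// One f64 select becomes two i32 selects sharing the same condition.
inline I32Pair selectF64Halves(bool cond, I32Pair t, I32Pair f) {
  I32Pair r;
  r.lo = cond ? t.lo : f.lo;
  r.hi = cond ? t.hi : f.hi;
  return r;
}

// Usage: selecting on the halves gives back the chosen double.
inline void exampleSelect() {
  double r = fromHalves(selectF64Halves(true, toHalves(1.5), toHalves(-2.0)));
  assert(r == 1.5);
}
```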

Overall, changing the legalizer to support the expansion of
MVT::f64 proves to be more complicated than I initially expected. And it
also seems to be a bit of overkill. Therefore I was thinking about a
special pass after code selection, but before register allocation.
After all, I just want to do a transformation on all instructions that
read from or write to virtual f64 regs.

  load/store vregf64, val 
->  
  load/store vregi32_1, val_low  
  load/store vregi32_2, val_high  

My subjective feeling is that it can be done more easily in a separate
pass than by changing the legalizer all over the place in a rather
inelegant way.
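
To show what I mean by such a pass, here is a toy version that works on
a made-up instruction list instead of real MachineInstrs. ToyOpc,
ToyInstr and F64Expander are hypothetical names, the numbering of the
new i32 vregs is arbitrary, and putting the low half at the lower
offset is an assumption.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy machine instructions: an opcode, one virtual register, one offset.
enum class ToyOpc { LoadF64, StoreF64, LoadI32, StoreI32 };

struct ToyInstr {
  ToyOpc opc;
  unsigned vreg;
  int offset;
};

// Rewrites every f64 load/store into two i32 loads/stores.
struct F64Expander {
  unsigned nextVReg; // first unused virtual register number
  std::map<unsigned, std::pair<unsigned, unsigned> > halves;

  explicit F64Expander(unsigned firstFreeVReg) : nextVReg(firstFreeVReg) {}

  // Return (and create on first use) the i32 pair for an f64 vreg.
  std::pair<unsigned, unsigned> getHalves(unsigned f64VReg) {
    auto it = halves.find(f64VReg);
    if (it != halves.end())
      return it->second;
    std::pair<unsigned, unsigned> p(nextVReg, nextVReg + 1);
    nextVReg += 2;
    halves[f64VReg] = p;
    return p;
  }

  std::vector<ToyInstr> run(const std::vector<ToyInstr> &in) {
    std::vector<ToyInstr> out;
    for (const ToyInstr &mi : in) {
      if (mi.opc == ToyOpc::LoadF64 || mi.opc == ToyOpc::StoreF64) {
        ToyOpc narrow =
            mi.opc == ToyOpc::LoadF64 ? ToyOpc::LoadI32 : ToyOpc::StoreI32;
        std::pair<unsigned, unsigned> h = getHalves(mi.vreg);
        out.push_back({narrow, h.first, mi.offset});      // low half
        out.push_back({narrow, h.second, mi.offset + 4}); // high half
      } else {
        out.push_back(mi); // everything else is left untouched
      }
    }
    assert(out.size() >= in.size());
    return out;
  }
};
```

The map is what keeps a load and a later store of the same f64 vreg on
the same pair of i32 vregs. PHI nodes are not modelled here at all.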

> > Another thing I have in mind is:
> > It looks like the easiest way at all would be to have a special
> > pass after the assignment of virtual registers, but before a real
> > register allocation pass. This pass could define the mapping for
> > each virtual f64 register and then rewrite the machine insns to use
> > the corresponding i32 regs. The problem with this approach is that
> > I don't quite understand how to insert such a pass before physical
> > register allocation pass and if it can be done at all. Also, it
> > worries me a bit that it would eventually require modifications of
> > PHI-nodes and introduction of new ones in those cases, where f64
> > regs were used in the PHI nodes. Now a pair of PHI-nodes would be
> > required for that. Since I don't have experience with PHI-nodes
> > handling in LLVM, I'd like to avoid this complexity, unless you say
> > it is actually pretty easy to do. What do you think of this
> > approach? Does it make sense? Is it easier than the previous one,
> > which requires changes in the code selector/legalizer?
> 
> The best approach is to make the legalizer do this transformation.

I believe you, since you certainly know it better than I do. But I
ran into quite a few problems, as I described above. Now, let's
assume for a second that the approach with a separate pass makes some
sense. I'm just curious how I could insert a new pass after code
selection, but before any other passes, including register allocation.
I have not found any easy way to do it yet. For a post-RA pass it is
very easy and supported, but for a pre-RA or post-code-selection pass
it is not obvious.
I was thinking about two possibilities:
1) Mark all f64 load/store/move target insns as
usesCustomDAGSchedInserter = 1 and then intercept their expansion in
InsertAtEndOfBasicBlock(). This should be fine, since at this stage
machine insns are still using virtual registers and it happens before
register allocation. This function could then expand them into pairs
of insns operating on i32 virtual regs. The problem here is that
InsertAtEndOfBasicBlock() is not called for all of the emitted insns.
Ironically enough, it is not called for ISD::CopyToReg and
ISD::CopyFromReg, which are the load and store insns. BTW, is this
intended or was it simply overlooked? What would happen if the
instructions produced for these nodes were marked
usesCustomDAGSchedInserter? Shouldn't they then be passed to the custom
target MI expander, as is done for all other instructions? Would it
make sense to always check, when an MI is inserted into a BB, whether
it is marked usesCustomDAGSchedInserter and, if so, call a
target-specific expander for it?

2) Introduce a fake register allocation pass and make it require an
f64toi32 pass as a prerequisite. It would then basically call an
existing register allocator, as in this code:

namespace {

  // Make the wrapper selectable as a register allocator.
  static RegisterRegAlloc
    TargetXRegAlloc("targetx", "  targetx register allocator",
                    createTargetXRegisterAllocator);

  struct VISIBILITY_HIDDEN RA : public MachineFunctionPass {
  private:
    // The existing allocator that does the actual work.
    MachineFunctionPass *RealRegAlloc;

  public:
    RA() {
      // Instantiate a real allocator to do the job!
      RealRegAlloc = static_cast<MachineFunctionPass*>(
          createLinearScanRegisterAllocator());
    }

    virtual const char* getPassName() const {
      return "TargetX Register Allocator";
    }

    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      // Add target specific pass as a requirement
      AU.addRequired<f64toi32pass>();

      // Reuse all requirements from the real allocator
      RealRegAlloc->getAnalysisUsage(AU);
    }

    /// runOnMachineFunction - register allocate the whole function
    bool runOnMachineFunction(MachineFunction&);
  };
}

// Forward the whole job to the wrapped allocator.
bool RA::runOnMachineFunction(MachineFunction &fn) {
  return RealRegAlloc->runOnMachineFunction(fn);
}

FunctionPass* llvm::createTargetXRegisterAllocator() {
  return new RA();
}

Looks fine and pretty obvious, but it does not work. When
runOnMachineFunction is invoked, I get the following error, which I
don't quite understand. Why do I get it at all?

AnalysisType& llvm::Pass::getAnalysis() const [with AnalysisType =
llvm::LiveIntervals]: Assertion `Resolver && "Pass has not been
inserted into a PassManager object!"' failed.
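
Reading the message literally, my only guess is that RealRegAlloc is
created by hand in the constructor and is never itself added to a
PassManager, so it has no resolver when it asks for LiveIntervals. I
am not sure this is really the cause. A toy model of that situation in
plain C++ follows; ToyPass, ToyResolver, ToyPassManager and the two
allocator structs are made-up names, not LLVM classes.

```cpp
#include <cassert>
#include <vector>

// Stands in for whatever a pass manager attaches to a pass it owns.
struct ToyResolver {};

struct ToyPass {
  ToyResolver *resolver = nullptr;
  virtual ~ToyPass() {}
  virtual bool run() = 0;
  // Analyses can only be looked up through a resolver.
  bool hasResolver() const { return resolver != nullptr; }
};

struct ToyPassManager {
  ToyResolver resolver;
  std::vector<ToyPass *> passes;
  // Only passes added here ever get a resolver.
  void add(ToyPass *p) {
    p->resolver = &resolver;
    passes.push_back(p);
  }
};

// Models the real allocator: it needs analyses, hence a resolver.
struct InnerAlloc : ToyPass {
  bool run() override { return hasResolver(); }
};

// Models the wrapper: it owns an inner pass created by hand and forwards
// the work to it without ever handing it to a manager.
struct WrapperAlloc : ToyPass {
  InnerAlloc inner;
  bool run() override { return inner.run(); }
};

// Usage: the wrapper is known to the manager, the inner pass is not.
inline void exampleWrapper() {
  ToyPassManager pm;
  WrapperAlloc w;
  pm.add(&w);
  assert(w.hasResolver());
  assert(!w.inner.hasResolver());
}
```

In this model the inner pass fails exactly at the point where it would
need an analysis, which at least looks like what I see.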

OK. These are my current problems with f64 to 2xi32 conversion. So far
I cannot solve it using any of the mentioned methods :(

Any further help and advice are very welcome!

Thanks,
 Roman

P.S. A minor off-topic question: Is it possible to tell the LLVM
backend that "float" is the same type as "double" on my target? I
managed to do so for immediates and also told it to promote f32 to
f64. But it does not work for float variables or parameters, because
LLVM considers them to be float in any case and to have a 32-bit
representation in memory. Or do I need to handle this equivalence in
the front-end only?
