[PATCH] D18405: [PPC] Prefer floating point load to integer load plus direct move, when there is no other user for the integer load

Mon Apr 4 13:19:56 PDT 2016

amehsan added inline comments.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:6302
@@ +6301,3 @@
+  SDNode *Origin = Op.getOperand(0).getNode();
+  if (Origin->getOpcode() != ISD::LOAD)
+    return true;
----------------
kbarton wrote:
> amehsan wrote:
> > nemanjai wrote:
> > > I think this only makes sense if the direct-move path wouldn't be able to use a D-Form load. In that case, you're simply eliminating one direct move instruction. However, in cases where the direct-move path would be able to use a D-Form load, we've just added another load (for the X-Form lxsiwax). For [a contrived] example:
> > >   float test (int *arr) {
> > >     return arr[2];
> > >   }
> > > 
> > > That being said, I don't know that this would be a show-stopper for this approach as the benefits may outweigh this issue.
> > > Also, I think you can probably predict whether the `ISD::LOAD` will be lowered to a D-Form load based on the inputs.
> > > I think the complete condition when we don't want the direct move would be:
> > > - The input is a load
> > > - The only use is an int-to-fp conversion
> > > - The offset from the base register is either zero, non-constant or does not fit in the 16-bit D field of a D-Form load
> > > 
> > > I believe that all of these can be checked without making the code overly complicated.
> > We discussed this over email. This is a summary of our conclusion.
> > 
> > The current patch for deciding whether or not we should use direct move is fine. The code generated for the example above (after this patch) is:
> > 
> >    0:   08 00 80 38     li      r4,8
> >    4:   98 20 03 7c     lxsiwax vs0,r3,r4
> >    8:   e0 04 20 f0     xscvsxdsp vs1,vs0
> >    c:   20 00 80 4e     blr
> > 
> > Ideally it should be like what gcc does:
> > 
> >    0:   08 00 63 38     addi    r3,r3,8
> >    4:   ae 1e 20 7c     lfiwax  f1,0,r3
> >    8:   9c 0e 20 ec     fcfids  f1,f1
> >    c:   20 00 80 4e     blr
> > 
> > The second code pattern has lower register pressure and that is why it is better. The code that generates first code pattern above was previously used only for Power7. After this patch this code is now used for Power8. The example exposes an opportunity within the new code path. That will be addressed separately from this patch.
> OK, I'm confused by these comments.
> 
> With this patch, what code sequence is going to be generated?
> If there is a further opportunity, why not address it with this patch? If it really makes sense to deal with it in a separate patch, please list the phabricator review (or bugzilla) for it here, so we can be sure to track it and maintain the association between the items.
The first sequence is the one generated after this patch. The second sequence is the one currently gcc generates. The one that is generated before this patch is this:

```
   0:   08 00 63 80     lwz     r3,8(r3)
   4:   a6 01 03 7c     mtfprwa f0,r3
   8:   e0 04 20 f0     xscvsxdsp vs1,vs0
   c:   20 00 80 4e     blr
```

http://reviews.llvm.org/D18405