[PATCH] D18405: [PPC] Prefer floating point load to integer load plus direct move, when there is no other user for the integer load

Mon Apr 4 13:10:54 PDT 2016

kbarton requested changes to this revision.
This revision now requires changes to proceed.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:6302
@@ +6301,3 @@
+  SDNode *Origin = Op.getOperand(0).getNode();
+  if (Origin->getOpcode() != ISD::LOAD)
+    return true;
----------------
amehsan wrote:
> nemanjai wrote:
> > I think this only makes sense if the direct-move path wouldn't be able to use a D-Form load. In that case, you're simply eliminating one direct move instruction. However, in cases where the direct-move path would be able to use a D-Form load, we've just added another load (for the X-Form lxsiwax). For [a contrived] example:
> >   float test (int *arr) {
> >     return arr[2];
> >   }
> > 
> > That being said, I don't know that this would be a show-stopper for this approach as the benefits may outweigh this issue.
> > Also, I think you can probably predict whether the `ISD::LOAD` will be lowered to a D-Form load based on the inputs.
> > I think the complete condition when we don't want the direct move would be:
> > - The input is a load
> > - The only use is an int-to-fp conversion
> > - The offset from the base register is either zero, non-constant or does not fit in the 16-bit D field of a D-Form load
> > 
> > I believe that all of these can be checked without making the code overly complicated.
> We discussed this over email. This is a summary of our conclusion.
> 
> The current patch for deciding whether or not we should use direct move is fine. The code generated for the example above (after this patch) is:
> 
>    0:   08 00 80 38     li      r4,8
>    4:   98 20 03 7c     lxsiwax vs0,r3,r4
>    8:   e0 04 20 f0     xscvsxdsp vs1,vs0
>    c:   20 00 80 4e     blr
> 
> Ideally it should be like what gcc does:
> 
>    0:   08 00 63 38     addi    r3,r3,8
>    4:   ae 1e 20 7c     lfiwax  f1,0,r3
>    8:   9c 0e 20 ec     fcfids  f1,f1
>    c:   20 00 80 4e     blr
> 
> The second code pattern has lower register pressure and that is why it is better. The code that generates first code pattern above was previously used only for Power7. After this patch this code is now used for Power8. The example exposes an opportunity within the new code path. That will be addressed separately from this patch.
OK, I'm confused by these comments.

With this patch, what code sequence is going to be generated?
If there is a further opportunity, why not address it with this patch? If it really makes sense to deal with it in a separate patch, please list the phabricator review (or bugzilla) for it here, so we can be sure to track it and maintain the association between the items.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:6389
@@ -6364,2 +6388,3 @@
   // however, without FPCVT we can't do most conversions.
-  if (Subtarget.hasDirectMove() && Subtarget.isPPC64() && Subtarget.hasFPCVT())
+  if (directMoveIsProfitable(Op) && Subtarget.hasDirectMove() &&
+      Subtarget.isPPC64() && Subtarget.hasFPCVT()) {
----------------
I have a very slight preference for checking whether DirectMove is available before checking whether it is profitable.
Changing the order of the checks will have virtually no impact, but it makes more sense to me. 

================
Comment at: test/CodeGen/PowerPC/direct-move-profit.ll:4
@@ +3,3 @@
+
+target datalayout = "e-m:e-i64:64-n32:64"
+target triple = "powerpc64le-unknown-linux-gnu"
----------------
nemanjai wrote:
> You should probably remove the metadata here and (some) below. The triple on the command line and here conflict and I don't believe that all the metadata below is necessary.
I agree. Please remove this, and the target triple line immediately below, and specify the correct triple on the RUN step. 


http://reviews.llvm.org/D18405