[PATCH] D48950: [PowerPC] Improve codegen for vector loads using scalar_to_vector

Thu Jul 5 10:58:27 PDT 2018

amyk marked 6 inline comments as done.
amyk added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2987
+	// Improvement to 32bit / 64bit vector loads 
+  let AddedComplexity = 400 in { 
+    // LIWAX - used for sign extending i32 -> i64 
----------------
nemanjai wrote:
> This code is within a block with `let AddedComplexity = 400` already set. So no need to repeat that.
It could be I mistakenly seen, but the last `let AddedComplexity` block I saw was on line 2645, 
```
let AddedComplexity = 400, Predicates = [HasP9Vector] in {                        
  // Extra patterns expanding to vector Extract Word/Insert Word                    
  def : Pat<(v4i32 (int_ppc_vsx_xxinsertw v4i32:$A, v2i64:$B, imm:$IMM)),           
            (v4i32 (XXINSERTW $A, $B, imm:$IMM))>;                                  
  def : Pat<(v2i64 (int_ppc_vsx_xxextractuw v2i64:$A, imm:$IMM)),                   
            (v2i64 (COPY_TO_REGCLASS (XXEXTRACTUW $A, imm:$IMM), VSRC))>;           
  } // AddedComplexity = 400, HasP9Vector
```
so wouldn't I still need it? 

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2990
+    // LIWZX - emitted for either i32 or float 
+    let Predicates = [IsLittleEndian] in { 
+
----------------
nemanjai wrote:
> This predicate is correct since the code does not require `HasP9Vector`. However, putting this into a block with `Predicates = [HasP9Vector]` set is confusing as it might lead the reader to falsely conclude that both predicates are required here. Please pull this code outside of any blocks with `Predicates` set.
I think this is related to the comment about the `let AddedComplexity` block. Isn't this block is in its own `Predicate` set, as well (from what I can see)? 

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3012
+   
+			def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
+                (v4i32 (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSRC))>; 
----------------
nemanjai wrote:
> The `v4i32/v4f32` patterns are incorrect. The `LXSIWZX/LFIWZX` that the output pattern will eventually expand to will produce a vector register with the value: `0|V|U|U` (`0` == zero, `V` == loaded value, `U` == undefined). However, the result of `scalar_to_vector` needs to be `V|U|U|U`. So this needs a shift using `XXSLDWI` to shift the value left by one word.
Thank you for bringing that to my attention. I am currently working on this. 

================
Comment at: llvm/test/CodeGen/PowerPC/build-vector-tests.ll:113
+;// P8: (LE) lfiwzx, xxpermdi, xxspltw (BE): lfiwzx, xxspltw                  //
+;// P9: (LE) lfiwzx, xxpermdi, (BE): lfiwzx                                   //
 ;vector int spltMemVali(int *ptr) {                                           //
----------------
nemanjai wrote:
> This is bad. The test case is actually a load-and-splat. We were emitting a load-and-splat. Now we're emitting a load, swap. This is functionally incorrect.
> I imagine we just have to rip out the following code from `PPCISelLowering.cpp`:
> ```
>       // If the source for the shuffle is a scalar_to_vector that came from a
>       // 32-bit load, it will have used LXVWSX so we don't need to splat again.
>       if (Subtarget.hasP9Vector() &&
>           ((isLittleEndian && SplatIdx == 3) || 
>            (!isLittleEndian && SplatIdx == 0))) {
>         SDValue Src = V1.getOperand(0);
>         if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&
>             Src.getOperand(0).getOpcode() == ISD::LOAD &&
>             Src.getOperand(0).hasOneUse())
>           return V1;
>       }
> ```
Yes, thank you for pointing that out to me. Removing that code fragment does indeed produce a `xxspltw` after the load/permute. I will update `PPCISelLowering.cpp` to reflect this. 

https://reviews.llvm.org/D48950