[PATCH] D48950: [PowerPC] Improve codegen for vector loads using scalar_to_vector

Thu Jul 5 04:16:34 PDT 2018

nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.

A number of the registers in the test cases you've added are mandated by the ABI. Rather than keeping track of which ones should be part of the test case, perhaps it's easiest to use `utils/update_llc_test_checks.py` to automatically generate the `CHECK` directives (if it works for all the RUN lines in your tests).

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2986
+  
+	// Improvement to 32bit / 64bit vector loads 
+  let AddedComplexity = 400 in { 
----------------
Please refrain from comments such as this one. What you're implementing is an improvement over the old code as every patch presumably is. However saying this within the code begs the question "Improvement over what?"

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2987
+	// Improvement to 32bit / 64bit vector loads 
+  let AddedComplexity = 400 in { 
+    // LIWAX - used for sign extending i32 -> i64 
----------------
This code is within a block with `let AddedComplexity = 400` already set. So no need to repeat that.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2989
+    // LIWAX - used for sign extending i32 -> i64 
+    // LIWZX - emitted for either i32 or float 
+    let Predicates = [IsLittleEndian] in { 
----------------
This does not match the code. `LIWZX` is actually used for `i32/f32/(zext i32 -> i64)`.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:2990
+    // LIWZX - emitted for either i32 or float 
+    let Predicates = [IsLittleEndian] in { 
+
----------------
This predicate is correct since the code does not require `HasP9Vector`. However, putting this into a block with `Predicates = [HasP9Vector]` set is confusing as it might lead the reader to falsely conclude that both predicates are required here. Please pull this code outside of any blocks with `Predicates` set.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3012
+   
+			def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
+                (v4i32 (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSRC))>; 
----------------
The `v4i32/v4f32` patterns are incorrect. The `LXSIWZX/LFIWZX` that the output pattern will eventually expand to will produce a vector register with the value: `0|V|U|U` (`0` == zero, `V` == loaded value, `U` == undefined). However, the result of `scalar_to_vector` needs to be `V|U|U|U`. So this needs a shift using `XXSLDWI` to shift the value left by one word.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3182
+  
+  // Improvement to 64 bit vector loads 
+  let AddedComplexity = 400 in {
----------------
Same nit about comments that talk about improvement.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3184
+  let AddedComplexity = 400 in {
+		// Using pseudoinstructions to ensure utilization of 64 bit registers 
+    let Predicates = [IsLittleEndian, HasP9Vector] in {
----------------
I think that instead of  `utilization of 64 bit registers` you mean `utilization of all 64 VSX registers.` Also, here and everywhere that punctuation is missing - we use complete sentences for comments. This includes punctuation.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3187
+      def : Pat<(v2i64 (scalar_to_vector (i64 (load ixaddr:$src)))),
+                (v2i64 (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC), 2))>;
+      def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddr:$src)))),
----------------
Line too long.

================
Comment at: llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll:12
+; CHECK-P8:           lfiwzx 0, 0, 3
+; CHECK-P8:      			xxpermdi 0, 0, 0, 2
+; CHECK-P8:      			stvx 2, 0, 4
----------------
Why the change in indentation on these?

================
Comment at: llvm/test/CodeGen/PowerPC/build-vector-tests.ll:113
+;// P8: (LE) lfiwzx, xxpermdi, xxspltw (BE): lfiwzx, xxspltw                  //
+;// P9: (LE) lfiwzx, xxpermdi, (BE): lfiwzx                                   //
 ;vector int spltMemVali(int *ptr) {                                           //
----------------
This is bad. The test case is actually a load-and-splat. We were emitting a load-and-splat. Now we're emitting a load, swap. This is functionally incorrect.
I imagine we just have to rip out the following code from `PPCISelLowering.cpp`:
```
      // If the source for the shuffle is a scalar_to_vector that came from a
      // 32-bit load, it will have used LXVWSX so we don't need to splat again.
      if (Subtarget.hasP9Vector() &&
          ((isLittleEndian && SplatIdx == 3) || 
           (!isLittleEndian && SplatIdx == 0))) {
        SDValue Src = V1.getOperand(0);
        if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&
            Src.getOperand(0).getOpcode() == ISD::LOAD &&
            Src.getOperand(0).hasOneUse())
          return V1;
      }
```

https://reviews.llvm.org/D48950