[PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.

Mon Jul 6 14:33:46 PDT 2020

nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.

Pretty close to the way it should be but the indices do need to flip so I have to request changes.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9798
+
+  // The LHS and RHS may be bitcasts to v8i16 as we canonicalize shuffles
+  // to v8i16. Peek through the bitcasts to get the actual operands,
----------------
This comment is incorrect. The canonical type is `v16i8`.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9801
+  // and canonicalize the RHS being a BUILD_VECTOR when lowering to xxsplti32dx.
+  LHS = peekThroughBitcasts(LHS);
+  RHS = peekThroughBitcasts(RHS);
----------------
Is there a reason we don't just define these this way above?
i.e. `SDValue LHS = peekThroughBitcasts(SVN->getOperand(0));`

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9817
+  bool HasAnyUndefs;
+  bool IsBVNConstSplat =
+      BVN->isConstantSplat(APSplatValue, APSplatUndef, SplatBitSize,
----------------
You do not use `IsBVNConstSplat` anywhere except in the condition. You can just put the call in the condition
i.e. `if (!BVN->isConstantSplat(...) || SplatBitSize > 32)`

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9824
+  // Check if RHS is a splat of 4-bytes (or smaller).
+  if ((SplatBitSize / 8) > 4)
+    return SDValue();
----------------
This can be folded into the above condition. I think it is reasonable to expect the reader to understand that 32 bits is 4 bytes (on PPC) so we don't need to divide by 8.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9827
+
+  // Check that the shuffle mask matches the semantics the XXSPLTI32DX.
+  // XXSPLTI32DX can insert 4 byte chunks from the constant splat C into:
----------------
anil9 wrote:
> semantics the -> semantics of the
```
// Check that the shuffle mask matches the semantics of XXSPLTI32DX.
// The instruction splats a constant C into two words of the source vector
// producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }.
// Thus we check that the shuffle mask is the equivalent  of
// <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively.
// Note: the check above of isNByteElemShuffleMask() ensures that the bytes
// within each word are consecutive, so we only need to check the first byte.
```

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839
+       ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1.
+    Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1);
+  else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2.
----------------
We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`).

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839
+       ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1.
+    Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1);
+  else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2.
----------------
nemanjai wrote:
> We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`).
This is backwards. On LE, the rightmost element is element zero. In this path, the constant goes into the most significant word of each doubleword. So your `Index` needs to flip in both places.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:1278
+    /// handled by the XXSPLTI32DX instruction introduced in ISA 3.1.
+    SDValue lowerToXXSPLTI32DX(ShuffleVectorSDNode *N, SelectionDAG &DAG) const;
+
----------------
anil9 wrote:
> nit : 
> ///  otherwise return the default SDValue.  ???
All lowering and combine functions return a default constructed SDValue when unsuccessful. There is no reason to call that out specifically.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:754
+                             [(set v2i64:$XT,
+                                   (PPCxxsplti32dx v2i64:$XTi, i1:$IX,
+                                                   i32:$IMM32))]>,
----------------
Can we please use `i32` rather than `i1` as the latter could lead to issues (with using CRBIT registers which we really don't want to do).

================
Comment at: llvm/test/CodeGen/PowerPC/p10-splatImm32.ll:21
+entry:
+  %vecins1 = shufflevector <4 x i32> %a, <4 x i32> <i32 undef, i32 566, i32 undef, i32 566>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+  ret <4 x i32> %vecins1
----------------
The result of this shuffle is:
`{ a[0], 566, a[2], 566 }`
Which produces a vector register:
```
LE: [ 566  | a[2] | 566  | a[0] ] => xxsplti32dx vs34, 0, 566
BE: [ a[0] | 566  | a[2] | 566  ] => xxsplti32dx vs34, 1, 566
```

So it is backwards - similarly to all the test cases.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83245/new/

https://reviews.llvm.org/D83245