[PATCH] D83245: [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 6 14:33:46 PDT 2020
nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.
Pretty close to the way it should be but the indices do need to flip so I have to request changes.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9798
+
+ // The LHS and RHS may be bitcasts to v8i16 as we canonicalize shuffles
+ // to v8i16. Peek through the bitcasts to get the actual operands,
----------------
This comment is incorrect. The canonical type is `v16i8`.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9801
+ // and canonicalize the RHS being a BUILD_VECTOR when lowering to xxsplti32dx.
+ LHS = peekThroughBitcasts(LHS);
+ RHS = peekThroughBitcasts(RHS);
----------------
Is there a reason we don't just define these this way above?
i.e. `SDValue LHS = peekThroughBitcasts(SVN->getOperand(0));`
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9817
+ bool HasAnyUndefs;
+ bool IsBVNConstSplat =
+ BVN->isConstantSplat(APSplatValue, APSplatUndef, SplatBitSize,
----------------
You do not use `IsBVNConstSplat` anywhere except in the condition. You can just put the call in the condition
i.e. `if (!BVN->isConstantSplat(...) || SplatBitSize > 32)`
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9824
+ // Check if RHS is a splat of 4-bytes (or smaller).
+ if ((SplatBitSize / 8) > 4)
+ return SDValue();
----------------
This can be folded into the above condition. I think it is reasonable to expect the reader to understand that 32 bits is 4 bytes (on PPC) so we don't need to divide by 8.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9827
+
+ // Check that the shuffle mask matches the semantics the XXSPLTI32DX.
+ // XXSPLTI32DX can insert 4 byte chunks from the constant splat C into:
----------------
anil9 wrote:
> semantics the -> semantics of the
```
// Check that the shuffle mask matches the semantics of XXSPLTI32DX.
// The instruction splats a constant C into two words of the source vector
// producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }.
// Thus we check that the shuffle mask is the equivalent of
// <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively.
// Note: the check above of isNByteElemShuffleMask() ensures that the bytes
// within each word are consecutive, so we only need to check the first byte.
```
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839
+ ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1.
+ Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1);
+ else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2.
----------------
We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`).
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9839
+ ShuffleMask[4] > 15 && ShuffleMask[12] > 15)) // Case 1.
+ Index = DAG.getTargetConstant(IsLE ? 1 : 0, DL, MVT::i1);
+ else if ((ShuffleMask[4] == 4 && ShuffleMask[12] == 12) && // Case 2.
----------------
nemanjai wrote:
> We are after type legalization here, can you please use legal types (i.e. no `MVT::i1`).
This is backwards. On LE, the rightmost element is element zero. In this path, the constant goes into the most significant word of each doubleword. So your `Index` needs to flip in both places.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:1278
+ /// handled by the XXSPLTI32DX instruction introduced in ISA 3.1.
+ SDValue lowerToXXSPLTI32DX(ShuffleVectorSDNode *N, SelectionDAG &DAG) const;
+
----------------
anil9 wrote:
> nit :
> /// otherwise return the default SDValue. ???
All lowering and combine functions return a default constructed SDValue when unsuccessful. There is no reason to call that out specifically.
================
Comment at: llvm/lib/Target/PowerPC/PPCInstrPrefix.td:754
+ [(set v2i64:$XT,
+ (PPCxxsplti32dx v2i64:$XTi, i1:$IX,
+ i32:$IMM32))]>,
----------------
Can we please use `i32` rather than `i1` as the latter could lead to issues (with using CRBIT registers which we really don't want to do).
================
Comment at: llvm/test/CodeGen/PowerPC/p10-splatImm32.ll:21
+entry:
+ %vecins1 = shufflevector <4 x i32> %a, <4 x i32> <i32 undef, i32 566, i32 undef, i32 566>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+ ret <4 x i32> %vecins1
----------------
The result of this shuffle is:
`{ a[0], 566, a[2], 566 }`
Which produces a vector register:
```
LE: [ 566 | a[2] | 566 | a[0] ] => xxsplti32dx vs34, 0, 566
BE: [ a[0] | 566 | a[2] | 566 ] => xxsplti32dx vs34, 1, 566
```
So it is backwards - similarly to all the test cases.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83245/new/
https://reviews.llvm.org/D83245
More information about the llvm-commits
mailing list