[PATCH] D34009: [Power9] Exploit vector integer extend instructions when indices aren't correct

Mon Jun 12 03:50:27 PDT 2017

nemanjai added inline comments.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11226
+// based on the input and output of the vector extension.
+int getLookUpIndex(SDValue Input, SDNode *N) {
+  if (Input.getValueType().getVectorElementType() == MVT::i8 &&
----------------
I'm afraid that having such a generically named function in such a large file may get confusing down the road. Perhaps it would be clearer to fold this logic into `combineBVOfVecExtend()` and just use `getScalarSizeInBits()`. Perhaps something like:
```
if (InputSize + OutputSize == 5)
  TgtElemArrayIdx = 0;
// ...
```

In any case, you don't want to use a valid array index for an invalid combination as you do here. For example, I don't really see anything in this patch that will prevent you from doing something weird for the `v16i8 -> v8i16` case (which there isn't an instruction for as far as I can tell)*. Also, please add that as a test case.

*I realize there isn't a pattern for this in the .td file, but I imagine you'll end up with some weird result in this case.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11255
+// For BE: the allowed indices are: 0x7,0xF
+uint64_t TargetElems[] = {
+    0x3074B8FC, // b->w
----------------
I think it would be much cleaner to make this array local to `combineBVOfVecExtend` (you can then pass the target encoding to `addShuffleForVecExtend` rather than indexing into it again).

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11271
+  SmallVector<int, 16> ShuffleMask;
+  for (unsigned i = 0; i < Input.getValueType().getVectorNumElements(); i++)
+    ShuffleMask.push_back(-1);
----------------
Just initialize this at construction time and remove the loop.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11288
+
+  SDValue Ops[] = {Shuffle};
+  SDValue BV = DAG.getNode(PPCISD::SExtVElems, dl, N->getValueType(0), Ops);
----------------
Do we really need this? I thought there were overloads of `SelectionDAG::getNode()` that take two `SDValue`'s.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11313
+
+  auto isSExtOfVecExtract = [&](SDValue Op) -> int {
+    if (!Op)
----------------
I don't really see a reason for this to return an `int` rather than a `bool`. The reader might assume that the lambda returns values from a larger space if the return type is `int`.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11335
+    Elems = Elems << 8;
+    if (!DAG.getDataLayout().isLittleEndian())
+      Index = Index << 4;
----------------
Just a nit: it'd probably be more concise and readable to just use the ternary operator here.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11337
+      Index = Index << 4;
+    Elems |= Index;
+
----------------
I don't actually understand how this works. Won't the nibbles that correspond to the target elements for the opposite endianness always just be zero? And if that's the case, won't the comparison below always fail? (see below)

Perhaps the unnecessary shuffle produced will just be optimized out, but it's probably better not to emit it to begin with.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11352
+  // vector_shuffle.
+  if (Elems != TargetElems[getLookUpIndex(Input, N)]) {
+    return addShuffleForVecExtend(N, DAG, Elems, Input);
----------------
Shouldn't both operands of this comparison just have the nibbles corresponding to the opposite endianness masked out?
i.e. both should be and-ed with `0x0F0F0F0F0F0F0F0F` of `0xF0F0F0F0F0F0F0F0`

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:11355
+  }
+
+  return SDValue();
----------------
Just a comment along the lines of
`// Regular lowering will catch cases where a shuffle is not needed.`

================
Comment at: lib/Target/PowerPC/PPCISelLowering.h:70

+      /// SExtVElems, takes an input vector of a smaller type and sign extends
+      // to an output vector of a larger type.
----------------
Nit: line length.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:2721
+
+def ByteToWord {
+  dag LE_A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 0)), i8));
----------------
Weren't these added with the previous patch? Or maybe it was only the LE ones? If so, can you please rebase this patch so that it would apply cleanly?

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3043
+  let Predicates = [HasP9Altivec, IsBigEndian] in {
+    def: Pat<(v2i64 (PPCSExtVElems v16i8:$A)),
+              (v2i64 (VEXTSB2D $A))>;
----------------
The versions with the new node are not endianness specific are they? If not, please move them out of the blocks and have a single copy of each.

================
Comment at: test/CodeGen/PowerPC/vec_int_ext.ll:4
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-gnu-linux -mcpu=pwr9 < %s | FileCheck %s -check-prefix=CHECK-BE
+
+define <4 x i32> @vextsb2wLE(<16 x i8> %a) {
----------------
Please add a test that checks that there are no shuffles when the input elements are correct.

https://reviews.llvm.org/D34009