[PATCH] D46126: [SLP] Vectorize transposable binary operand bundles

Matthew Simpson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 30 11:08:59 PDT 2018


mssimpso added a comment.

In https://reviews.llvm.org/D46126#1082870, @Ayal wrote:

> This is reminiscent of LV's interleave group optimization, in the sense that a couple of correlated inefficient vector "gathers" are replaced by a couple of efficiently formed vectors followed by transposing shuffles. The correlated gathers may come from the two operands of a binary operation, as in this patch, or more generally from arbitrary leaves of the SLP tree.


Thanks for taking a look at the patch, Ayal! Yes, that's right. It's a little like LV's interleave groups. While the "correlated gathers" would be leaves of the tree without this patch, if we can perform a transpose, we can continue recursively building a deeper tree based on the shuffled bundles. That's a good summary that could be incorporated in the comment above `transposeBinOpBundle`.

And yes, the approach is currently limited to the two operands of binary operations. But  it could be generalized to include the operands of other instructions, or as you point out, any correlated gathers in the SLP tree.



================
Comment at: lib/Analysis/VectorUtils.cpp:535
+  return ConstantVector::get(Mask);
+}
+
----------------
Ayal wrote:
> Could `createStrideMask` be (re)used?
Not directly, but we could use `createStrideMask` to create two stride-2 masks and then interleave them.

```
stride_mask_0 = <0, 2,  4, 6>
stride_mask_1 = <8, 10, 12, 14>
transpose_mask = <0, 8, 2, 10, 4, 12, 6, 14>
```

What do you think?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1463
+    Opcode2Operands[VLIOp0->getOpcode()].push_back(VLIOp0);
+    Opcode2Operands[VLIOp0->getOpcode()].push_back(VLIOp1);
+  }
----------------
Ayal wrote:
> Both operands are mapped to the opcode of operand 0?
I've already checked that both operands have the same opcode (starting at line 1423). I also added a comment here, but this is probably still a bit confusing. I will rewrite this to make it more clear.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1521
+    unsigned ElemsPerDstVec = BundleVF * NumBundles / VF;
+    unsigned ElemsPerSrcVec = ElemsPerDstVec / 2;
+    Type *DstTy = VectorType::get(VecTy->getElementType(), ElemsPerDstVec);
----------------
Ayal wrote:
> May be easier to follow if `ElemsPerSrcVec` is initially set to `BundleVF` and doubled inside the loop, `ElemsPerDstVec` is set to its double, and `VF` renamed to something like `NumVectors`.
Sounds good.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1973
+      // determine if we can transpose them, and if so, continue building the
+      // tree with the transposed bundles.
+      if (isa<BinaryOperator>(VL0))
----------------
Ayal wrote:
> Comment that transposing operands each having a common opcode is mutually exclusive with swapping commutative operands below, and should precede it?
Yes, that's a good idea.


Repository:
  rL LLVM

https://reviews.llvm.org/D46126





More information about the llvm-commits mailing list