[PATCH] D46126: [SLP] Vectorize transposable binary operand bundles

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 1 12:31:42 PDT 2018


Ayal added inline comments.


================
Comment at: test/Transforms/SLPVectorizer/AArch64/transpose.ll:93
   %tmp2.1 = add i32 %tmp1.0, %tmp1.1
   %tmp2.2 = add i32 %tmp0.2, %tmp0.3
   %tmp2.3 = add i32 %tmp1.2, %tmp1.3
----------------
mssimpso wrote:
> Ayal wrote:
> > Indeed this calls for shuffling with <0,4,2,6> and <1,5,3,7> masks; but is this pattern more natural to expect than
> > ```
> >   %tmp2.1 = add i32 %tmp0.2, %tmp0.3
> >   %tmp2.2 = add i32 %tmp1.0, %tmp1.1
> > ```
> > 
> > Perhaps in some complex numbers context(?)
> Ah, I see your point now. I think the current patch would actually produce incorrect code if you made the above change to the test, since we don't actually enforce the "interleaved order": we would still use the <0,4,2,6> and <1,5,3,7> masks instead of the stride-2 masks. So we should probably record what the mask should be when we do the re-bundling, to allow for the various orderings. In hindsight, it makes sense that we would have to do this.
> 
> We can also probably get rid of the MinSize restriction at the same time.
> 
> I'd also want to test the mask against the known shuffle kinds for the cost calculation to ensure we are computing the most appropriate cost for the target. I'm actually surprised we don't already have something like `TTI->getShuffleKind(ArrayRef<int> Mask)`. Perhaps I'll work on that first.
> 
> 
Perhaps the initial natural order, going bottom-up, is that of the 2 x n matrix transpose, i.e., the even and odd stride-2 masks. This could be further optimized by considering the orderings preferred by subsequent leaves up the tree, similar to Alexey's `NumOpsWantToKeepOrder`; but here one could pass through several transposes before reaching the leaves... (btw, are there any worthwhile workloads driving this?)
After `bestOrder()` is chosen (i.e., the order wanted by most ops), `SK_PermuteSingleSrc` shuffle costs are used. LV uses `getInterleavedMemoryOpCost*()` to compute the cost of its strided shuffles. And `isShuffle()` may also provide inspiration ;-).
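
To make the two orderings concrete, here is a minimal IR sketch. It assumes (hypothetically) that the %tmp0.* scalars have been bundled into a <4 x i32> %a and the %tmp1.* scalars into a <4 x i32> %b; the value names are illustrative only, not taken from the patch:

```
  ; Interleaved order of the adds (a-pair, b-pair, a-pair, b-pair), as in the test:
  %lhs.i = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
  %rhs.i = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
  %add.i = add <4 x i32> %lhs.i, %rhs.i

  ; Grouped order (both a-pairs, then both b-pairs), as in the modified test above:
  %lhs.s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %rhs.s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %add.s = add <4 x i32> %lhs.s, %rhs.s
```

Either way, the masks fall out of whichever order the re-bundling step commits to, which is why recording that order (and its mask) at bundling time seems necessary.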


Repository:
  rL LLVM

https://reviews.llvm.org/D46126




