[PATCH] D29449: [SLP] Generalization of vectorization of CmpInst operands, NFC.

Mon Feb 6 01:26:54 PST 2017

ABataev added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4806
     // Try to vectorize trees that start at compare instructions.
-    if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
-      if (tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R)) {
+    if (auto *BI = dyn_cast<BranchInst>(it)) {
+      if (!BI->isConditional())
----------------
mkuper wrote:
> ABataev wrote:
> > mkuper wrote:
> > > The code is not equivalent.
> > > 
> > > 1) A cmp doesn't have to feed a branch. You can have two icmps feeding an or, for instance, with the i1 being used by a branch. Why would we want to avoid vectorizing those? Or do we already fail to vectorize when the icmp feeds anything but a branch?
> > > 2) On the flip side, you can have a branch that is fed by an "or (icmp, icmp)", or any other source of i1. I assume this will get filtered out in tryToVectorize() but it'd be good to have a test.
> > > 
> > > In any case, the comment above needs to be updated.
> > You're not quite correct about it.
> > 1. Yes, of course. But this situation will be handled in `vectorizeRootInstruction()` function. It calls `canBeVectorized()` function that traverses all suboperations of the initial instruction (up to `RecursionMaxDepth` tree height). So actually, the current version of the code does the same work twice: the first time when we perform top-to-bottom analysis of all instruction in `vectorizeChainsInBlock()` function and the second time when we perform bottom-to-top analysis in `vectorizeRootInstruction()` function.
> > The first top-to-bottom analysis breaks vectorization in some cases (for example, for future min/max reductions and maybe in some cases for binops too).
> > 2. Your assumption is correct, this situation is handled by `tryToVectorize()` function. 
> > And we have a test for these (or similar) situations in `Transforms/SLPVectorizer/X86/compare-reduce.ll`, `Transforms/SLPVectorizer/X86/horizontal-list.ll` and `Transforms/SLPVectorizer/X86/in-tree-user.ll` tests already.
> > I'll update the comment
> Re 2 - sounds good.
> Re 1 - I'm not quite convinced. I mean, it will work if the cmp is in the middle of a tree, and there's a root instruction somewhere below it, but that doesn't have to be the case. 
> I mean, if you have:
> 
> ```
> %cmp = icmp eq i32 %foo, %bar
> %cmp2 = icmp eq i32 %foo2, %bar2
> %or = or i1 %cmp, %cmp2
> br i1 %or, label %l1, label %l2
> ```
> 
> Then yes, you'll call vectorizeRootInstruction() on %or, and that's fine, because it can start with any binary operator, and look through the cmps.
> But what if you don't feed a branch at all?
> 
> E.g.
> 
> ```
> %cmp = icmp eq i32 %foo, %bar
> %z = zext i1 %cmp to i32
> ```
> 
> Or
> 
> ```
> %cmp = icmp eq i32 %foo, %bar
> call @foo(i1 %cmp)
> ```
> 
> etc.
> 
> I agree that what we do now is pretty silly - we start from a fairly arbitrary subset of possible roots. But I don't think we should reduce that set further.
> 
> Or did I miss something about how this works?
1. Michael, the first example:
```
%cmp = icmp eq i32 %foo, %bar
%z = zext i1 %cmp to i32
```
still must feed some instructions like return, store or something like this. Otherwise, it is not used and will be removed from the code and we don't need to perform analysis of this CmpInst at all.
2. Calls also must feed some other instructions. Of course, sometimes we may have standalone calls (like returning void or with ignored result), but in this case BinOps, used as args in this CallInst, won't be analyzed too. We just should teach the vectorizer about this situation.

https://reviews.llvm.org/D29449