[PATCH] D29449: [SLP] Generalization of vectorization of CmpInst operands, NFC.

Fri Feb 3 15:50:00 PST 2017

mkuper added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4806
     // Try to vectorize trees that start at compare instructions.
-    if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
-      if (tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R)) {
+    if (auto *BI = dyn_cast<BranchInst>(it)) {
+      if (!BI->isConditional())
----------------
ABataev wrote:
> mkuper wrote:
> > The code is not equivalent.
> > 
> > 1) A cmp doesn't have to feed a branch. You can have two icmps feeding an or, for instance, with the i1 being used by a branch. Why would we want to avoid vectorizing those? Or do we already fail to vectorize when the icmp feeds anything but a branch?
> > 2) On the flip side, you can have a branch that is fed by an "or (icmp, icmp)", or any other source of i1. I assume this will get filtered out in tryToVectorize() but it'd be good to have a test.
> > 
> > In any case, the comment above needs to be updated.
> You're not quite correct about it.
> 1. Yes, of course. But this situation will be handled in `vectorizeRootInstruction()` function. It calls `canBeVectorized()` function that traverses all suboperations of the initial instruction (up to `RecursionMaxDepth` tree height). So actually, the current version of the code does the same work twice: the first time when we perform top-to-bottom analysis of all instruction in `vectorizeChainsInBlock()` function and the second time when we perform bottom-to-top analysis in `vectorizeRootInstruction()` function.
> The first top-to-bottom analysis breaks vectorization in some cases (for example, for future min/max reductions and maybe in some cases for binops too).
> 2. Your assumption is correct, this situation is handled by `tryToVectorize()` function. 
> And we have a test for these (or similar) situations in `Transforms/SLPVectorizer/X86/compare-reduce.ll`, `Transforms/SLPVectorizer/X86/horizontal-list.ll` and `Transforms/SLPVectorizer/X86/in-tree-user.ll` tests already.
> I'll update the comment
Re 2 - sounds good.
Re 1 - I'm not quite convinced. I mean, it will work if the cmp is in the middle of a tree, and there's a root instruction somewhere below it, but that doesn't have to be the case. 
I mean, if you have:

```
%cmp = icmp eq i32 %foo, %bar
%cmp2 = icmp eq i32 %foo2, %bar2
%or = or i1 %cmp, %cmp2
br i1 %or, label %l1, label %l2
```

Then yes, you'll call vectorizeRootInstruction() on %or, and that's fine, because it can start with any binary operator, and look through the cmps.
But what if you don't feed a branch at all?

E.g.

```
%cmp = icmp eq i32 %foo, %bar
%z = zext i1 %cmp to i32
```

Or

```
%cmp = icmp eq i32 %foo, %bar
call @foo(i1 %cmp)
```

etc.

I agree that what we do now is pretty silly - we start from a fairly arbitrary subset of possible roots. But I don't think we should reduce that set further.

Or did I miss something about how this works?

https://reviews.llvm.org/D29449