[PATCH] D26083: [LV] Scalarize operands of predicated instructions

Matthew Simpson via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 8 08:47:41 PST 2016


mssimpso added a comment.

Michael/Gil,

Thanks for all the comments! I'll upload a new version of the patch shortly.



================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4678
           isa<DbgInfoIntrinsic>(&I)) &&
-        Legal->isScalarAfterVectorization(&I)) {
-      scalarizeInstruction(&I);
+        (Legal->isScalarAfterVectorization(&I) ||
+         Cost->isProfitableToScalarize(&I, VF))) {
----------------
mkuper wrote:
> mssimpso wrote:
> > gilr wrote:
> > > mkuper wrote:
> > > > We have 5 places that call isScalarAfterVectorization().
> > > > Is this the only call site that cares about this? (If all of them should care, perhaps wrap in a helper function?)
> > > I think Michael's comment raises a more general question: should it matter why we scalarize?
> > > Put differently, once an instruction is marked for scalarization by legal or cost, shouldn't we treat it uniformly in the code?
> > > [if so, then the question "will the instruction be scalarized" would only be well-defined in the context of some VF]
> > You're asking if all places where we call isScalarAfterVectorization should also be concerned with isProfitableToScalarize? I think a helper makes sense in general, but this is the only case (at least currently) where I think it will make a difference.
> > 
> > In needsScalarInduction and widenIntInduction, the IVs shouldn't be predicated, so this shouldn't make a difference. And in collectValuesToIgnore, we haven't yet computed the instruction costs.
> > 
> > 
> > 
> > And in collectValuesToIgnore, we haven't yet computed the instruction costs.
> So, we get some imprecision here, right? If we end up scalarizing things that aren't in VecValuesToIgnore then we'll overestimate the register pressure? Or am I confused?
> 
> Anyway, assuming I got it right - I'm not saying we need to fix this in this patch. The patch, as it is, is probably still a strict improvement. But a FIXME would be good.
That's right. There is some conservatism in collectValuesToIgnore. Currently, we place a value in VecValuesToIgnore only if we're sure (for all VFs) that the value will be scalar. But this doesn't mean that all other values will be vectorized. With this patch, we'll have a more precise determination of the known scalars after VF selection.

Regarding scalarization by legality vs. the cost model: for code generation, it's not going to matter why we decided to scalarize. In legality, we know that for any (legal) VF we select, a value will be scalarized. In the cost model, we decide for particular VFs whether it's more profitable to scalarize. By the time we reach code generation, we know what that VF is.

If the point is that InnerLoopVectorizer shouldn't need to distinguish between Legal and Cost scalarization, then we can create a helper function for InnerLoopVectorizer to use, as Michael suggested. What do you think?
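
Concretely, the helper could look something like this (just a sketch; the name and exact signature are illustrative, not part of the current patch):

```
/// Returns true if \p I will be scalarized when vectorizing with factor
/// \p VF, either because legality requires it for every VF or because the
/// cost model found scalarization more profitable for this particular VF.
bool shouldScalarizeInstruction(Instruction *I, unsigned VF) const {
  return Legal->isScalarAfterVectorization(I) ||
         Cost->isProfitableToScalarize(I, VF);
}
```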


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1962
 
+  /// A map holding instruction-cost pairs for each choice of vectorization
+  /// factor. The presence of an instruction in the mapping indicates that it
----------------
mkuper wrote:
> Maybe make more explicit that the cost is the cost of the instruction if scalarized? ("instruction-cost pairs for each choice of vectorization factor" seems to imply the vectorized cost.)
Sounds good.
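
Something along these lines, perhaps (proposed wording only, not the final patch text):

```
/// A map holding scalar costs for different vectorization factors. The
/// presence of an instruction in the mapping indicates that it will be
/// scalarized when vectorizing with the associated vectorization factor,
/// and the mapped value is the cost of the instruction's scalarized form.
```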


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1968
+
+  // Returns the expected difference in cost from scalarizing the expression
+  // feeding a predicated instruction \p PredInst. The instructions to
----------------
mkuper wrote:
> This can also be more explicit. "A negative returned value implies..."
Sounds good.
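
Maybe something like this (proposed wording only):

```
// Returns the expected difference in cost from scalarizing the expression
// feeding a predicated instruction \p PredInst: a non-negative return value
// means scalarizing the expression is expected to be profitable, while a
// negative value means the vectorized form is expected to be cheaper.
```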


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6154
     Factor.Width = UserVF;
+    collectInstsToScalarize(UserVF);
     return Factor;
----------------
mkuper wrote:
> Could you add a test for this?
Sure.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6570
+  // Initialize a mapping for VF in InstsToScalalarize. If we find that it's
+  // not profitable to to scalarize any instructions, the presence of VF in the
+  // map will indicate that we've analyzed it already.
----------------
mkuper wrote:
> "to to" -> "to"
Thanks!


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6583
+        DenseMap<Instruction *, unsigned> ScalarCosts;
+        if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
+          InstsToScalarizeVF.insert(ScalarCosts.begin(), ScalarCosts.end());
----------------
gilr wrote:
> mkuper wrote:
> > Just trying to understand if I got this right.
> > Let's say we have:
> > 
> > ```
> > %a = add i32, ...
> > %r = udiv i32 %x, %a
> > %s = udiv i32 %y, %a
> > %t = udiv i32 %z, %a
> > ```
> > In the same block. 
> > Will we do the right thing evaluating the cost of scalarizing the add in a separate context for each udiv?
> Actually in such a case it seems the addition gets scalarized but cannot be sunk down because of the multiple uses.
> I also wonder if we calculate the cost correctly when one predicated instruction feeds another.
Yeah, Gil is right here; the add in this example will not be sunk. Thanks, Gil! This is because our current predication method places each udiv in its own separate block. Long term, I think we will want to keep the predicated instructions in the same block after vectorization if they were in the same block before vectorization. I can add some comments about this in the meantime.

So we will need to check for the multi-context case, then. But this doesn't mean we can't sink instructions with multiple uses. We should still be able to handle the following, for example.

```
if:
  %a = add i32, ...
  %r = add i32 %a, %a
  %s = udiv i32 %x, %r
  br
```

I'll add some tests.
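
For the multi-use case, I'm thinking the traversal could only sink an operand whose users are all inside the chain being scalarized. Roughly something like this (hypothetical helper, names illustrative):

```
// An operand is a sink candidate only if every one of its users is already
// part of the expression being scalarized into the predicated block;
// otherwise a use outside the chain would still need the non-sunk value.
static bool allUsersInChain(Instruction *Op,
                            const SmallPtrSetImpl<Instruction *> &Chain) {
  return all_of(Op->users(), [&](User *U) {
    auto *UI = dyn_cast<Instruction>(U);
    return UI && Chain.count(UI);
  });
}
```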


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6605
+  Worklist.push_back(PredInst);
+  while (!Worklist.empty()) {
+    Instruction *I = Worklist.pop_back_val();
----------------
gilr wrote:
> The following loop causes the patch to assert
> 
> void foo(int* restrict a, int b, int* restrict c) {
>   for (int i = 0; i < 10000; ++i) {
>     if (a[i] > 777) {
>       int t2 = c[i];
>       int t3 = t2 / b;
>       a[i] += t3;
>     }
>   }
> }
> 
> while trying to scalarize a uniform value (the predicated load).
> Got this example to work by skipping I if any of its operands isUniformAfterVectorization().
Thanks, Gil! I'll fix this in the updated patch and add an appropriate test.
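
The check I have in mind is roughly the following, inside the worklist loop (a sketch of Gil's suggestion, not the final code):

```
// Skip an instruction if any of its operands is uniform after vectorization
// (e.g. the value behind the predicated load in Gil's example); scalarizing
// such a chain isn't meaningful and is what triggered the assert.
if (any_of(I->operands(), [&](Value *V) {
      auto *J = dyn_cast<Instruction>(V);
      return J && Legal->isUniformAfterVectorization(J);
    }))
  continue;
```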


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6624
+    if (Legal->isScalarWithPredication(I) && !I->getType()->isVoidTy())
+      ScalarCost += getScalarizationOverhead(ToVectorTy(I->getType(), VF), true,
+                                             false, TTI);
----------------
mkuper wrote:
> gilr wrote:
> > Wasn't the cost of insert-element accounted for by VectorCost?
> Can we end up with a non-scalar type during the traversal?
> I'm thinking about something like a div that depends on an extractvalue from a struct.
For the cost of the insertelement for predicated instructions: that's right. We computed it in VectorCost. The insert overhead should be the same for both versions of the code: predication as-is vs. predication with scalarized operands. So we either need to subtract this overhead from VectorCost or add it to ScalarCost to make the calculations comparable.
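
Put another way, the comparison is only apples-to-apples if both sides carry the same insert-element term; the shape of the accounting is roughly this (illustrative only, not the exact patch code):

```
// Both cost terms include the insertelement overhead for re-forming the
// vector result, so the difference reflects only what actually changes
// between predication as-is and predication with scalarized operands.
static bool profitableToScalarizeOperands(int VectorCost, int ScalarCost) {
  return VectorCost - ScalarCost >= 0; // non-negative discount => scalarize
}
```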

For the non-scalar type question: this may be possible (I'll need to investigate). I'll add a test either way. Thanks!


https://reviews.llvm.org/D26083




