[PATCH] D26083: [LV] Scalarize operands of predicated instructions

Mon Nov 7 08:14:45 PST 2016

gilr added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4678
           isa<DbgInfoIntrinsic>(&I)) &&
-        Legal->isScalarAfterVectorization(&I)) {
-      scalarizeInstruction(&I);
+        (Legal->isScalarAfterVectorization(&I) ||
+         Cost->isProfitableToScalarize(&I, VF))) {
----------------
mkuper wrote:
> We have 5 places that call isScalarAfterVectorization().
> Is this the only call site that cares about this? (If all of them should care, perhaps wrap in a helper function?)
I think Michael's comment raises a more general question: should it matter why we scalarize?
Put differently, once an instruction is marked for scalarization by legal or cost, shouldn't we treat it uniformly in the code?
[if so, then the question "will the instruction be scalarized" would only be well-defined in the context of some VF]

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6583
+        DenseMap<Instruction *, unsigned> ScalarCosts;
+        if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
+          InstsToScalarizeVF.insert(ScalarCosts.begin(), ScalarCosts.end());
----------------
mkuper wrote:
> Just trying to understand if I got this right.
> Let's say we have:
> 
> ```
> %a = add i32, ...
> %r = udiv i32 %x, %a
> %s = udiv i32 %y, %a
> %t = udiv i32 %z, %a
> ```
> In the same block. 
> Will we do the right thing evaluating the cost of scalarizing the add in a separate context for each udiv?
Actually in such a case it seems the addition gets scalarized but cannot be sunk down because of the multiple uses.
I also wonder if we calculate the cost correctly when one predicated instruction feeds another.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6605
+  Worklist.push_back(PredInst);
+  while (!Worklist.empty()) {
+    Instruction *I = Worklist.pop_back_val();
----------------
The following loop causes the patch to assert

void foo(int* restrict a, int b, int* restrict c) {
  for (int i = 0; i < 10000; ++i) {
    if (a[i] > 777) {
      int t2 = c[i];
      int t3 = t2 / b;
      a[i] += t3;
    }
  }
}

while trying to scalarize a uniform value (the predicated load).
Got this example to work by skipping I if any of its operands isUniformAfterVectorization().

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6624
+    if (Legal->isScalarWithPredication(I) && !I->getType()->isVoidTy())
+      ScalarCost += getScalarizationOverhead(ToVectorTy(I->getType(), VF), true,
+                                             false, TTI);
----------------
Wasn't the cost of insert-element accounted for by VectorCost?

https://reviews.llvm.org/D26083