[PATCH] D26555: [LV] Keep predicated instructions in the same block

Mon Nov 14 12:07:33 PST 2016

mssimpso added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:482

-  /// This instruction is un-vectorizable. Implement it as a sequence
-  /// of scalars. If \p IfPredicateInstr is true we need to 'hide' each
-  /// scalarized instruction behind an if block predicated on the control
-  /// dependence of the instruction.
-  virtual void scalarizeInstruction(Instruction *Instr,
-                                    bool IfPredicateInstr = false);
+  /// This instruction is un-vectorizable. Implement it as a sequence of
+  /// scalars.
----------------
mkuper wrote:
> As long as you're changing this comment - this is no longer only for "un-vectorizable" instructions.
That's true! I'll update the comment.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4319

   // For each instruction I marked for predication on value C, split I into its
   // own basic block to form an if-then construct over C. Since I may be fed by
----------------
mkuper wrote:
> This changes now, right?
That's right; I missed this. I'll update this comment as well.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4486
+        // corresponding to each instruction in ScalarLoopPredInsts.
+        VectorLoopPredInsts.clear();
+        for (Instruction *I : ScalarLoopPredInsts) {
----------------
mkuper wrote:
> Do we generally prefer this to defining the vector inside the loop, from a performance standpoint? (The latter would be clearer, I think).
Not that I'm aware of. I'm happy to move the vector definition inside the loop.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4488
+        for (Instruction *I : ScalarLoopPredInsts) {
+          if (Lane > 0 && Legal->isUniformAfterVectorization(I))
+            continue;
----------------
mkuper wrote:
> What happens in the "Lane == 0 && Legal->isUniformAfterVectorization(I)" case?
> I'm having a bit of trouble imagining what the code ends up looking.
This loop is collecting in VectorLoopPredInsts the scalarized instructions we produced in scalarizeInstruction. It then predicates all the scalarized instructions for a given unroll part and vector lane. If an instruction is uniform-after-vectorization we only generate values for Lane zero during scalarization, and getScalarValue asserts if we try to get a value for a Lane > 0.

However, as I'm thinking about this, I don't think we can ever end up with an instruction that requires predication that is also marked uniform-after-vectorization. Unless I'm missing something, we should probably just remove this if condition. What do you think?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:7099

-  VectorParts Cond;
-  if (IfPredicateInstr)
-    Cond = createBlockInMask(Instr->getParent());
+  // If the instruction requires predication, emit the edge masks for its
+  // predecessor blocks. The masks will be stored in MaskCache and made
----------------
gilr wrote:
> Why force the creation of the edge masks here?
We have to create the edge masks in program order because they are cached in MaskCache. PHI widening, for example, also calls createEdgeMask. During PHI widening, if a mask isn't already in the cache, we will produce it at that time. But then when we go back to perform the actual predication, we would reuse the masks we created for the PHIs. But these masks may not dominate the branches we create for the predicated blocks, so we have to produce the masks in order. The existing tests in if-pred-non-void.ll require this.

https://reviews.llvm.org/D26555