[PATCH] D51313: [LV] Fix code gen for conditionally executed uniform loads

Tue Aug 28 15:07:59 PDT 2018

Ayal added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5883
+          // gather/scatters are not considered scalar with predication.
+          !Legal->blockNeedsPredication(I.getParent())) {
         // Scalar load + broadcast
----------------
anna wrote:
> Ayal wrote:
> > Indeed scalarizing a conditional load from a uniform address is currently broken, when it can turn into a gather. The culprit is that VPReplicateRecipe, when constructed for such instructions, uses an incorrect IsPredicate = isScalarWithPredication(). Which, as noted, is false for loads and stores of types that can be gathered/scattered. I.e., really means mustBeScalarWithPredication. (That, on top of IsUniform which is also incorrect, causing replication as noted in D50665#1209958.)
> > 
> > This proposed fix to refrain from scalarizing such loads, should** make sure they always turn into a gather**, rather than rely on the cost-based gather-vs-scalarize decision below.
> > 
> > The above holds also for conditional loads from **non-uniform** addresses, that can turn into gathers, but possibly also get incorrectly scalarized w/o branches. It's hard to test, as the scalarization cost needs to be better than the gather for this to occur. But best make sure these also always turn into a gather.
> > 
> > It would also/alternatively be good to fix the scalarization of such loads (by correcting IsPredicate), and enable cost-based gather-vs-scalarize decision. Presumably gather should always win, unless perhaps if the load feed scalarized users.
> It seems to me that we need a new function `LoopVectorizationCostModel::isPredicated` which is a superset of instructions contained by `LoopVectorizationCostModel::isScalarWithPredication`? 
> So, the 'complete' fix here is to correct the costmodel as done above *and* fix `isPredicate` in `VPRecipeBuilder::handleReplication`:
> ```
> bool IsPredicated = CM.isPredicated(I);
> auto *Recipe = new VPReplicateRecipe(I, IsUniform, IsPredicated);
> ```
> 
> Essentially, whether we use a gather/scatter or we do plain old branch + load/store, (because we don't have gather/scatter support), both are predicated - one is a 'vector predicate', other is scalarized + predicated.
> 
As for a 'complete' fix...
The ten calls to isScalarWithPredication() don't all ask the same thing; it mostly depends on whether setWideningDecision() already took place or not; i.e., the decision if a load/store will be CM_Scalarize (with or w/o predication) or be a CM_GatherScatter. Those called after setWideningDecision() depend on the widening decision set, which depends on the VF, essentially asking if willBeScalarWithPredication(); i.e., excluding CM_GatherScatter. They are the majority, so best have them continue to use isScalarWithPredication(), passing it VF. Three are called before, essentially asking distinct things - here's a breakdown:

  # memInstCanBeWidened(): calls before the decision, in order to make it. Already knows it's a Load/Store. Needs to check if blockNeedsPredication && isMaskRequired && !isLegalMaskedLoad/Store. Note that isLegalMaskedGather/Scatter are irrelevant as they are not considered "Widened".

  # useEmulatedMaskMemRefHack(): calls once before the decision and again after it. But only to assert(isScalarWithPredication()), so need to carefully assert only what's valid in both contexts.

  # getMemInstScalarizationCost: checks what the cost would be if the Load/Store would be scalarized. Needs to check if blockNeedsPredication && isMaskRequired. Note that both isLegalMaskedLoad/Store and isLegalMaskedGather/Scatter are irrelevant.

Of the calls that take place after the decision, note that: 

  # setCostBasedWideningDecision: bumping NumPredStores++ should best be postponed to after the (CM_Scalarize) decision is reached, at the bottom.

  # handleReplication: should call isScalarWithPredication(I, VF) from within getDecisionAndClampRange(), as done with IsUniform.

  # tryToWiden: should call !isScalarWithPredication(I, VF) from within getDecisionAndClampRange().

Devising tests would be challenging, e.g., making gather/scatter decisions for some VF's yet scalarization for others.

Repository:
  rL LLVM

https://reviews.llvm.org/D51313