[PATCH] D51313: [LV] Fix code gen for conditionally executed uniform loads

Hideki Saito via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 5 15:35:28 PDT 2018


hsaito added inline comments.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6826
     if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
-      assert(CM.getWideningDecision(I, VF) ==
-                 LoopVectorizationCostModel::CM_Scalarize &&
+      assert((CM.getWideningDecision(I, VF) !=
+                  LoopVectorizationCostModel::CM_Widen &&
----------------
anna wrote:
> hsaito wrote:
> > Same as Line 4404.
> I believe it does not apply here. We specifically *need* to check for widening decisions. At this point, the decision can be CM_GatherScatter or a widening decision (had we not called `tryToWidenMemory`, in which case we would fail this assert). We will never get CM_Scalarize here because of the check at line 6747.
> 
> So, I think this assert is pretty clear in that aspect.
In non-NativeVPlan mode, tryToWidenMemory() is the only place that creates VPWidenMemoryRecipe(), which handles Widen, Widen_Reverse, and GatherScatter. If we intend to emit vector loads/stores or gathers/scatters, that needs to be caught there.

Line 6747 filters out only the predicated scalars. So, unmasked scalar loads/stores would hit here.

So, in my opinion, if CM_GatherScatter hits here, it means "the cost model wants to use a gather/scatter but we'll serialize it instead". That is an error condition, isn't it?
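
To make the invariant concrete, here is a small standalone sketch (not the LLVM code itself). The enum values mirror a subset of LoopVectorizationCostModel::InstWidening; handleLeftoverMemInst() is a made-up stand-in for the scalarization fallback being discussed:

  #include <cassert>

  // Subset of the cost model's widening decisions, for illustration only.
  enum InstWidening {
    CM_Unknown,
    CM_Widen,         // widened (consecutive) vector load/store
    CM_Widen_Reverse, // reversed vector load/store
    CM_GatherScatter, // masked gather/scatter
    CM_Scalarize      // replicate as scalar loads/stores
  };

  // Hypothetical stand-in for the path that serializes leftover memory ops.
  void handleLeftoverMemInst(InstWidening Decision) {
    // tryToWidenMemory() is expected to have consumed every instruction whose
    // decision is Widen, Widen_Reverse, or GatherScatter, so anything reaching
    // this point should be a (possibly unmasked) scalarized access. If
    // CM_GatherScatter shows up here, the cost model asked for a gather/scatter
    // but we would serialize it -- the error condition described above.
    assert(Decision != CM_Widen && Decision != CM_Widen_Reverse &&
           Decision != CM_GatherScatter &&
           "widened/gather-scatter memory ops must be handled earlier");
    (void)Decision; // scalarization/replication would happen here
  }

  int main() { handleLeftoverMemInst(CM_Scalarize); }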


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5754
   // block.
-  if (isScalarWithPredication(I)) {
+  if (isPredicatedInst(I)) {
     Cost /= getReciprocalPredBlockProb();
----------------
Just a note. No actionable items here.

In the cases where this code change matters, the code before the change ends up comparing a masked gather/scatter (we shouldn't see a masked widened load/store in the context this is called from) against an unmasked scalarized load/store. So this change makes sense, and as a result the modified assert condition in useEmulatedMaskMemRefHack() also makes sense.
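
To spell out the comparison, here is a rough standalone model, not the upstream code: pickMemDecisionCost() and the example costs are invented, while getReciprocalPredBlockProb() mirrors, to my understanding, the helper of the same name in LoopVectorize.cpp:

  #include <cstdio>

  // In LoopVectorize.cpp this returns the reciprocal of the assumed
  // probability that a predicated block executes (2, i.e. roughly 50%).
  unsigned getReciprocalPredBlockProb() { return 2; }

  // Hypothetical stand-in for the cost-model choice between scalarizing a
  // predicated access and emitting a masked gather/scatter.
  unsigned pickMemDecisionCost(unsigned ScalarizationCost,
                               unsigned GatherScatterCost,
                               bool IsPredicated) {
    // With the change above, the discount keys off isPredicatedInst() rather
    // than isScalarWithPredication(), so it also applies when a masked
    // gather/scatter is legal but the cost model still considers scalarizing.
    if (IsPredicated)
      ScalarizationCost /= getReciprocalPredBlockProb();
    // The cheaper alternative drives the CM_Scalarize vs. CM_GatherScatter
    // decision (ignoring useEmulatedMaskMemRefHack() for brevity).
    return ScalarizationCost < GatherScatterCost ? ScalarizationCost
                                                 : GatherScatterCost;
  }

  int main() {
    std::printf("chosen cost: %u\n",
                pickMemDecisionCost(/*Scalarization*/ 12, /*GatherScatter*/ 8,
                                    /*IsPredicated*/ true));
  }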

Just be aware that we may end up losing some vectorization as a result, but those are likely the places where we generate wrong code today: masked gather/scatter is available, yet the cost model decides to scalarize, and the scalarized code isn't properly masked because masking there is gated on masked gather/scatter being unavailable.
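
To illustrate the gating, a small standalone sketch follows; it is not the actual isScalarWithPredication() implementation, and MemAccess with its fields is invented for the example:

  #include <cstdio>

  // Invented description of a memory access, for illustration only.
  struct MemAccess {
    bool InPredicatedBlock;        // lives in a block that needs predication
    bool MaskRequired;             // legality says a mask is required
    bool LegalMaskedLoadStore;     // target has masked vector load/store
    bool LegalMaskedGatherScatter; // target has masked gather/scatter
  };

  // Sketch of the gating: predicated scalarization (with per-lane guards) is
  // requested only when every masked vector alternative is unavailable.
  bool isScalarWithPredicationSketch(const MemAccess &A) {
    if (!A.InPredicatedBlock || !A.MaskRequired)
      return false;
    return !A.LegalMaskedLoadStore && !A.LegalMaskedGatherScatter;
  }

  int main() {
    // Masked gather/scatter is legal, so this access is *not* flagged for
    // predicated scalarization -- yet the cost model may still decide to
    // scalarize it, and nothing then forces the per-lane masking.
    MemAccess A{true, true, false, true};
    std::printf("scalar-with-predication: %d\n",
                isScalarWithPredicationSketch(A));
  }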


Repository:
  rL LLVM

https://reviews.llvm.org/D51313




