[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Mon Feb 6 17:18:16 PST 2017

mkuper added a comment.

The high-level structure of this looks good to me, thanks Elena!

Some minor/style comments inline.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1930
+  enum InstWidening {
+    CM_DECISION_NONE,
+    CM_DECISION_WIDEN,
----------------
delena wrote:
> mssimpso wrote:
> > How are we using CM_DECISION_NONE? Aren't we forced to make some sort of decision? I would think the default would be CM_DECISION_WIDEN unless the cost model points to one of the others.
> I used it while capturing decisions for an interleave group. At this point I can get rid of it. I return "NONE" if no decision, and it is convenient.
>   InstWidening getWideningDecision(Instruction *I, unsigned VF) {
>       assert(VF >= 2 && "Expected VF >=2");
>       std::pair<Instruction *, unsigned> InstOnVF = std::make_pair(I, VF);
>       if (!WideningDecisions.count(InstOnVF))
>         return CM_DECISION_NONE;
>       return WideningDecisions[InstOnVF];
>     }
> Then I use it in assertion, to make sure that decision is made.
Maybe rename NONE to UNKNOWN, then? But I'm fine with None if you think that's clearer.
Also, I looked up the naming convention for enums in the coding standard, and I think it should be something like "CM_Widen, CM_Interleave", etc, not all caps.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:330

+static Type *getMemInstValueType(Value *I) {
+  assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
----------------
We probably already have several clones of those functions around the code-base.  And they are probably all slightly different. 
LSR has getAccessType(),  LAA has getAddressSpaceOperand(), LoadStoreVectorizer has getPointerAddressSpace(), and I'm sure there are more.
I don't want to make merging them a precondition of this patch, but can you please at least add a FIXME here?

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1929
+    auto UniformsPerVF = Uniforms.find(VF);
+    return UniformsPerVF->second.count(I);
+  }
----------------
Why not just "return Uniforms[VF]->second.count(I);"? I don't think the verbosity helps here, and we don't actually care about the difference between find() and operator[] due to the assert.
Or, to save a lookup in an asserts build, you could find() and then assert on the result of the find().

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1937
+    assert(Scalars.count(VF) && "Scalar values are not calculated for VF");
+    auto ScalarsPerVF = Scalars.find(VF);
+    return ScalarsPerVF->second.count(I);
----------------
Same as above.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1988
+    std::pair<Instruction *, unsigned> InstOnVF = std::make_pair(I, VF);
+    if (!WideningDecisions.count(InstOnVF))
+      return CM_DECISION_NONE;
----------------
Here, on the other hand, I think find() would be better - that way you don't need two lookups.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:2120
+    if (VF == 1 || Uniforms.count(VF))
+      return;
+    setCostBasedWideningDecision(VF);
----------------
I'm still not sure I understand why this gets called twice for a user-provided VF. Could you explain again?

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:5512
   auto *Ptr = getPointerOperand(I);
   if (LI && isUniform(Ptr))
+    return false;
----------------
Do we still want this check here?
I mean:
1) An instruction with a uniform pointer *can* be widended.
2) That ties in with the way this is used - we check for the uniform case before the widening case, as we should. So, IIUC, we should never actually hit this.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7022
+      // Choose between Interleaving, Gather/Scatter or Scalarization.
+      unsigned InterleaveCost = (unsigned)(-1);
+      unsigned NumAccesses = 1;
----------------
UINT_MAX

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7036
+        if (InterleaveCost < NumAccesses * 2) {
+          // The interleaving cost is good enough.
+          setWideningDecision(Group, VF, CM_DECISION_INTERLEAVE,
----------------
Why the NumAccesses * 2 cut-off?

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7047
+              ? getGatherScatterCost(&I, VF) * NumAccesses
+              : (unsigned)(-1);
+
----------------
UINT_MAX

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7060
+        Cost = InterleaveCost;
+      } else if (GatherScatterCost < InterleaveCost &&
+                 GatherScatterCost < ScalarizationCost) {
----------------
Why do you need the  "GatherScatterCost < InterleaveCost" check here?

Repository:
  rL LLVM

https://reviews.llvm.org/D27919