[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Tue Feb 7 09:27:10 PST 2017

mkuper added inline comments.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1929
+    auto UniformsPerVF = Uniforms.find(VF);
+    return UniformsPerVF->second.count(I);
+  }
----------------
delena wrote:
> mkuper wrote:
> > Why not just "return Uniforms[VF]->second.count(I);"? I don't think the verbosity helps here, and we don't actually care about the difference between find() and operator[] due to the assert.
> > Or, to save a lookup in an asserts build, you could find() and then assert on the result of the find().
> The "const" qualifier does not allow the Uniforms[VF] form.
Ohh, right, missed it's const, sorry.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:2120
+    if (VF == 1 || Uniforms.count(VF))
+      return;
+    setCostBasedWideningDecision(VF);
----------------
delena wrote:
> mkuper wrote:
> > I'm still not sure I understand why this gets called twice for a user-provided VF. Could you explain again?
> We call this function from multiple places. From calculateRegisterUsage(), selectVectorizationFactor() - user-defined-VF and from expectedCost(). 
> I just prevent recalculation.
I understand, I'm just wondering whether we really need to do that.
Anyway, it's not a new problem, we don't have to solve it here.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:5512
   auto *Ptr = getPointerOperand(I);
   if (LI && isUniform(Ptr))
+    return false;
----------------
mkuper wrote:
> Do we still want this check here?
> I mean:
> 1) An instruction with a uniform pointer *can* be widended.
> 2) That ties in with the way this is used - we check for the uniform case before the widening case, as we should. So, IIUC, we should never actually hit this.
I think you may have missed this comment.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7036
+        if (InterleaveCost < NumAccesses * 2) {
+          // The interleaving cost is good enough.
+          setWideningDecision(Group, VF, CM_DECISION_INTERLEAVE,
----------------
delena wrote:
> mkuper wrote:
> > Why the NumAccesses * 2 cut-off?
> I consider a cost per instruction. In this case InterleaveCost / NumAccesses == 1. (Matthew asked to avoid divisions).
> About 1 inst per access is good enough. I added more comments.
Well, this isn't really "about 1", this is "below 2".
I'd be more conservative here (InerelaveCost <= NumAccesses ? Can this even happen? Or are you trying to catch the cases where the ratio is ~1.1-1.2?). Or maybe remove this altogether. Is getGatherScatterCost() expensive in terms of compile time?

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7069
+        Cost = InterleaveCost;
+      } else if (GatherScatterCost < ScalarizationCost) {
+        Decision = CM_GatherScatter;
----------------
So, in case the costs are equal, you prefer scalarization to interleaving, and interleaving to scatter/gather?
(In theory it shouldn't matter what happens when the costs are equal, just making sure this I understand.)

Repository:
  rL LLVM

https://reviews.llvm.org/D27919