[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Wed Feb 8 01:02:38 PST 2017

delena added inline comments.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:2120
+    if (VF == 1 || Uniforms.count(VF))
+      return;
+    setCostBasedWideningDecision(VF);
----------------
mkuper wrote:
> delena wrote:
> > mkuper wrote:
> > > I'm still not sure I understand why this gets called twice for a user-provided VF. Could you explain again?
> > We call this function from multiple places. From calculateRegisterUsage(), selectVectorizationFactor() - user-defined-VF and from expectedCost(). 
> > I just prevent recalculation.
> I understand, I'm just wondering whether we really need to do that.
> Anyway, it's not a new problem, we don't have to solve it here.
In the current version, before my changes,  we calculate Uniforms and Scalars once and do this in Legality. In this patch, I moved Uniforms and Scalars from Legality to the Cost Model and calculate them per VF. I did not find a single place to put the call, I'm calling the collectUniformsAndScalars() from multiple places. Doing that, I want to prevent the data recalculation.

I suppose that finding a right single place for calling collectUniformsAndScalars() is possible, but it will require additional movements in selectVectorizationFactor(). I think it can be done in a separate patch.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:5512
   auto *Ptr = getPointerOperand(I);
   if (LI && isUniform(Ptr))
+    return false;
----------------
mkuper wrote:
> mkuper wrote:
> > Do we still want this check here?
> > I mean:
> > 1) An instruction with a uniform pointer *can* be widended.
> > 2) That ties in with the way this is used - we check for the uniform case before the widening case, as we should. So, IIUC, we should never actually hit this.
> I think you may have missed this comment.
I'll fix.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7036
+        if (InterleaveCost < NumAccesses * 2) {
+          // The interleaving cost is good enough.
+          setWideningDecision(Group, VF, CM_DECISION_INTERLEAVE,
----------------
mssimpso wrote:
> mkuper wrote:
> > delena wrote:
> > > mkuper wrote:
> > > > Why the NumAccesses * 2 cut-off?
> > > I consider a cost per instruction. In this case InterleaveCost / NumAccesses == 1. (Matthew asked to avoid divisions).
> > > About 1 inst per access is good enough. I added more comments.
> > Well, this isn't really "about 1", this is "below 2".
> > I'd be more conservative here (InerelaveCost <= NumAccesses ? Can this even happen? Or are you trying to catch the cases where the ratio is ~1.1-1.2?). Or maybe remove this altogether. Is getGatherScatterCost() expensive in terms of compile time?
> The TTI estimates are supposed to be cheap to compute. I think it makes sense to remove this altogether in favor of greater simplicity.
ok.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7069
+        Cost = InterleaveCost;
+      } else if (GatherScatterCost < ScalarizationCost) {
+        Decision = CM_GatherScatter;
----------------
mkuper wrote:
> So, in case the costs are equal, you prefer scalarization to interleaving, and interleaving to scatter/gather?
> (In theory it shouldn't matter what happens when the costs are equal, just making sure this I understand.)
Matthew asked to change:
>>>"I think we're fairly consistent in other places where we compare costs, that we prefer the scalar version if there's no benefit for vectorization. So if the scalarization cost is <= the other costs, I think we should go with that."

Repository:
  rL LLVM

https://reviews.llvm.org/D27919