[PATCH] D37702: [LV] Clamp the VF to the trip count

Tue Sep 12 06:40:56 PDT 2017

Ayal added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6179
   // If we optimize the program for size, avoid creating the tail loop.
-  unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
   DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');

----------------
anna wrote:
> Ayal wrote:
> > Probably good to hoist this DEBUG line along with computing TC earlier.
> When `OptForSize` is true (i.e. at this point in the code), we know for sure that a small trip count value returns a constant trip count or value of 1, if we didn't know the trip count.
> 
> I left this here rather than move up because there maybe *more* spurious cases where we do not have a *small* trip count, and the DEBUG statement wouldn't be useful.
At this point I think TC can also be zero if we don't know it, which is why we check if TC<2 below (i.e., if (TC==0 || TC == 1)).

Printing out the value of TC when computed above can only aid debugging, afaics.

In any case, I'm also ok leaving it here; you're also printing TC in computeFeasibleMaxVF() where it clamps MaxVF.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6258
     SmallVector<unsigned, 8> VFs;
     unsigned NewMaxVectorSize = WidestRegister / SmallestType;
     for (unsigned VS = MaxVectorSize; VS <= NewMaxVectorSize; VS *= 2)
----------------
anna wrote:
> Ayal wrote:
> > While we're at it, in this or a follow-up patch:
> >   - Suggest to also clamp NewMaxVectorSize above to TC.
> >   - VS below should start from MaxVectorSize*2 instead of MaxVectorSize; it's pointless and wasteful to calculateRegisterUsage for MaxVectorSize, as the latter will be returned by default anyway.
> >   - Optimize calculateRegisterUsage() for the case where VFs is empty.
> > 
> > Doing this will also take care of your early exit above.
> To me, it seems rather than clamping the `NewMaxVectorSize`, explicitly showing the early return is clearer. 
> The second and third suggestions I'll do in a follow-up patch. 
> 
> Also, an additional point to follow up: I think we can early return in even earlier case -  when VF is set to 1 because the target has no vector registers. 
Agreed!

================
Comment at: test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll:54
+; CHECK-LABEL: not_too_small_tc
+; CHECK-AVX1: LV: Selecting VF: 16.
+; CHECK-AVX2: LV: Selecting VF: 16.
----------------
The max possible vector width for this test on AVX1 is 16, so we're missing the point in checking its selected VF here. Suffice to CHECK-AVX2 only, whose max possible vector width is 32, as noted.

https://reviews.llvm.org/D37702