[PATCH] D42946: Verify profile data confirms large loop trip counts.

Tue Feb 6 00:29:52 PST 2018

mtrofin added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:8353

-  if (!HasExpectedTC && LoopVectorizeWithBlockFrequency) {
+  // ExpectedTC may be large because it's bound by a variable. Check
+  // profiling information to validate we should vectorize.
----------------
davidxl wrote:
> mtrofin wrote:
> > davidxl wrote:
> > > What is the ExpectedTC returned in this case?  Why does it not return CouldNotCompute  (or 0)?
> > For example, for:
> > 
> > ```
> > for (unsigned i = 1; i < something; ++i) {
> >   ...
> > }
> > ```
> > 
> > it returns 0xffff fffe - assuming `unsigned something`. That's correct - that's the maximum times the loop body would be executed, because `something` might at most be 0xffff ffff. If the step were 2 instead of 1, the ExpectedTC would reflect that accordingly (meaning, the trip count would be half). 
> > In contrast, CouldNotCompute is produced in cases like unsupported loops, for instance those with multiple exits.
> > 
> > When it comes to '0', the API behaves a bit unexpectedly, I'd argue: in cases like the ones in the current set of unit tests, it ends up adding 1 to the max taken back edges (which is 0xffff ffff there), which through overflow gets us to 0. That was the reason those tests were passing.
> > 
> > I'd love to hear others' thoughts are on this overflow behavior, and whether there's any reason we shouldn't refactor it, because relying on the overflow to represent "not having an expected trip count" falls through in cases like I just presented (e.g. non-unitary step, starting from something else than 0, etc)
> > 
> > This issue of determining "no expected trip counts" aside, I'd argue the issue addressed here can be addressed separately.
> I think the logic here should implement the order of precedence in this way:
> 
> 1) If there is known constant trip count (such as for( ...; i < 1000; i++), use that as the expectedTC
> 2) otherwise if there is profile based estimated trip count, use it as the expectedTC
> 3) use the smallConstantMaxTripCount as an estimation.
That would make it more clear, indeed.

Repository:
  rL LLVM

https://reviews.llvm.org/D42946