[PATCH] D27690: [LV] Don't vectorize when we have a static bound on trip count

Tue Dec 13 12:07:02 PST 2016

----- Original Message -----

> From: "Michael Kuperstein" <mkuper at google.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Gil Rapaport" <gil.rapaport at intel.com>, "Matthew Simpson"
> <mssimpso at codeaurora.org>, "Mikhail Zolotukhin"
> <mzolotukhin at apple.com>, "llvm-commits"
> <llvm-commits at lists.llvm.org>,
> reviews+D27690+public+134fce31590ae349 at reviews.llvm.org
> Sent: Tuesday, December 13, 2016 1:42:10 PM
> Subject: Re: [PATCH] D27690: [LV] Don't vectorize when we have a
> static bound on trip count

> On Tue, Dec 13, 2016 at 11:27 AM, Hal Finkel <hfinkel at anl.gov>
> wrote:

> > > From: "Michael Kuperstein" <mkuper at google.com>
> > > To: reviews+D27690+public+134fce31590ae349 at reviews.llvm.org
> > > Cc: "Gil Rapaport" <gil.rapaport at intel.com>, "Matthew Simpson"
> > > <mssimpso at codeaurora.org>, "Hal Finkel" <hfinkel at anl.gov>,
> > > "Mikhail Zolotukhin" <mzolotukhin at apple.com>, "llvm-commits"
> > > <llvm-commits at lists.llvm.org>
> > > Sent: Tuesday, December 13, 2016 1:17:42 PM
> > > Subject: Re: [PATCH] D27690: [LV] Don't vectorize when we have a
> > > static bound on trip count

> > > > > b) Remainder loops for hand-vectorized code. These will also not
> > > > > be unrolled - the trip count is unknown, and doesn't have a known
> > > > > multiple. (We may end up with runtime unrolling and yet another
> > > > > "remainder loop", which doesn't really improve things.) And, of
> > > > > course, it's almost always a bad idea to vectorize these. (The
> > > > > exception may be something like hand-vectorization by 16, with a
> > > > > scalar remainder loop. We may want to vectorize that remainder by
> > > > > 4 and leave a smaller scalar remainder, but that sounds like a
> > > > > very small win.)
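To make the arithmetic in (b) concrete, here is a minimal Python sketch (the function and field names are hypothetical, not LLVM code) that splits a trip count n between a main loop hand-vectorized by 16 and its remainder, both as a plain scalar tail and as a tail vectorized by 4:

```python
def split_iterations(n: int, main_vf: int = 16, remainder_vf: int = 4) -> dict:
    """Split trip count n for a loop hand-vectorized by main_vf (hypothetical model).

    Returns how many main vector iterations run, how long the scalar remainder
    is, and how that remainder would split if it were vectorized by remainder_vf.
    """
    if n < 0 or main_vf <= 0 or remainder_vf <= 0:
        raise ValueError("n must be non-negative and widths positive")
    main_iters, tail = divmod(n, main_vf)
    tail_vector_iters, tail_scalar_iters = divmod(tail, remainder_vf)
    return {
        "main_vector_iters": main_iters,
        "scalar_tail": tail,                     # today's scalar remainder loop
        "tail_vector_iters": tail_vector_iters,  # remainder vectorized by remainder_vf
        "tail_scalar_iters": tail_scalar_iters,  # smaller scalar remainder left over
    }


# Worst case for main_vf = 16: n = 16*k + 15 leaves a 15-iteration scalar tail.
print(split_iterations(47))
```

Whatever n is, the scalar tail after the main loop is at most 15 iterations; vectorizing it by 4 turns that worst case into 3 vector plus 3 scalar iterations, so the possible saving is bounded by a small constant per loop execution.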

> > > > I agree, but I think we're going about this the wrong way. The
> > > > cost of the branching and runtime checks needs to be factored into
> > > > the cost model (which will be relevant for low-trip-count loops),
> > > > and that should naturally prevent this kind of messiness. Just not
> > > > vectorizing low-trip-count loops is suboptimal because it will miss
> > > > cases where vectorization is quite profitable.

> > > You're completely right, but this isn't new - it's just that it's
> > > being applied non-uniformly, depending on what exactly we know about
> > > the trip count. That is, we do it "the wrong way" for loops with a
> > > known exact trip count, and don't do it at all for loops with a
> > > known upper bound.
> > I apologize if I implied that this was a new problem being introduced
> > by this patch. I certainly agree that it is not new.

> > > I want us to start treating all three cases (static exact, static
> > > bound, dynamic) in the same way, by using the "right" number for the
> > > trip count. Using this number in a smarter way (by estimating the
> > > overhead cost, and then dividing it by the trip count to get the
> > > per-iteration cost*) is, I think, orthogonal to actually getting the
> > > number right.
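A minimal sketch of the amortization described here, assuming hypothetical abstract costs rather than LLVM's actual cost-model API: the one-time overhead (runtime checks, extra branches) is divided by the trip count and added to the per-element cost of the vector body. Whichever trip-count figure is available (static exact, static bound, or a dynamic estimate) would be passed as trip_count.

```python
def amortized_vector_cost(vector_body_cost: float, vf: int,
                          overhead_cost: float, trip_count: int) -> float:
    """Per-original-iteration cost of the vectorized loop (naive amortization).

    All costs are hypothetical abstract units: vector_body_cost is one vector
    iteration of width vf; overhead_cost is the one-time cost of runtime checks
    and extra branches, spread evenly over all trip_count iterations.
    """
    if vf <= 0 or trip_count <= 0:
        raise ValueError("vf and trip_count must be positive")
    return vector_body_cost / vf + overhead_cost / trip_count


def should_vectorize(scalar_body_cost: float, vector_body_cost: float, vf: int,
                     overhead_cost: float, trip_count: int) -> bool:
    # Vectorize only if the amortized vector cost beats one scalar iteration.
    return amortized_vector_cost(vector_body_cost, vf, overhead_cost,
                                 trip_count) < scalar_body_cost


# Same hypothetical costs at two different trip-count estimates.
print(should_vectorize(4, 6, 4, 20, trip_count=10))  # True: 1.5 + 2.0 < 4
print(should_vectorize(4, 6, 4, 20, trip_count=5))   # False: 1.5 + 4.0 >= 4
```

The point of the sketch is that the same overhead flips the decision purely through the trip-count figure, which is why getting that number right matters independently of how the overhead itself is estimated.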
> > So long as that's the plan, then I'm fine with this. I agree that the
> > uniformity is desirable, at least for now. There is still a difference
> > in modeling the exact case vs. the unknown case, however, because the
> > unknown case has a remainder loop and, thus, at least an extra
> > compare/branch somewhere. I assume we'll want to account for this in
> > the cost modeling.

> I don't want to create any false expectations - it should be fixed,
> but I don't have any short-term plans to fix it.
Understood. 

> It's the right thing to do, but it looks hard to get right without
> introducing regressions, and I haven't run into practical cases
> where we're losing performance by not vectorizing a short loop that
> would justify doing it. I'm sure these cases exist, and it's trivial
> to construct one, but I just haven't encountered them where it
> matters. :-)

> (Maybe it's just because missed opportunities are less obvious than
> the cases where we vectorize things we shouldn't...)

I have encountered such cases. Users can add pragmas, etc., but we should do a better job here.

> As to exact vs. unknown - I'm not sure what you mean. The exact case
> may also have a remainder loop, if it's not divisible by VF.
I should have been more specific. I meant cases where we know a remainder loop is not needed vs. the cases where we need one. 
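A simplified sketch of that distinction, under stated assumptions (the enum and function are hypothetical, not part of LLVM): a remainder loop can be ruled out statically only when the exact trip count, or a value the trip count is known to be a multiple of, is divisible by VF. With just an upper bound or no static information, the remainder loop and its extra compare/branch stay.

```python
from enum import Enum
from typing import Optional


class TripCountKind(Enum):
    """Hypothetical classification of what is statically known about a trip count."""
    STATIC_EXACT = "static exact"
    STATIC_BOUND = "static upper bound"
    DYNAMIC = "dynamic"


def remainder_loop_needed(kind: TripCountKind, vf: int,
                          trip_count: Optional[int] = None,
                          trip_multiple: int = 1) -> bool:
    """Return True unless we can prove every iteration fits in full vectors.

    trip_count is the exact count (only used for STATIC_EXACT);
    trip_multiple is a value the trip count is known to be a multiple of.
    """
    if vf <= 0:
        raise ValueError("vf must be positive")
    if kind is TripCountKind.STATIC_EXACT:
        if trip_count is None:
            raise ValueError("an exact trip count is required")
        return trip_count % vf != 0
    # An upper bound alone says nothing about divisibility; only a known
    # multiple of vf lets us drop the remainder loop.
    return trip_multiple % vf != 0


print(remainder_loop_needed(TripCountKind.STATIC_EXACT, 4, trip_count=7))  # True
print(remainder_loop_needed(TripCountKind.DYNAMIC, 4, trip_multiple=8))    # False
```

An exact count of 7 at VF 4 still needs a remainder, as Michael notes; what separates the cases is whether "no remainder" can be proven at all.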

Thanks again, 
Hal 

> > -Hal

> > > * Well, almost. For a loop with 7 iterations that we vectorize by 4,
> > > we aren't really spreading the cost among "1.75" vectorized
> > > iterations, but just the one. The difference is negligible for high
> > > trip counts, but the whole point is to evaluate it correctly for the
> > > low case.
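The footnote's correction can be sketched the same way, again with hypothetical abstract costs rather than LLVM's API: only trip_count // vf vector iterations actually run, and the trip_count % vf leftover iterations execute in the scalar remainder at full scalar cost. For 7 iterations at VF 4 that is one vector iteration plus three scalar ones, not 1.75 vector iterations.

```python
def average_cost_per_iteration(scalar_body_cost: float, vector_body_cost: float,
                               vf: int, overhead_cost: float,
                               trip_count: int) -> float:
    """Average cost per original iteration, counting only whole vector iterations.

    All cost parameters are hypothetical abstract units. The one-time overhead is
    paid once; trip_count // vf vector iterations run; the trip_count % vf
    leftover iterations run in a scalar remainder loop.
    """
    if vf <= 0 or trip_count <= 0:
        raise ValueError("vf and trip_count must be positive")
    vector_iters, scalar_iters = divmod(trip_count, vf)
    total = (overhead_cost
             + vector_iters * vector_body_cost
             + scalar_iters * scalar_body_cost)
    return total / trip_count


# 7 iterations at VF 4: one vector iteration (cost 6), three scalar ones (cost 4 each).
cost = average_cost_per_iteration(scalar_body_cost=4, vector_body_cost=6, vf=4,
                                  overhead_cost=14, trip_count=7)
print(f"{cost:.2f}")  # prints 4.57, i.e. (14 + 6 + 3*4) / 7
```

With these made-up numbers, treating all 7 iterations as vectorized (6/4 + 14/7 = 3.5) would look cheaper than the scalar cost of 4, while the whole-iteration count gives 32/7, about 4.57, which is not; this is exactly the low-trip-count case where the distinction changes the decision.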
> > --
> > Hal Finkel
> > Lead, Compiler Technology and Programming Languages
> > Leadership Computing Facility
> > Argonne National Laboratory

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 