[PATCH] D27690: [LV] Don't vectorize when we have a static bound on trip count
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 11:27:38 PST 2016
----- Original Message -----
> From: "Michael Kuperstein" <mkuper at google.com>
> To: reviews+D27690+public+134fce31590ae349 at reviews.llvm.org
> Cc: "Gil Rapaport" <gil.rapaport at intel.com>, "Matthew Simpson"
> <mssimpso at codeaurora.org>, "Hal Finkel" <hfinkel at anl.gov>, "Mikhail
> Zolotukhin" <mzolotukhin at apple.com>, "llvm-commits"
> <llvm-commits at lists.llvm.org>
> Sent: Tuesday, December 13, 2016 1:17:42 PM
> Subject: Re: [PATCH] D27690: [LV] Don't vectorize when we have a
> static bound on trip count
> > > b) Remainder loops for hand-vectorized code. These will also not be
> > > unrolled - the trip-count is unknown, and doesn't have a known
> > > multiple. (We may end up with runtime unrolling and yet another
> > > "remainder loop", which doesn't really improve things.) And, of
> > > course, it's almost always a bad idea to vectorize these. (The
> > > exception may be something like hand-vectorization by 16, with a
> > > scalar remainder loop. We may want to vectorize that remainder by
> > > 4 and leave a smaller scalar remainder, but that sounds like a
> > > very small win.)
> > I agree, but I think we're going about this the wrong way. The cost
> > of the branching and runtime checks needs to be factored into the
> > cost model (which will be relevant for low-trip-count loops), and
> > that should naturally prevent this kind of messiness. Just not
> > vectorizing low-trip-count loops is suboptimal because it will
> > miss cases where vectorization is quite profitable.
> You're completely right, but this isn't new - it's just that it's
> being applied non-uniformly, depending on what exactly we know about
> the trip count. That is, we do it "the wrong way" for loops with a
> known exact trip count, and don't do it at all for loops with a
> known upper bound.
I apologize if I implied that this was a new problem being introduced by this patch. I certainly agree that it is not new.
> I want us to start treating all three cases (static exact, static
> bound, dynamic) in the same way, by using the "right" number for the
> trip-count. Using this number in a smarter way (by estimating the
> overhead cost, and then dividing it by the trip-count to get the
> per-iteration cost*) is, I think, orthogonal to actually getting the
> number right.
So long as that's the plan, then I'm fine with this. I agree that the uniformity is desirable, at least for now. There is still a difference in modeling the exact case vs. the unknown case, however, because the unknown case has a remainder loop and, thus, at least an extra compare/branch somewhere. I assume we'll want to account for this in the cost modeling.
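To make the amortization concrete, here is a rough sketch in Python. It is not LoopVectorize's actual cost model: the function names, the abstract cost units, and the numbers are all hypothetical, chosen only for illustration. It assumes a fixed one-time overhead (runtime checks, extra compare/branch) and a scalar remainder loop, and it shows why dividing the overhead by trip-count/VF understates the per-iteration cost for a short loop: only whole vector iterations actually execute.

```python
import math


def naive_overhead_per_vector_iteration(overhead, trip_count, vf):
    # Spreads the overhead over a fractional number of vector iterations,
    # e.g. 7 / 4 = 1.75 iterations.
    return overhead / (trip_count / vf)


def actual_overhead_per_vector_iteration(overhead, trip_count, vf):
    # Only whole vector iterations run; the leftover iterations go to the
    # scalar remainder loop.
    vector_iterations = trip_count // vf
    if vector_iterations == 0:
        # The vector body never runs, so the overhead is never amortized.
        return math.inf
    return overhead / vector_iterations


def vectorized_total_cost(trip_count, vf, vector_iter_cost,
                          scalar_iter_cost, overhead):
    # Hypothetical total: one-time overhead, plus the vector body,
    # plus the scalar remainder loop.
    vector_iterations, remainder = divmod(trip_count, vf)
    return (overhead
            + vector_iterations * vector_iter_cost
            + remainder * scalar_iter_cost)


def is_profitable(trip_count, vf, vector_iter_cost,
                  scalar_iter_cost, overhead):
    # Compare against running every iteration in the scalar loop.
    scalar_total = trip_count * scalar_iter_cost
    vector_total = vectorized_total_cost(
        trip_count, vf, vector_iter_cost, scalar_iter_cost, overhead)
    return vector_total < scalar_total
```

With these made-up costs (overhead 8, vector iteration 2, scalar iteration 1, VF 4), a 7-iteration loop costs 8 + 2 + 3 = 13 vectorized versus 7 scalar, so it is not worth vectorizing, while a 1000-iteration loop costs 508 versus 1000. Folding the overhead into the total this way lets the same comparison reject short loops and accept long ones without a separate trip-count cutoff.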
> * Well, almost. For a loop with 7 iterations that we vectorize by 4,
> we aren't really spreading the cost among "1.75" vectorized
> iterations, but just the one. This is negligible for high trip
> counts, but the whole point is to evaluate it correctly for the low
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory