[llvm-dev] [Proposal][RFC] Epilog loop vectorization
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Mon Feb 27 15:35:33 PST 2017
On 02/27/2017 04:19 PM, Zaks, Ayal wrote:
>
> On 02/27/2017 12:41 PM, Michael Kuperstein wrote:
>
> There's another issue with re-running the vectorizer (which I
> support, btw - I'm just saying there are more problems to solve on
> the way :-) )
>
> Historically, we haven't even tried to evaluate the cost of the
> "constant" (not per-iteration) vectorization overhead - things
> like alias checks. Instead, we have hard bounds - we won't perform
> alias checks that are "too expensive", and, more importantly, we
> don't even try to vectorize loops with known low iteration counts.
> The bound right now is 16, IIRC. That means we don't have a good
> way to evaluate whether vectorizing a loop with a low iteration
> count is profitable or not.
>
>
> We should really improve this as well.
>
> @Michael: OTOH, we should reach the same decision again (i.e., that of
> performing the alias checks) when encountering the remainder loop as
> we did with the original loop, given that hard bounds are used ;-).
>
> But agreed, it is better to evaluate the cost of these bounds along
> with the overall estimated cost instead.
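To make the "evaluate the cost along with the overall estimate" idea concrete, here is a minimal sketch of a cost comparison that charges the one-time overhead (alias checks and the like) against the per-iteration savings. The struct, the function names, and the cost units are all assumptions made for illustration; none of this is the vectorizer's actual cost model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost inputs; these names do not come from LLVM.
struct LoopCosts {
  uint64_t ScalarIterCost; // cost of one scalar iteration
  uint64_t VectorIterCost; // cost of one vector iteration (covers VF elements)
  uint64_t FixedOverhead;  // one-time cost: alias checks, trip-count checks, ...
};

// Estimated cost of executing TripCount iterations at vectorization factor VF:
// fixed overhead + full vector iterations + scalar remainder iterations.
uint64_t vectorizedCost(const LoopCosts &C, uint64_t TripCount, uint64_t VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  uint64_t VectorIters = TripCount / VF;
  uint64_t RemainderIters = TripCount % VF;
  return C.FixedOverhead + VectorIters * C.VectorIterCost +
         RemainderIters * C.ScalarIterCost;
}

// Vectorizing pays off only if it beats running every iteration as scalar.
bool isProfitable(const LoopCosts &C, uint64_t TripCount, uint64_t VF) {
  return vectorizedCost(C, TripCount, VF) < TripCount * C.ScalarIterCost;
}
```

With made-up costs of 4 per scalar iteration, 6 per vector iteration and 20 for the checks, a loop of 8 iterations at VF 4 costs 32 either way, so it is not profitable. The same loop becomes profitable if the checks are known to be free, which is why it matters whether the checks survive.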
>
> This also makes me wary of the "we can clean up redundant alias
> checks later" approach. When trying to decide whether to vectorize
> by 4 a loop that has no more than 8 iterations (because we just
> vectorized by 8 and it's the remainder loop), we really want to
> know if the alias checks we're introducing are going to survive or not.
>
>
> It occurs to me that, if SCEV's known-predicate logic were smart
> enough, it would seem practical to not introduce redundant checks in
> the first place (although it would imply some gymnastics when
> examining the control flow around the loop and then restructuring
> things when we generate the code for the loop).
>
> The scalar remainder loop, when reached from the vectorized loop, is
> already known to be vectorizable to a VF larger than EpilogVF.
>
I was not under the impression we had a remainder loop separate from the
loop used for scalar computation. Don't we use the same loop in cases
where the vectorization is not legal?
-Hal
> No need to introduce again any potential aliasing, wrapping or whatnot
> checks, even if this redundancy can later be eliminated, if instead
> this vectorizability property could be recorded somehow. Similar to
> having annotated the remainder loop with “#pragma clang loop
> vectorize(assume_safety)”, except that this vectorizability property
> does not hold when reaching the remainder loop along the other path –
> that which fails these checks for the main loop...
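The two paths into the leftover iterations can be sketched in plain C++ as follows. This is only an illustration of one possible layout under stated assumptions: the widths (8 and 4), the `noOverlap` check, and the `Trace` counters are hypothetical, and the inner `for` loops merely stand in for single vector operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 8;       // main vector loop width (assumed)
constexpr std::size_t EpilogVF = 4; // epilog vector loop width (assumed)

// Records which loops ran, only to make the control flow observable.
struct Trace {
  std::size_t MainVectorIters = 0;
  std::size_t EpilogVectorIters = 0;
  std::size_t ScalarIters = 0;
};

// Hypothetical stand-in for the runtime checks (aliasing, wrapping, ...).
bool noOverlap(const int *A, const int *B, std::size_t N) {
  auto ABegin = reinterpret_cast<std::uintptr_t>(A);
  auto BBegin = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t Bytes = N * sizeof(int);
  return ABegin + Bytes <= BBegin || BBegin + Bytes <= ABegin;
}

// Computes Dst[i] = Src[i] + 1 for i in [0, N).
Trace addOne(int *Dst, const int *Src, std::size_t N) {
  assert(Dst && Src);
  Trace T;
  std::size_t I = 0;
  if (N >= VF && noOverlap(Dst, Src, N)) {
    // Path 1: the checks passed, so the main vector loop runs.
    for (; I + VF <= N; I += VF) {
      for (std::size_t L = 0; L < VF; ++L) // models one VF-wide operation
        Dst[I + L] = Src[I + L] + 1;
      ++T.MainVectorIters;
    }
    // Reached only from path 1: no second round of checks is needed here.
    for (; I + EpilogVF <= N; I += EpilogVF) {
      for (std::size_t L = 0; L < EpilogVF; ++L)
        Dst[I + L] = Src[I + L] + 1;
      ++T.EpilogVectorIters;
    }
  }
  // Path 2 (checks failed, I == 0) and the final leftovers of path 1
  // both end up in this one scalar loop.
  for (; I < N; ++I) {
    Dst[I] = Src[I] + 1;
    ++T.ScalarIters;
  }
  return T;
}
```

The point of the sketch is that "already known to be vectorizable" is a property of the path, not of the scalar loop itself: the same scalar loop also serves the case where the checks failed.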
>
> Ayal.
>
>
> -Hal
>
>
> Michael
>
> On Mon, Feb 27, 2017 at 10:11 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 02/27/2017 11:47 AM, Adam Nemet wrote:
>
> On Feb 27, 2017, at 9:39 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Mon, Feb 27, 2017 at 9:29 AM, Adam Nemet <anemet at apple.com> wrote:
>
> On Feb 27, 2017, at 7:27 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 02/27/2017 06:29 AM, Nema, Ashutosh wrote:
>
> Thanks for looking into this.
>
> 1) Issues with re-running the vectorizer:
>
> The vectorizer might generate redundant alias
> checks while vectorizing the epilog loop.
>
> Redundant alias checks are expensive; we would
> like to reuse the results of already-computed
> alias checks.
>
> With metadata we can limit the width of the
> epilog loop, but I am not sure about reusing
> the alias-check result.
>
> Any thoughts on re-running the vectorizer while
> reusing the alias-check result?
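As a rough illustration of why the second set of checks is redundant, the sketch below models an alias check as a disjointness test on the accessed address ranges. The epilog loop touches a sub-range of what the main loop's check already covered, so a passing check on the full ranges implies a passing check on the sub-ranges. `AccessRange`, `rangeOf`, `disjoint` and `contains` are hypothetical helpers, not the vectorizer's actual runtime-check code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Half-open address range [Begin, End) touched through one pointer.
struct AccessRange {
  std::uintptr_t Begin;
  std::uintptr_t End;
};

// Range covered by elements [First, Last) of the int array starting at P.
AccessRange rangeOf(const int *P, std::size_t First, std::size_t Last) {
  assert(First <= Last);
  auto Base = reinterpret_cast<std::uintptr_t>(P);
  std::uintptr_t Lo = Base + First * sizeof(int);
  std::uintptr_t Hi = Base + Last * sizeof(int);
  return AccessRange{Lo, Hi};
}

// The runtime alias check: the two ranges do not overlap.
bool disjoint(AccessRange X, AccessRange Y) {
  return X.End <= Y.Begin || Y.End <= X.Begin;
}

// True if Inner lies entirely within Outer.
bool contains(AccessRange Outer, AccessRange Inner) {
  return Outer.Begin <= Inner.Begin && Inner.End <= Outer.End;
}
```

If `disjoint` holds for the ranges of iterations [0, n) and the epilog covers [k, n), then `contains` holds for each epilog range and `disjoint` follows for them as well, so evaluating it again adds cost without adding information.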
>
>
> One way of looking at this is: Reusing the
> alias-check result is really just a
> conditional propagation problem; if we don't
> already have an optimization that can combine
> these after the fact, then we should.
>
> +Danny
>
> Isn’t Extended SSA supposed to help with this?
>
> Yes, it will solve this with no issue already. GVN
> probably does already too.
>
> even if you have
>
> if (a == b)
>
> if (a == c)
>
> if (a == d)
>
> if (a == e)
>
> if (a == g)
>
> and we can prove a ... g equivalent, newgvn will
> eliminate them all and set all the branches true.
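A toy model of the idea, assuming nothing about how NewGVN is actually implemented: values proven equal are merged into one congruence class (here with a small union-find over names), and any later comparison between members of the same class folds to true. The `Congruence` class and its method names are invented for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy congruence classes over named values, using union-find.
class Congruence {
  std::map<std::string, std::string> Parent;

public:
  // Returns the representative of V's class, creating a singleton if needed.
  std::string find(const std::string &V) {
    auto It = Parent.find(V);
    if (It == Parent.end()) {
      Parent[V] = V;
      return V;
    }
    if (It->second == V)
      return V;
    std::string Next = It->second; // copy before recursing
    std::string Root = find(Next);
    Parent[V] = Root; // path compression
    return Root;
  }

  // Records that A and B were proven equal (e.g. on the true edge of A == B).
  void assumeEqual(const std::string &A, const std::string &B) {
    std::string RootA = find(A);
    std::string RootB = find(B);
    Parent[RootA] = RootB;
  }

  // A comparison A == B can be folded to true if both share a class.
  bool knownEqual(const std::string &A, const std::string &B) {
    return find(A) == find(B);
  }
};
```

Recording `a == b`, `a == c`, `a == d`, `a == e` and `a == g` puts all six names in one class, after which a check such as `b == g` is known true without being evaluated. A repeated alias-check condition that is dominated by an identical earlier check is the same situation.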
>
> If you need a simpler clean up pass, we could run it
> on sub-graphs.
>
> Yes we probably don’t want to run a full GVN after the
> “loop-scheduling” passes.
>
>
> FWIW, we could, just without the memory-dependence analysis
> enabled (i.e. set the NoLoads constructor parameter to true).
> GVN is pretty fast in that mode.
>
> -Hal
>
>
> I guess the pipeline to experiment with for now is opt
> -loop-vectorize -loop-vectorize -newgvn.
>
> Adam
>
>
>
> The only thing you'd have to do is write some code to
> set "live on entry" subgraph variables in their own
> congruence classes.
>
> We already do this for incoming arguments.
>
> Otherwise, it's trivial to make it only walk things in
> the subgraph.
>
>
>
> --
>
> Hal Finkel
>
> Lead, Compiler Technology and Programming Languages
>
> Leadership Computing Facility
>
> Argonne National Laboratory
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory