[llvm-dev] [Proposal][RFC] Epilog loop vectorization
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 15 10:38:24 PDT 2017
On 03/15/2017 05:55 AM, Nema, Ashutosh wrote:
>
> *From:*Zaks, Ayal [mailto:ayal.zaks at intel.com]
> *Sent:* Wednesday, March 15, 2017 4:39 AM
> *To:* Nema, Ashutosh <Ashutosh.Nema at amd.com>; anemet at apple.com; Hal
> Finkel <hfinkel at anl.gov>; Renato Golin <renato.golin at linaro.org>;
> mkuper at google.com; Mehdi Amini <mehdi.amini at apple.com>; Daniel Berlin
> <dberlin at dberlin.org>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization
>
> *From:*Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
>
> Summarizing the discussion on the implementation approaches.
>
> Two approaches were discussed. The first runs ‘InnerLoopVectorizer’
> again on the epilog loop immediately after vectorizing the original
> loop, within the same vectorization pass. The second re-runs the
> vectorization pass and limits the vectorization factor of the epilog
> loop via metadata.
>
> <Approach-2>
>
> Challenges with re-running the vectorizer pass:
>
> 1)Reusing alias check result:
>
> When the vectorizer pass runs again, it sees the epilog loop as a new
> loop and may generate a new alias check; this extra check may cancel
> out the gains of epilog vectorization.
>
> We should reuse the already computed alias check result instead of
> recomputing it.
>
> Right, can this challenge be addressed – can we record the “simple”
> fact that the epilog loop is vectorizable with trip count at-most
> VF*UF when reached from the vectorized loop? This is akin to passing
> similar information from the front-end when supplied by, e.g., OpenMP
> pragmas, with the additional path-sensitive context attached.
>
> I did not get this point completely. Yes, we can record the maximum
> width for epilog vectorization, but what do you mean by
> “path-sensitive context attached”?
>
> Please elaborate on this and on how it helps in reusing the alias
> check result.
>
> Agreed, if each loop is handled independently, the multiple
> minimum-trip-count tests should be revisited to optimize for smallest
> trip-count first.
>
> If the main loop was vectorized by VF and unrolled by UF>1, it may be
> reasonable to vectorize the remainder loop with the same VF (w/o
> unrolling).
>
> And then possibly vectorize the remainder of that with a smaller, say,
> VF/2. In addition, situations having small types and large vectors may
> result in large VF, again leaving room for possibly repeated epilog
> vectorizations with decreasing VFs. At some point it would be good to
> try the alternative of a (final) masked vector epilog.
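
To make the scheme quoted above concrete, here is a toy sketch of how a trip count would be divided between the main vector loop, a chain of vectorized epilogs with decreasing VF, and a final scalar remainder. The function name `split_trip_count` and its parameters are hypothetical and only illustrate the arithmetic; nothing here is an LLVM API or the patch's actual logic.

```python
def split_trip_count(n, vf, uf, epilog_vfs):
    """Split n scalar iterations across the loop versions.

    Returns a list of (label, iterations_covered) pairs. Purely
    illustrative: the names and structure are assumptions.
    """
    plan = []
    step = vf * uf                    # main loop consumes VF*UF iterations per trip
    main = (n // step) * step
    plan.append((f"main VF={vf} UF={uf}", main))
    left = n - main
    for evf in epilog_vfs:            # e.g. [VF, VF/2], each without unrolling
        covered = (left // evf) * evf
        plan.append((f"epilog VF={evf}", covered))
        left -= covered
    plan.append(("scalar", left))     # whatever no vector version could cover
    return plan


if __name__ == "__main__":
    for label, count in split_trip_count(63, 8, 4, [8, 4]):
        print(f"{label}: {count}")
```

With n=63, VF=8 and UF=4, the main loop covers 32 iterations, the VF=8 epilog covers 24, the VF=4 epilog covers 4, and 3 iterations remain scalar. Each extra epilog version shrinks the scalar tail but adds another guard on the way in.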
>
> Each vector version incurs extra cost by adding extra checks.
> Considering this, I have limited the patch to generate only one
> epilog vector version.
>
> We can generate multiple epilog versions, but we have to understand
> the tradeoff of generating them. Once we have proper costing of the
> checks, we can make more precise decisions. I would like to defer
> this to later enhancements.
>
If we model the costs of the extra checks and branches, then we can ask:
Will the savings from executing even one iteration of the vectorized
epilogue loop be greater than the cost of the checks? For really small
loops, the answer might not be obvious.
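
As a rough sketch of that question, assuming all costs are expressed in the same abstract units (the function and parameter names below are made up for illustration and are not part of LLVM's cost model):

```python
def epilog_vectorization_pays_off(remaining, vf, scalar_iter_cost,
                                  vector_iter_cost, check_cost):
    """Estimate whether a vectorized epilog beats a purely scalar one.

    remaining        -- iterations left after the main vector loop
    vf               -- vectorization factor of the epilog loop
    scalar_iter_cost -- cost of one scalar iteration
    vector_iter_cost -- cost of one vector iteration (covers vf elements)
    check_cost       -- extra checks/branches guarding the vector epilog
    """
    vec_trips = remaining // vf
    leftover = remaining % vf
    scalar_only = remaining * scalar_iter_cost
    # The checks are paid even when the vector epilog runs zero times.
    with_vector_epilog = (check_cost
                          + vec_trips * vector_iter_cost
                          + leftover * scalar_iter_cost)
    return with_vector_epilog < scalar_only


if __name__ == "__main__":
    print(epilog_vectorization_pays_off(7, 4, 1.0, 1.5, 2.0))
    print(epilog_vectorization_pays_off(4, 4, 1.0, 1.5, 3.0))
```

Under this model, a single vector iteration saves vf * scalar_iter_cost - vector_iter_cost, so the epilog only wins when the checks cost less than that. In the first call (7 remaining, check cost 2.0) vectorizing wins, 6.5 against 7.0; in the second (4 remaining, check cost 3.0) it loses, 4.5 against 4.0.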
-Hal
> Masked instructions are available in AVX512 and are of course a
> better solution than this. But for architectures that do not have
> masked instruction support, an epilog vector version is one of the
> techniques for vectorizing the epilog iterations.
>
> Ayal.
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory