[llvm-dev] [Proposal][RFC] Epilog loop vectorization

Wed Mar 15 10:38:24 PDT 2017

On 03/15/2017 05:55 AM, Nema, Ashutosh wrote:
>
> *From:*Zaks, Ayal [mailto:ayal.zaks at intel.com]
> *Sent:* Wednesday, March 15, 2017 4:39 AM
> *To:* Nema, Ashutosh <Ashutosh.Nema at amd.com>; anemet at apple.com; Hal 
> Finkel <hfinkel at anl.gov>; Renato Golin <renato.golin at linaro.org>; 
> mkuper at google.com; Mehdi Amini <mehdi.amini at apple.com>; Daniel Berlin 
> <dberlin at dberlin.org>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization
>
> *From:*Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
>
> Summarizing the discussion on the implementation approaches.
>
> We discussed two approaches: the first runs ‘InnerLoopVectorizer’
> again on the epilog loop immediately after vectorizing the original
> loop, within the same vectorization pass; the second re-runs the
> vectorization pass and limits the vectorization factor of the epilog
> loop via metadata.
>
> <Approach-2>
>
> Challenges with re-running the vectorizer pass:
>
> 1) Reusing alias check result:
>
> When the vectorizer pass runs again, it sees the epilog loop as a new
> loop and may generate a new alias check; this check may wipe out the
> gains of epilog vectorization.
>
> We should reuse the already computed alias check result instead of
> recomputing it.
>
> Right, can this challenge be addressed – can we record the “simple” 
> fact that the epilog loop is vectorizable with trip count at-most 
> VF*UF when reached from the vectorized loop? This is akin to passing 
> similar information from the front-end when supplied by, e.g., OpenMP 
> pragmas, with the additional path-sensitive context attached.
>
> I did not get this point completely. Yes, we can record the maximum
> width for epilog vectorization, but what did you mean by
> “path-sensitive context attached”?
>
> Please elaborate on this and on how it helps in reusing the alias
> check result.
>
> Agreed, if each loop is handled independently, the multiple 
> minimum-trip-count tests should be revisited to optimize for smallest 
> trip-count first.
>
> If the main loop was vectorized by VF and unrolled by UF>1, it may be 
> reasonable to vectorize the remainder loop with the same VF (w/o 
> unrolling).
>
> And then possibly vectorize the remainder of that with a smaller, say, 
> VF/2. In addition, situations having small types and large vectors may 
> result in large VF, again leaving room for possibly repeated epilog 
> vectorizations with decreasing VF’s. At some point it would be good to 
> try the alternative of a (final) masked vector epilog.
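
As an illustration of the repeated-epilog idea above, here is a minimal
sketch of how a trip count would be divided between the main vector loop
(step VF*UF), successive epilog vector loops with decreasing VF, and the
final scalar remainder. It is not code from the patch; `StageCount` and
`splitTripCount` are hypothetical names, not LLVM APIs, and the sketch
ignores the minimum-trip-count and alias checks guarding each stage.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not an LLVM API): one loop stage and how
// many times its body executes.
struct StageCount {
  unsigned Step;       // scalar iterations consumed per stage iteration
  uint64_t Iterations; // number of times the stage body runs
};

// Steps lists the scalar iterations consumed per stage iteration, in
// decreasing order, e.g. {VF*UF, VF, VF/2}. A final scalar stage with
// step 1 is appended for whatever is left over.
std::vector<StageCount> splitTripCount(uint64_t TripCount,
                                       const std::vector<unsigned> &Steps) {
  std::vector<StageCount> Result;
  uint64_t Remaining = TripCount;
  for (unsigned Step : Steps) {
    assert(Step > 0 && "each stage must consume at least one iteration");
    Result.push_back({Step, Remaining / Step});
    Remaining %= Step;
  }
  Result.push_back({1, Remaining}); // scalar remainder loop
  return Result;
}
```

For example, with VF=8 and UF=4, `splitTripCount(61, {32, 8, 4})` gives one
iteration of the main loop, three of the VF=8 epilog, one of the VF=4
epilog, and one scalar iteration.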
>
> Each vector version incurs extra cost by adding extra checks.
> Considering this, I have limited the patch to generate only one
> epilog vector version.
>
> We can generate multiple epilog versions, but we have to understand
> the tradeoff of generating them. Once we have proper costing of the
> checks, we can make more precise decisions. I would like to defer
> this to later enhancements.
>

If we model the costs of the extra checks and branches, then we can ask:
will the savings from executing even one iteration of the vectorized
epilogue loop be greater than the cost of the checks? For really small
loops, this might not be obvious.

  -Hal
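
The question above can be written down as a small break-even calculation.
This is only a sketch under assumed, abstract cost units; `EpilogCosts` and
`epilogSavings` are hypothetical names and do not correspond to LLVM's
cost model or TargetTransformInfo interfaces.

```cpp
#include <cassert>
#include <cstdint>

// Abstract per-iteration costs (hypothetical, not LLVM's cost model).
struct EpilogCosts {
  unsigned ScalarIterCost; // one scalar epilog iteration
  unsigned VectorIterCost; // one vector epilog iteration (covers VF lanes)
  unsigned CheckCost;      // extra check and branch guarding the vector epilog
};

// Net saving of adding a vector epilog for a given number of remainder
// iterations. The check is paid on every entry, even when no vector
// iteration ends up executing. A positive result means the vector
// epilog pays for itself under this model.
int64_t epilogSavings(const EpilogCosts &C, unsigned VF, uint64_t Remainder) {
  assert(VF > 0 && "vectorization factor must be positive");
  uint64_t VecIters = Remainder / VF;
  int64_t SavedPerVecIter =
      int64_t(VF) * C.ScalarIterCost - C.VectorIterCost;
  return int64_t(VecIters) * SavedPerVecIter - C.CheckCost;
}
```

With scalar cost 4, vector cost 6, check cost 3 and VF=4, a remainder of 5
iterations saves 7 units, while a remainder of 3 loses the 3 units spent on
the check; raising the check cost to 12 makes even one vector iteration
unprofitable.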

> Masked instructions are available in AVX512, and of course they are a
> better solution than this. But for architectures that do not have
> masked instruction support, an epilog vector version is one technique
> to vectorize the epilog iterations.
>
> Ayal.
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
