[llvm-dev] [Proposal][RFC] Epilog loop vectorization

Tue Mar 14 10:43:02 PDT 2017

On 03/14/2017 09:00 AM, Hal Finkel via llvm-dev wrote:
>
>
> On 03/14/2017 08:00 AM, Nema, Ashutosh wrote:
>>
>> Summarizing the discussion on the implementation approaches.
>>
>> We discussed two approaches. The first runs ‘InnerLoopVectorizer’ 
>> again on the epilog loop immediately after vectorizing the original 
>> loop, within the same vectorization pass. The second re-runs the 
>> vectorization pass and limits the vectorization factor of the 
>> epilog loop via metadata.
>>
>> <Approach-2>
>>
>> Challenges with re-running the vectorizer pass:
>>
>> 1) Reusing the alias check result:
>>
>> When the vectorizer pass runs again, it sees the epilog loop as a new 
>> loop and may generate a new alias check for it. This new alias check 
>> may cancel out the gains of epilog vectorization.
>>
>> We should reuse the already computed alias check result instead of 
>> recomputing it.
>>
>> 2) Rerunning the vectorizer and hoisting the new alias check:
>>
>> It’s not possible to hoist the alias check, as it’s not fully redundant 
>> (not dominated by other checks); it does not get executed on all paths.
>>
>> NOTE: We cannot move the alias check earlier, as it’s expensive 
>> compared to other checks.
>>
>> <Approach-1>
>>
>> 1) The current patch depends on the existing functionality of 
>> LoopVectorizer. It uses ‘InnerLoopVectorizer’ again to vectorize the 
>> epilog loop. Because this happens in the same vectorization pass, we 
>> have the flexibility to reuse the already computed alias check result 
>> and to limit the vectorization factor for the epilog loop.
>>
>> 2) It does not generate the blocks for the new block layout explicitly; 
>> rather, it depends on ‘InnerLoopVectorizer::createEmptyLoop’ to 
>> generate the new block layout. The new block layout gets generated 
>> automatically by calling ‘InnerLoopVectorizer::vectorize’ again.
>>
>> 3) The block layout with epilog loop vectorization is described at
>>
>> https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png
>>
>> Approach-1 looks feasible; please comment if there are any objections.
>>
>
> I think this is reasonable. One thing: In the proposed block 
> layout, if the alias check fails, we jump to the "Min Iter Check 2". 
> From there we re-check the alias-check result (which will be false 
> again), and then jump to the scalar loop. This is one more branch than 
> necessary in the case where the alias check fails. If the alias check 
> fails, we should jump directly to the scalar loop.

There's another issue as well. If the trip count is small, it is 
important that the critical path through the checks to the scalar loop 
is as small as possible. If we use this layout, then in the case where 
the trip count is very small, we've now introduced an extra check (or 
set of checks) to get to the scalar loop. We need to do it the other 
way: Check the smaller trip count first. If that fails, go to the scalar 
loop. Only if the small trip count succeeds, then we check the larger 
trip count. The path length through the trip counts must be largest when 
we have the most work over which to amortize the checks (i.e. when the 
trip count is largest).
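
As a rough sketch of that ordering (this is not code from the patch; selectLoop, Path, Dispatch and the parameter names are hypothetical, and placing the alias check between the two trip count checks is my assumption), the guards could be modeled like this, counting how many checks run before a loop is entered:

```cpp
#include <cstdint>

// Hypothetical labels for the loop that is entered first.
enum class Path { Scalar, EpilogVector, MainVector };

struct Dispatch {
  Path path;       // loop entered first
  unsigned checks; // guard checks executed before entering it
};

// Models the guard ordering only: smallest trip count check first,
// alias check failure goes straight to the scalar loop, and the
// larger trip count check runs last.
Dispatch selectLoop(std::uint64_t tripCount, std::uint64_t mainVF,
                    std::uint64_t epilogVF, bool noAlias) {
  unsigned checks = 0;

  // Check 1: too few iterations even for the epilog vector loop.
  ++checks;
  if (tripCount < epilogVF)
    return {Path::Scalar, checks};

  // Check 2 (assumed position): runtime alias check. On failure we
  // go directly to the scalar loop, with no further trip count checks.
  ++checks;
  if (!noAlias)
    return {Path::Scalar, checks};

  // Check 3: enough iterations for the main vector loop?
  ++checks;
  if (tripCount < mainVF)
    return {Path::EpilogVector, checks};

  return {Path::MainVector, checks};
}
```

With mainVF = 8 and epilogVF = 4, a trip count of 2 reaches the scalar loop after one check, while a trip count of 100 pays for all three checks before entering the main vector loop, which is where there is the most work to amortize them over.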

  -Hal

>
> Thanks again,
> Hal
>
>> ...

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
