[llvm] r176399 - PR14448 - prevent the loop vectorizer from vectorizing the same loop twice.

Tue Mar 5 10:42:34 PST 2013

On Mar 3, 2013, at 12:17 AM, Nadav Rotem <nrotem at apple.com> wrote:

> Hi Chandler,
> 
> Maybe I did not explain the problem well.  After vectorization we get this loop structure:
> 
> [VECTOR LOOP]
>     |
>     v
> [SCALAR POST LOOP]
> 
> The scalar post loop executes the last few iterations, or the entire loop if the runtime checks fail. The scalar loop is still vectorizable, because it is the original loop.  
> 
> After running the loop vectorizer again we will get this structure:
> 
> [VECTOR LOOP]
>     |
>     v
> [VECTOR LOOP II]
>     |
>     v
> [SCALAR LOOP]
> 
> The second vector loop is dead code because it has all of the same runtime checks as the first loop. The program is still correct.
> 
> 
>> The very premise of metadata is that removing it is the safe alternative,
> 
> Yes, and removing the metadata will not change the correctness of the code; it will just generate larger code.
> 
>> and you've designed this in such a way that removing metadata is exactly the thing which other optimizations cannot do.
> 
> No. If other optimizations remove the metadata then the scalar loop will be re-vectorized, and the bypass loop will skip the second vector loop. This is still correct but suboptimal because it will generate larger code. 
> 
>> Why isn't the solution to "The LoopVectorizer often runs multiple times on the same function due to inlining" simply to not include the loop vectorizer in the CGSCC pass manager that has this behavior, and instead run it in a late stage of the pass manager, once for each function?
> 
> Yes. At the moment we run all of our passes in a single pass manager, and until we split the pass manager into two phases (canonicalization + lowering) we are going to have this problem. We are working on a solution to this problem, but it will take some time.
> 
>> 
>> Fundamentally, there seem to be two options for how to design the loop vectorizer:
>> 
> 
> TL;DR: This workaround is needed to prevent code bloat. We plan to split the pass manager in the near future.

Yes, hopefully reorganizing the passes will solve this eventually. Just for the sake of discussion, another option is to insert branch probability metadata on the scalar loop backedge, indicating that it iterates an expected vector-width/2 times, and to teach the vectorizer to skip low-trip-count loops.
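
To make that concrete, here is a rough hand-written sketch in C++ (not vectorizer output). It models the [VECTOR LOOP] -> [SCALAR POST LOOP] structure described above for a simple a[i] += b[i] loop, plus a possible "skip low trip count" check. The names kVF, addArrays and shouldSkipVectorization, the width of 4, and the threshold of one vector width are all assumptions made up for illustration; none of them correspond to anything in the LLVM source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector width (VF), chosen only for illustration.
constexpr std::size_t kVF = 4;

// Models the loop structure after vectorization of `a[i] += b[i]`.
// Returns how many iterations the scalar post loop executed.
std::size_t addArrays(int *a, const int *b, std::size_t n) {
  std::size_t i = 0;

  // [VECTOR LOOP]: handles kVF elements per iteration. The inner loop
  // stands in for a single wide vector operation.
  for (; i + kVF <= n; i += kVF)
    for (std::size_t lane = 0; lane < kVF; ++lane)
      a[i + lane] += b[i + lane];

  // [SCALAR POST LOOP]: the original loop, running the last n % kVF
  // iterations. It never runs more than kVF - 1 times here.
  std::size_t scalarIters = 0;
  for (; i < n; ++i, ++scalarIters)
    a[i] += b[i];

  return scalarIters;
}

// Hypothetical heuristic: if the expected trip count (e.g. derived from
// branch probability metadata on the backedge) is below one vector width,
// vectorizing the loop again cannot pay off, so skip it.
bool shouldSkipVectorization(double expectedTripCount, std::size_t vf) {
  assert(vf > 0 && "vector width must be positive");
  return expectedTripCount < static_cast<double>(vf);
}
```

With kVF = 4 and n = 10, the vector loop covers 8 elements and the scalar post loop runs twice. A scalar post loop annotated with an expected trip count of about kVF/2 = 2 would then be rejected by a check like shouldSkipVectorization(2.0, 4), so a second vectorizer run would leave it alone.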

-Andy