[LLVMdev] [Polly] Move Polly's execution later

Star Tan tanmx_star at yeah.net
Tue Sep 24 19:55:28 PDT 2013


Here is an update about moving Polly later.

1. Why does Polly generate incorrect code when we move Polly immediately after the loop rotating pass?

It is mainly caused by a wrong polly merge block. When Polly detects a valid loop for Polyhedral transformations, it usually introduces a new basic block "polly.merge_new_and_old" after the original loop exit block. This new basic block contains a branch instruction that decides whether to exit or re-enter the loop like this:

polly.loop_exit28:                                ; preds = %polly.stmt.for.body.i, %polly.loop_if25
  br label %polly.merge_new_and_old
polly.merge_new_and_old:                          ; preds = %polly.loop_exit28, %for.body.i
  ...
  %cmp = icmp slt i32 %2, 0
  ...
  br i1 %cmp, label %for.body, label %for.end12
for.end12:                                        ; preds = %loop.exit
  ...
  ret i32 0
  
Unfortunately, the new basic block "polly.merge_new_and_old" is marked as unreachable after a serial of immediate Loop optimizations in the following pass order:
          Polly - Create LLVM-IR from SCoPs
        Loop Pass Manager
          Canonicalize natural loops
          Loop-Closed SSA Form Pass               
          Rotate Loops
          Loop Invariant Code Motion
          Loop-Closed SSA Form Pass
          Unswitch loops
        Combine redundant instructions

After those loop optimition passes, the "Combine redundant instructions" would treat the basic block "merge_new_and_old" unreachable and transfer the block "polly.loop_exit28" into an infinity loop:
polly.loop_exit28:                                ; preds = %polly.stmt.for.body.i, %polly.loop_if25
  br label polly.loop_exit28

Actually, I have no idea why this happens, but experiments show that this problem can be addressed by adding a pass "MPM.add(createCFGSimplificationPass())" after Polly. It seems it is necessary to run this pass after Polly code generation.

2. Where should we  move Polly to?

There are many choices to move Polly later. Tobias suggests to move it immediately after the loop rotate pass (the first loop optimization pass), but unfortunately Polly would generate incorrect code in that case (http://llvm.org/bugs/show_bug.cgi?id=17323).

By investigating those Polly canonicalization passes (polly/lib/RegisterPasses.cpp):
    PM.add(llvm::createPromoteMemoryToRegisterPass());
    PM.add(llvm::createInstructionCombiningPass());
    PM.add(llvm::createCFGSimplificationPass());
    PM.add(llvm::createTailCallEliminationPass());
    PM.add(llvm::createCFGSimplificationPass());
    PM.add(llvm::createReassociatePass());
    PM.add(llvm::createLoopRotatePass());
    PM.add(llvm::createInstructionCombiningPass());

We can see that some of them are also called in common cases (lib/Transforms/IPO/PassManagerBuilder.cpp:181):

  MPM.add(createEarlyCSEPass());
  MPM.add(createJumpThreadingPass());
  MPM.add(createCorrelatedValuePropagationPass());
  MPM.add(createCFGSimplificationPass());     // also called in Polly
  MPM.add(createInstructionCombiningPass());  // also called in Polly
  MPM.add(createTailCallEliminationPass());   // also called in Polly
  MPM.add(createCFGSimplificationPass());     // also called in Polly
  MPM.add(createReassociatePass());           // also called in Polly
  MPM.add(createLoopRotatePass());            // also called in Polly
  MPM.add(createLICMPass());
  MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
  
The initial idea is to move Polly immediately after LoopRotatePass, which will maximum the reuse of canonicalization passes and we need only keep one canonicalization pass "createPromoteMemoryToRegisterPass" in Polly. Unfortunately, it would lead to errors as shown in previous analysis (see bug http://llvm.org/bugs/show_bug.cgi?id=17323). We need to run "createCFGSimplificationPass" to ensure the correct code.

The second possible solution is to move Polly immediately before "createCFGSimplificationPass", and then remove all Polly canonicalization passes except the "createTailCallEliminationPass" and "createCFGSimplificationPass"! Unfortunately, it would still trigger an assert error when compiling some benchmarks in LLVM test-suite (see http://llvm.org/bugs/show_bug.cgi?id=17351).

Similarly I also investigated other points to move Polly, such as before "MPM.add(createTailCallEliminationPass())" or before "MPM.add(createCorrelatedValuePropagationPass())", but all these cases would trigger some unexpected compile error or execution error. I think we need more efforts to investigate why Polly cannot be placed in these points.

At last, I also investigated the basic solution that move Polly immediately before "CallGraph SCC passes" which is a very early stage in module level optimizations. However, there are still some pre-executed basic passes (DeadArgElimination/InstructionCombining/CFGSimplification),  which may reduce Polly canonicalization passes . Preliminary testing shows no extra error happens in LLVM test-suite and detailed evaluation is running. Hope we can get detailed results after several hours's running.

Best,
Star Tan

At 2013-09-22 13:02:38,"Star Tan" <tanmx_star at yeah.net> wrote: Hi Tobias,At 2013-09-19 22:59:25,"Tobias Grosser" <tobias at grosser.es> wrote:>On 09/19/2013 04:46 PM, Star Tan wrote:
>> Hi Tobias,
>>
>>
>> I am trying to move Polly later.
>>
>>
>> LLVM provides some predefined ExtensionPointTy:
>>      EP_EarlyAsPossible,
>>      EP_ModuleOptimizerEarly,
>>      EP_LoopOptimizerEnd,
>>      EP_ScalarOptimizerLate,
>>      ...
>>
>>
>> Currently Polly uses "EP_EarlyAsPossible" to run as early as possible.  As what you suggested:
>>> Instead of removing canonicalization passes, I believe we may want to
>>> move Polly to a later place in the pass manager. Possibly at the
>>> beginning of the loop optimizer right before PM.add(createLoopRotatePass());
>> I want to move it to the point immediate after someone Loop optimization pass, e.g. MPM.add(createLoopRotatePass()).  However no predefined ExtensionPointTy is available for this purpose. Instead, the "EP_ModuleOptimizerEarly" would move Polly before all loop optimization passes.
>>
>>
>> In my option, there are two solutions: one is to use "EP_ModuleOptimizerEarly" (only modify the tool/polly/lib/RegisterPasses.cpp) to move Polly before all loop optimization passes; the other is to add a new ExtensionPointTy, e.g. "EP_LoopRotateEnd" and move Polly exactly immediate after the "LoopRotate" pass (need to modify tool/polly/lib/RegisterPasses.cpp, include/llvm/Transforms/IPO/PassManagerBuilder.h and lib/Transforms/IPO/PassManagerBuilder.cpp). We can use the second way to investigate other points to start Polly.
>>
>> Is my understanding correct? Do you have any further suggestion?
>
>Yes, fully correct. I would play with solution two. Why would you add 
>Polly after the loop rotate pass?
>
>I would possibly add it at the beginning of the loop optimizer (right 
>before the loop rotation pass) possibly experimenting with a new 
>extension point called EP_LoopOptimizerBegin.
>
>Cheers,
>Tobias
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130925/ed6c476d/attachment.html>


More information about the llvm-dev mailing list