[llvm-dev] Making loop guards part of canonical loop structure

Tue May 28 14:02:28 PDT 2019

On 5/28/19 8:46 AM, Kit Barton via llvm-dev wrote:
> Hi all,
>
> TL;DR: Should the loop guard branch be part of the canonical loop structure?
>
> Background:
> -----------
> While working on the next set of improvements to loop fusion, we discovered that
> loop rotation will put a guard around a loop if it cannot prove the loop will
> execute at least once, as seen in the following (simplified) example:
>
> entry:
>    br i1 %cmp1, label %for.body.lr.ph, label %for.cond.cleanup
>
> for.body.lr.ph:                                   ; preds = %entry
>    br label %for.body
>
> for.body:                                         ; preds = %for.body.lr.ph, %for.inc
>    br label %for.inc
>
> for.inc:                                          ; preds = %for.body
>    br i1 %cmp, label %for.body, label %for.cond.for.cond.cleanup_crit_edge
>
> for.cond.for.cond.cleanup_crit_edge:              ; preds = %for.inc
>    br label %for.cond.cleanup
>
> for.cond.cleanup:                                 ; preds = %for.cond.for.cond.cleanup_crit_edge, %entry
>    br label %for.end
>
> for.end:                                          ; preds = %for.cond.cleanup
>    ret void
>
> However, for loops that can be proven to execute at least once, there is no
> guard.
>
> This creates complications for several loop optimizations because they need to
> handle both guarded loops and non-guarded loops. For example, the current loop
> fusion pass needs to check whether two loops are control flow equivalent before
> fusing them (i.e., if the first loop executes, the second loop is guaranteed to
> execute). This is currently done by checking that the preheader of the first
> loop dominates the preheader of the second loop, and the preheader of the second
> loop post-dominates the preheader of the first loop. When one (or both) of the
> loops have a guard, then this check no longer works. If the loop guard was part
> of the canonical form, then this check could be done using the loop guard,
> instead of the preheader.
>
> Similarly, when checking for perfect loop nests, a loop guard around the inner
> loop will make the nest appear imperfect. However, if the guard is considered to
> be part of the inner loop, it is easier to prove a perfect nest.
>
> Both of these examples have alternative ways to deal with the loop guards, if
> they are present. However, I would like to have the discussion regarding making
> guards part of the canonical loop structure. If people are supportive of this,
> we can start doing the work to implement the guards and start modify loop
> optimizations to deal with them appropriately. However, if this is not the
> direction we want to go, we can deal with the guards in the specific
> optimizations where they are relevant.
>
> Proposed Solution:
> ------------------
> Ensure that all loops in canonical form have a guard branch that dominates the
> preheader. If such a guard branch does not exist, a trivial guard will be
> created.
>
> I have not looked at the implementation of this
> extensively yet because I wanted feedback on this direction first. However, my
> initial thought is to modify LoopSimplify to add the concept of a loop guard,
> and provide similar guarantees that it currently provides for preheaders and exit
> blocks. Specifically, if a guard block is not found, one is created and inserted
> immediately before the loop preheader to guarantee a structure similar to the
> one in the example above. Note that as with the preheader and exit blocks, it is
> possible that subsequent passes (e.g., SimplifyCFG) may remove these blocks thus
> taking it out of canonical form. However, this is an existing situation that
> loop optimizations already deal with.

Hi, Kit,

Maybe I'm missing something obvious, but how does this help? I 
understand how guards can make checking for perfect nesting, etc. 
harder, but does always having guards make this easier?

Also, we already have a situation with LCSSA form where trivial 
constructs are introduced, and then eliminated by local simplifications, 
sometimes multiple times as the pipeline progresses. Having more of them 
seems undesirable - it leads to more IR changes, which causes more 
analysis invalidation, and so on.

Thanks again,

Hal

>
>
>
> Any comments/feedback are greatly appreciated.
>
> Thanks,
>
> Kit
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory