[llvm-dev] Making loop guards part of canonical loop structure
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Thu May 30 11:28:08 PDT 2019
On Hexagon, unguarded loops cannot be converted to hardware loops.
If the loop's latch branch alone handles the iteration count, including the possibility of 0, then the loop cannot be converted to a hardware loop, because hardware loops must iterate at least once. If the entire loop is guarded against zero iteration count, we can put the loop setup in the preheader, since at that point the loop is guaranteed to execute at least once.
I am strongly in favor of having some way to create loop guards, even if they are trivial.
Krzysztof Parzyszek kparzysz at quicinc.com LLVM compiler development
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Finkel, Hal J. via llvm-dev
Sent: Tuesday, May 28, 2019 4:02 PM
To: Kit Barton <kit.barton at gmail.com>; LLVM Dev Mailing List <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] Making loop guards part of canonical loop structure
On 5/28/19 8:46 AM, Kit Barton via llvm-dev wrote:
> Hi all,
> TL;DR: Should the loop guard branch be part of the canonical loop structure?
> While working on the next set of improvements to loop fusion, we
> discovered that loop rotation will put a guard around a loop if it
> cannot prove the loop will execute at least once, as seen in the following (simplified) example:
> br i1 %cmp1, label %for.body.lr.ph, label %for.cond.cleanup
> for.body.lr.ph: ; preds = %entry
> br label %for.body
> for.body: ; preds = %for.body.lr.ph, %for.inc
> br label %for.inc
> for.inc: ; preds = %for.body
> br i1 %cmp, label %for.body, label
> for.cond.for.cond.cleanup_crit_edge: ; preds = %for.inc
> br label %for.cond.cleanup
> for.cond.cleanup: ; preds = %for.cond.for.cond.cleanup_crit_edge, %entry
> br label %for.end
> for.end: ; preds = %for.cond.cleanup
> ret void
> However, for loops that can be proven to execute at least once, there
> is no guard.
> This creates complications for several loop optimizations because they
> need to handle both guarded loops and non-guarded loops. For example,
> the current loop fusion pass needs to check whether two loops are
> control flow equivalent before fusing them (i.e., if the first loop
> executes, the second loop is guaranteed to execute). This is currently
> done by checking that the preheader of the first loop dominates the
> preheader of the second loop, and the preheader of the second loop
> post-dominates the preheader of the first loop. When one (or both) of
> the loops have a guard, then this check no longer works. If the loop
> guard was part of the canonical form, then this check could be done using the loop guard, instead of the preheader.
> Similarly, when checking for perfect loop nests, a loop guard around
> the inner loop will make the nest appear imperfect. However, if the
> guard is considered to be part of the inner loop, it is easier to prove a perfect nest.
> Both of these examples have alternative ways to deal with the loop
> guards, if they are present. However, I would like to have the
> discussion regarding making guards part of the canonical loop
> structure. If people are supportive of this, we can start doing the
> work to implement the guards and start modify loop optimizations to
> deal with them appropriately. However, if this is not the direction we
> want to go, we can deal with the guards in the specific optimizations where they are relevant.
> Proposed Solution:
> Ensure that all loops in canonical form have a guard branch that
> dominates the preheader. If such a guard branch does not exist, a
> trivial guard will be created.
> I have not looked at the implementation of this extensively yet
> because I wanted feedback on this direction first. However, my initial
> thought is to modify LoopSimplify to add the concept of a loop guard,
> and provide similar guarantees that it currently provides for
> preheaders and exit blocks. Specifically, if a guard block is not
> found, one is created and inserted immediately before the loop
> preheader to guarantee a structure similar to the one in the example
> above. Note that as with the preheader and exit blocks, it is possible
> that subsequent passes (e.g., SimplifyCFG) may remove these blocks
> thus taking it out of canonical form. However, this is an existing situation that loop optimizations already deal with.
Maybe I'm missing something obvious, but how does this help? I understand how guards can make checking for perfect nesting, etc.
harder, but does always having guards make this easier?
Also, we already have a situation with LCSSA form where trivial constructs are introduced, and then eliminated by local simplifications, sometimes multiple times as the pipeline progresses. Having more of them seems undesirable - it leads to more IR changes, which causes more analysis invalidation, and so on.
> Any comments/feedback are greatly appreciated.
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
LLVM Developers mailing list
llvm-dev at lists.llvm.org
More information about the llvm-dev