[LLVMdev] SimplifyCFG vs loops
j-humphreys at ti.com
Tue Nov 20 14:35:16 PST 2012
What was the resolution to this issue? Was the patch accepted?
I am also running into an issue with this. Basically, I have a fairly simple loop with control flow, and the loopSimplify pass is turning it into a nested loop in order to remove multiple backedges. This is making the loop more complicated, not simple, and is causing our backend to miss performance opportunities.
In particular, I have the following loop:
Normally I would expect to rotate the loop termination test to the end of the loop, and then our backend would if-convert blocks A and B and then effectively software pipeline the entire loop.
However, because the loop has two backedges (from A and from B), the LoopSimplify pass will create a nested loop, which I awkwardly write in C as:
Now, the original loop shape is obscured and the backend no longer sees this as an if-then-else if-conversion opportunity, and software pipelining does not occur.
I have not looked at the patch, but I'm also afraid that the patch may not handle this case, as the other examples I've seen in the discussion only have if-then's, not if-then-else's. Will the patch work on the example above?
If not, this issue may simply be an optimization ordering dilemma in balancing micro optimizations (presumably the reason block's A and B don't branch to a common block that then forms a single backedge is to save a branch instruction) vs clear loop shapes that enable optimizations downstream.
Couple of thoughts:
- how firm is the constraint that all loops must have a single backedge? I don't know how fundamental this assumption is made in LLVM code. This really causes some common loops to have a funky shape.
- can we delay whatever transformation is removing the common successor block of A and B, possibly until the backend after which we've had a chance to perform other important optimizations
- is there a tell-tale sign with the loop structure that would let us predict when to form nested loops vs adding a common loop latch? Or should we always add a common loop latch, and then like above, assume that the backend cleans up any unneeded branches.
- I could add a pass that would 'undo' the nested loop. On principle, I don't like doing that type of thing, because it's really just dodging the problem and would be problematic to determine when to undo.
I appreciate your thoughts.
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Andrew Clinton
Sent: Thursday, October 18, 2012 1:44 PM
To: David Blaikie
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] SimplifyCFG vs loops
On 10/18/2012 01:26 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 10:12 AM, Andrew Clinton<andrew at sidefx.com> wrote:
>> I actually submitted a patch to fix this very problem quite a while
>> back, but I don't think it was ever added to the baseline:
>> I have also explained why it is important to avoid the creation of
>> new loops in an optimizer that performs analysis on the looping structure.
>> I now need to keep this patch updated whenever we upgrade LLVM, to
>> ensure that our optimizer (which is now in production) works
>> correctly. Would it be possible to add this patch to the head? I
>> have a version that works with
>> 3.0 but have not yet updated to 3.1 or the head.
> I'm not quite up for diving all the way into the previous thread, but
> do you have a test case that demonstrates a missed optimization due to
> this transformational problem? That should hopefully be enough to
> convince people the change is valuable. (I assume this is causing you
> real problems/missed optimizations or you wouldn't bother keeping it
> for your out of tree releases)
I think part of the difficulty in explaining the benefit of this change is that the optimization pass where this change is of value to us is also a custom analysis pass. Basically, we're using LLVM as an optimization framework for a SIMD language, and the difficulty with the unwanted loop transformation comes when we have code like this:
If this gets transformed to something like this (which is similar to the example I posted in the earlier thread):
while () // outer loop
while () // inner loop
If you think of "uniform_and_expensive_condition_with_side_effects()" as a function that is efficient only when it is run in lockstep for the SIMD array, and "cheap_varying_condition()" as a function that can be run efficiently with conditional masking, the transformation is detrimental for performance because "uniform_and_expensive_condition_with_side_effects()" now needs to both work with conditional masking and have an implementation that does this efficiently. In practice, we have some functions that don't work at all unless they are executed "loop uniformly" as in the source loop, so changing the CFG in this way breaks the optimized code.
I know this explanation may be a little difficult to follow, especially if you are thinking of LLVM entirely in the context of optimizing serial programs for scalar architectures. So I have offered these other reasons to add the patch:
1) Conditional logic is naturally simpler than loops, so a pass designed to simplify the loop structure should not introduce more complexity in the form of additional loops when the CFG is just representing control flow
2) The patch doesn't degrade the assembly, at least in the most recent test that I performed for Cameron I think it was exactly the same instruction count or slightly better
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
More information about the llvm-dev