[LLVMdev] SimplifyCFG vs loops

Thu Oct 18 11:43:32 PDT 2012

On 10/18/2012 01:26 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 10:12 AM, Andrew Clinton<andrew at sidefx.com>  wrote:
>> I actually submitted a patch to fix this very problem quite a while back,
>> but I don't think it was ever added to the baseline:
>>
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110711/124136.html
>>
>> I have also explained why it is important to avoid the creation of new loops
>> in an optimizer that performs analysis on the looping structure.
>>
>> I now need to keep this patch updated whenever we upgrade LLVM, to ensure
>> that our optimizer (which is now in production) works correctly.  Would it
>> be possible to add this patch to the head?  I have a version that works with
>> 3.0 but have not yet updated to 3.1 or the head.
> I'm not quite up for diving all the way into the previous thread, but
> do you have a test case that demonstrates a missed optimization due to
> this transformational problem? That should hopefully be enough to
> convince people the change is valuable. (I assume this is causing you
> real problems/missed optimizations or you wouldn't bother keeping it
> for your out of tree releases)
>

I think part of the difficulty in explaining the benefit of this change 
is that the optimization pass where this change is of value to us is 
also a custom analysis pass.  Basically, we're using LLVM as an 
optimization framework for a SIMD language, and the difficulty with the 
unwanted loop transformation comes when we have code like this:

while (uniform_and_expensive_condition_with_side_effects())
{
     if (cheap_varying_condition())
         do_this
}

If this gets transformed to something like this (which is similar to the 
example I posted in the earlier thread):

while () // outer loop
{
      while () // inner loop
      {
          if (!uniform_and_expensive_condition_with_side_effects())
             return;
          if (cheap_varying_condition())
             break;
      }
     do_this
}

If you think of "uniform_and_expensive_condition_with_side_effects()" as 
a function that is efficient only when it is run in lockstep for the 
SIMD array, and "cheap_varying_condition()" as a function that can be 
run efficiently with conditional masking, the transformation is 
detrimental for performance because 
"uniform_and_expensive_condition_with_side_effects()" now needs to both 
work with conditional masking and have an implementation that does this 
efficiently.  In practice, we have some functions that don't work at all 
unless they are executed "loop uniformly" as in the source loop, so 
changing the CFG in this way breaks the optimized code.

I know this explanation may be a little difficult to follow, especially 
if you are thinking of LLVM entirely in the context of optimizing serial 
programs for scalar architectures.  So I have offered these other 
reasons to add the patch:

1) Conditional logic is naturally simpler than loops, so a pass designed 
to simplify the loop structure should not introduce more complexity in 
the form of additional loops when the CFG is just representing control flow
2) The patch doesn't degrade the assembly, at least in the most recent 
test that I performed for Cameron I think it was exactly the same 
instruction count or slightly better

Andrew