[LLVMdev] Separating loop nests based on profile information?

Mon Jan 12 19:39:49 PST 2015

On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com>
wrote:

>
> On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> I've been playing with approaches to getting better optimization of loops
>> which contain infrequently executed slow paths.  I've gotten as far as
>> throwing together a proof of concept implementation of a profile guided
>> optimization to separate a single loop with multiple latches into a loop
>> nest, but I want to get feedback from interested parties before investing
>> much more effort.
>>
>> The primary case I'm concerned about looks something like this C example:
>> while( some_condition )
>>   //do lots of work here
>>   if (!some_rare_condition) { <-- This is loop variant
>>     helper();
>>   }
>> }
>>
>> The key point here is that the 'do lots of work' section contains things
>> we could lift out of the loop, but our knowledge of memory gets completely
>> destoryed by the 'helper' call which doesn't get inlined.  This ends up
>> crippling LICM in particular, and GVN to a lessor degree.
>>
>> The approach I've been playing with would create this loop structure:
>> goto inner
>> while (true) {
>>   outer:
>>   helper();
>>   inner:
>>   while( some_condition )
>>     //do lots of work here
>>     if (!some_rare_condition) { <-- This is loop variant
>>       goto outer;
>>     }
>>   }
>> }
>>
>> (Excuse the psuedo-C.  Hopefully the idea is clear.)
>>
>
> Yep.
>
> I've not thought about this a lot, but I have two initial questions that
> maybe you can answer:
>
> How does this compare with classical approaches of loop peeling,
> partitioning, fission, or whatever you might call it? Is there any
> literature behind this approach or some literature it should be compared
> with? (I genuinely don't know this area that well, so I'm of little help
> here...)
>

It is not any of those loop transformations. It does not even change the
control flow. It is more a code layout optimization -- move the cold trace
in the loop out of the body.

>
>
> Some of your points I have quick feedback on:
>
>> Points for discussion:
>>
>>    - Is using profile information for this purpose even a reasonable
>>    thing to do?
>>
>> Yes!
>
>
>>
>>    - I chose to implement this without relying on the existing block
>>    frequency analysis. My reasoning was that a) this is a rarely taken case
>>    and adding an expensive analysis dependency probably wasn't worthwhile and
>>    b) that block frequency analysis was more expensive/precise than I really
>>    needed. Is this reasonable?
>>
>> I think we should always use the analyses. Either BlockFrequency or
> BranchProbability. I think probably both in the common joint usage
> (frequency of the loop header combined with probability of the cold region).
>

yes.

>
>>    - If so, is the notion of 'rareness' of a loop block something that's
>>    worth extracting out on it's own and reusing? Are there other similar uses
>>    anyone can think of?
>>    - Currently, I'm only supporting a fairly small set of controlling
>>    conditions. Are there important cases I'm not considering?
>>
>> To both of these, I think the general combination to use is to identify
> the set of blocks dominated by a block which is in the loop body of a hot
> loop, and is cold relative to the other successors of its predecessor(s).
> These form cold "regions" as I think of them without requiring the
> complexity of the region analysis.
>
>

Static prediction should handle it  --  call heuristics or some combination
with other heuristics (error handler recognition etc).

David

>
>>    - Since the rarest latch is often deep in a loop - with other "if (X)
>>    continue;" (i.e. latches) before it - this tends to create loops with
>>    multiple exiting blocks. Some of the existing passes might not deal with
>>    this well, is that a major concern? Suggestions for how to analysis and
>>    validate?
>>
>> I'm somewhat concerned about this, but want to think more about the
> fundamental transformation.
>
>
>>
>>    - Currently, I've structured this as pulling off the rarest latch as
>>    an outer iteration. I could also pull off the most popular latch as an
>>    inner iteration. This might give different tradeoffs; thoughts?
>>
>> Generally, any thoughts anyone have on the problem or approach are
>> welcome. I'm not particular attached to the particular approach laid out
>> here and if there's a more advantageous approach, all the better.
>>
> Thanks for pushing on this! ;] Now I need to go and ponder a lot so i can
> reply more deeply on the actual transform.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/69bd8bbb/attachment.html>