[LLVMdev] Separating loop nests based on profile information?
Xinliang David Li
xinliangli at gmail.com
Mon Jan 12 19:39:49 PST 2015
On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com>
wrote:
>
> On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> I've been playing with approaches to getting better optimization of loops
>> which contain infrequently executed slow paths. I've gotten as far as
>> throwing together a proof of concept implementation of a profile guided
>> optimization to separate a single loop with multiple latches into a loop
>> nest, but I want to get feedback from interested parties before investing
>> much more effort.
>>
>> The primary case I'm concerned about looks something like this C example:
>> while( some_condition )
>> //do lots of work here
>> if (!some_rare_condition) { <-- This is loop variant
>> helper();
>> }
>> }
>>
>> The key point here is that the 'do lots of work' section contains things
>> we could lift out of the loop, but our knowledge of memory gets completely
>> destoryed by the 'helper' call which doesn't get inlined. This ends up
>> crippling LICM in particular, and GVN to a lessor degree.
>>
>> The approach I've been playing with would create this loop structure:
>> goto inner
>> while (true) {
>> outer:
>> helper();
>> inner:
>> while( some_condition )
>> //do lots of work here
>> if (!some_rare_condition) { <-- This is loop variant
>> goto outer;
>> }
>> }
>> }
>>
>> (Excuse the psuedo-C. Hopefully the idea is clear.)
>>
>
> Yep.
>
> I've not thought about this a lot, but I have two initial questions that
> maybe you can answer:
>
> How does this compare with classical approaches of loop peeling,
> partitioning, fission, or whatever you might call it? Is there any
> literature behind this approach or some literature it should be compared
> with? (I genuinely don't know this area that well, so I'm of little help
> here...)
>
It is not any of those loop transformations. It does not even change the
control flow. It is more a code layout optimization -- move the cold trace
in the loop out of the body.
>
>
> Some of your points I have quick feedback on:
>
>> Points for discussion:
>>
>> - Is using profile information for this purpose even a reasonable
>> thing to do?
>>
>> Yes!
>
>
>>
>> - I chose to implement this without relying on the existing block
>> frequency analysis. My reasoning was that a) this is a rarely taken case
>> and adding an expensive analysis dependency probably wasn't worthwhile and
>> b) that block frequency analysis was more expensive/precise than I really
>> needed. Is this reasonable?
>>
>> I think we should always use the analyses. Either BlockFrequency or
> BranchProbability. I think probably both in the common joint usage
> (frequency of the loop header combined with probability of the cold region).
>
yes.
>
>> - If so, is the notion of 'rareness' of a loop block something that's
>> worth extracting out on it's own and reusing? Are there other similar uses
>> anyone can think of?
>> - Currently, I'm only supporting a fairly small set of controlling
>> conditions. Are there important cases I'm not considering?
>>
>> To both of these, I think the general combination to use is to identify
> the set of blocks dominated by a block which is in the loop body of a hot
> loop, and is cold relative to the other successors of its predecessor(s).
> These form cold "regions" as I think of them without requiring the
> complexity of the region analysis.
>
>
Static prediction should handle it -- call heuristics or some combination
with other heuristics (error handler recognition etc).
David
>
>> - Since the rarest latch is often deep in a loop - with other "if (X)
>> continue;" (i.e. latches) before it - this tends to create loops with
>> multiple exiting blocks. Some of the existing passes might not deal with
>> this well, is that a major concern? Suggestions for how to analysis and
>> validate?
>>
>> I'm somewhat concerned about this, but want to think more about the
> fundamental transformation.
>
>
>>
>> - Currently, I've structured this as pulling off the rarest latch as
>> an outer iteration. I could also pull off the most popular latch as an
>> inner iteration. This might give different tradeoffs; thoughts?
>>
>> Generally, any thoughts anyone have on the problem or approach are
>> welcome. I'm not particular attached to the particular approach laid out
>> here and if there's a more advantageous approach, all the better.
>>
> Thanks for pushing on this! ;] Now I need to go and ponder a lot so i can
> reply more deeply on the actual transform.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/69bd8bbb/attachment.html>
More information about the llvm-dev
mailing list