[PATCH] D19049: [BlockFrequencyInfo] Handling nested irreducible CFG with geometric series and top-down propagation

Mon May 9 10:44:41 PDT 2016

Hi cheny@,

Previously you wrote (in response to Duncan I think)

*  The @nonentry_header case in irreducible.ll shows the difference between
the old and new algorithms. The bottom block should be cold, but the old
algorithm thinks it is hot because it ignores the nested scc-loop.*

I looked over the frequencies being computed for this function and I'm not
totally clear on why "bottom" should be considered cold and a node like
"right" should be considered hot. Both are involved in a loop of some sort,
and at least on the surface it seems that whether the loop is considered
innermost is really just an artifact of how the loop SCCs are constructed.

I dumped out DOT graphs for the new and old implementations of this example
(attached), and it seems that there are still some major "flow insanities"
in both cases. Here is the old one (base.nonentry_header.png):

and here is the new one (new.nonentry_header.png):

Hard to see from a high level why this is an improvement.
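
To make the "flow insanity" complaint concrete, here is a minimal sketch of the kind of check meant, assuming the term refers to a block whose frequency does not equal the sum of (predecessor frequency * edge probability) over its incoming edges. The function name, the dict-based CFG encoding, and the toy graph are illustrative assumptions and are not taken from BlockFrequencyInfo or from the patch.

```python
def flow_violations(freq, edges, entry, tol=1e-9):
    """Return {block: (frequency, inflow)} for blocks that break flow conservation.

    freq  : dict mapping block name -> block frequency
    edges : dict mapping block name -> list of (successor, probability)
    entry : name of the entry block (it has no incoming flow to check)
    """
    inflow = {block: 0.0 for block in freq}
    for src, succs in edges.items():
        for dst, prob in succs:
            inflow[dst] += freq[src] * prob
    return {
        block: (freq[block], inflow[block])
        for block in freq
        if block != entry and abs(freq[block] - inflow[block]) > tol
    }


# Hypothetical toy CFG: entry -> loop, loop -> loop (50%), loop -> exit (50%).
toy_edges = {
    "entry": [("loop", 1.0)],
    "loop": [("loop", 0.5), ("exit", 0.5)],
}
consistent = {"entry": 1.0, "loop": 2.0, "exit": 1.0}
inflated = {"entry": 1.0, "loop": 4.0, "exit": 1.0}

print(flow_violations(consistent, toy_edges, "entry"))
print(flow_violations(inflated, toy_edges, "entry"))
```

In the second call the loop block is reported because its frequency exceeds what its incoming edges can supply, and the exit block is reported because it receives more flow than its frequency accounts for.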

Thanks, Than

On Thu, Apr 21, 2016 at 11:15 AM, Yuanfang Chen via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

>
>
> On Wed, Apr 20, 2016 at 3:27 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
>>
>> > On 2016-Apr-12, at 21:23, Yuanfang Chen <cheny at udel.edu> wrote:
>> >
>> > kula created this revision.
>> > kula added reviewers: dexonsmith, dnovillo.
>> > kula added a subscriber: llvm-commits.
>> >
>> > This patch implements the TODOs in BlockFrequencyInfoImpl.h.
>>
>> Great!  Thanks for working on this.
>>
>> Do you have a concrete testcase where this is valuable?
>>
>
> I don't have a real-world test case for this. The new algorithm has better
> precision for the cases that the old algorithm can handle. The new algorithm
> handles nested scc-loops with a complexity that depends on the specific CFG;
> a simple CFG does not incur high complexity. The old algorithm gives wrong
> numbers for nested scc-loop cases.
>
> The @nonentry_header case in irreducible.ll shows the difference between
> the old and new algorithms. The bottom block should be cold, but the old
> algorithm thinks it is hot because it ignores the nested scc-loop.
>
>
>>
>> > 1. Handle nested irreducible loops.
>> > 2. True weight distribution among the multiple headers of an irreducible
>> > loop, instead of an even split plus a later adjustment by backedge.
>> > 3. Use a geometric series instead of loop scale for mass iteration. Loop
>> > scale is still used, but only for scaling local mass down below 1.0.
>>
>> The patch is pretty huge.  Is it possible to stage these changes somehow
>> to make it easier to review?
>>
>> It's going to take some time for me to page this algorithm back in.  The
>> smaller each change is, the easier it'll be for me ;).
>>
>> I haven't looked in detail at the patch yet, but just glancing I saw a
>> few smaller NFC changes (like DEBUG output and variable names) that
>> should definitely be split out.
>
>
> Done.
>
>
>>
>> > 4. The normal code path for reducible loops is not affected.
>> >
>> > This patch passes all regression tests. Several test cases are changed
>> > accordingly. I've changed the @nonentry_header case in irreducible.ll to
>> > use a huge weight on the switch instruction to show that the 'bottom'
>> > block should not be hot.
>> >
>> > Method:
>> > 1. For loop a, find all SCCs, create SCC loops, and adjust nodes in its
>> > parent node list accordingly.
>> > 2. In topological order, propagate mass over all SCCs; non-trivial
>> > SCCs incur recursion.
>> > 3. Compute the start term of the geometric series.
>> > 4. For each header, compute its geometric series ratio by propagating
>> > full mass starting from that header while every other block has empty
>> > mass. The accumulated backedge mass is the ratio.
>> > 5. Find the max ratio among the headers.
>> > 6. Compute the local scaled-down mass for all headers with the geometric
>> > series equation [a/(1-r)], assuming n is infinity. I've added a TODO in
>> > the file to see if we should use [a*(1-r^n)/(1-r)].
>> > 7. Propagate the header mass to the other blocks in the loop.
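
The two formulas in step 6 can be compared with a small sketch. This is only an illustration of the arithmetic, assuming a start term a and a ratio r with 0 <= r < 1; the function names are hypothetical and none of this is code from the patch.

```python
def closed_form_mass(a, r):
    """Sum of the infinite series a + a*r + a*r^2 + ... = a / (1 - r)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("ratio must be in [0, 1) for the series to converge")
    return a / (1.0 - r)


def truncated_mass(a, r, n):
    """Sum of the first n terms: a * (1 - r^n) / (1 - r)."""
    if r == 1.0:
        return a * n  # every term equals a when r is exactly 1
    return a * (1.0 - r ** n) / (1.0 - r)


def iterated_mass(a, r, n):
    """Same sum as truncated_mass, computed by adding one term per iteration."""
    total, term = 0.0, a
    for _ in range(n):
        total += term
        term *= r
    return total


print(closed_form_mass(1.0, 0.5))
print(truncated_mass(1.0, 0.5, 3))
print(iterated_mass(1.0, 0.5, 3))
```

The truncated form approaches the closed form from below as n grows, so the choice between them only matters when r is close to 1 or the trip count is known to be small.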
>>
>> What's the worst-case complexity?  How does it compare to the old
>> algorithm?
>>
>
> N: number of blocks in the top-level irreducible CFG
> A: number of headers in the irreducible CFG, regardless of scc-loop level.
>
> The major time difference between the old and new algorithms comes from
> step 4 (computing the geometric series ratio), where, for each header, mass
> is propagated over all blocks of that specific loop.
> The worst case is quadratic, when A==N. Generally O(AN). In the common case,
> A=2. The old algorithm is O(N).
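
A rough sketch of where the O(AN) cost comes from: one propagation pass over the loop's blocks per header. It assumes the blocks are given in a topological order once edges into headers are ignored, and that every edge targeting a header counts as a backedge. The data layout, names, and the toy loop are hypothetical; this is not the patch's implementation.

```python
def backedge_ratio(start, order, edges, headers):
    """Propagate full mass from `start`; return (backedge mass, blocks visited).

    order   : loop blocks in topological order (edges into headers ignored)
    edges   : dict mapping block -> list of (successor, probability)
    headers : set of header blocks; any edge into one is treated as a backedge
    """
    mass = {block: 0.0 for block in order}
    mass[start] = 1.0  # full mass on one header, empty mass everywhere else
    ratio = 0.0
    visits = 0
    for block in order:
        visits += 1
        for succ, prob in edges.get(block, []):
            flow = mass[block] * prob
            if succ in headers:
                ratio += flow        # mass returning to a header
            elif succ in mass:
                mass[succ] += flow   # mass staying inside the loop
            # otherwise the edge leaves the loop and its mass is dropped
    return ratio, visits


# Hypothetical irreducible loop with two headers and one body block.
loop_order = ["h1", "h2", "body"]
loop_headers = {"h1", "h2"}
loop_edges = {
    "h1": [("h2", 0.5), ("body", 0.5)],
    "h2": [("h1", 0.25), ("exit", 0.75)],
    "body": [("h1", 0.5), ("exit", 0.5)],
}

ratios = {}
total_visits = 0
for header in ["h1", "h2"]:
    ratios[header], visits = backedge_ratio(header, loop_order, loop_edges, loop_headers)
    total_visits += visits

print(ratios, max(ratios.values()), total_visits)
```

With A = 2 headers and N = 3 blocks, the loop body of the propagation runs A * N = 6 times, which matches the O(AN) estimate above.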
>
>
>
>> >
>> > http://reviews.llvm.org/D19049
>> >
>> > Files:
>> >  include/llvm/Analysis/BlockFrequencyInfoImpl.h
>> >  lib/Analysis/BlockFrequencyInfoImpl.cpp
>> >  test/Analysis/BlockFrequencyInfo/irreducible.ll
>> >
>> > <D19049.53516.patch>
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: base.nonentry_header.png
Type: image/png
Size: 57434 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160509/6b7217af/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new.nonentry_header.png
Type: image/png
Size: 51020 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160509/6b7217af/attachment-0003.png>