[PATCH] D19049: [BlockFrequencyInfo] Handling nested irreducible CFG with geometric series and top-down propagation

Thu Apr 21 08:15:46 PDT 2016

On Wed, Apr 20, 2016 at 3:27 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2016-Apr-12, at 21:23, Yuanfang Chen <cheny at udel.edu> wrote:
> >
> > kula created this revision.
> > kula added reviewers: dexonsmith, dnovillo.
> > kula added a subscriber: llvm-commits.
> >
> > This patch implements the TODOs in BlockFrequencyInfoImpl.h.
>
> Great!  Thanks for working on this.
>
> Do you have a concrete testcase where this is valuable?
>

I don't have a real-world test case for this. The new algorithm has better
precision on the cases the old algorithm can handle. It also handles nested
SCC loops, at a cost that depends on the specific CFG; a simple CFG does
not incur high complexity. The old algorithm gives wrong numbers for nested
SCC loops.

The @nonentry_header case in irreducible.ll shows the difference between
the old and new algorithms. The bottom block should be cold, but the old
algorithm thinks it is hot because it ignores the nested SCC loop.

>
> > 1. handle nested irreducible loop.
> > 2. true weight distribution among multi-heads of irreducible loop,
> instead of even split + adjustment by backedge later.
> > 3. Use geometric series instead of loop scale for mass iteration. Loop
> scale is still used, but only for the purpose of scaling local mass down
> below 1.0.
>
> The patch is pretty huge.  Is it possible to stage these changes somehow
> to make it easier to review?
>
> It's going to take some time for me to page this algorithm back in.  The
> smaller each change is, the easier it'll be for me ;).
>
> I haven't looked in detail at the patch yet, but just glancing I saw a few
> smaller NFC changes (like DEBUG output and variable names) that
> should definitely be split out.

Done.

>
> > 4. normal code path for reducible loop is not affected.
> >
> > This patch passes all regression tests. Several test cases are changed
> accordingly. I've changed the @nonentry_header case in irreducible.ll to use
> > huge weight on the switch instruction to show that 'bottom' block should
> not be hot.
> >
> > Method:
> > 1. for loop a, find all SCCs, create SCC loops and adjust nodes in its
> parent node list accordingly.
> > 2. In topological order, propagate mass on all SCCs where non-trivial
> SCCs incur recursion.
> > 3. compute start term of geometric series.
> > 4. for each header, compute its ratio of geometric series by propagating
> full mass starting from itself while the other blocks have empty mass. The
> accumulated backedge mass is the ratio.
> > 5. find the max ratio among headers.
> > 6. compute local scaled down mass for all headers with geometric series
> equation [a/(1-r)]. assume n is infinity. I've added a TODO in file to see if
> we should use [a*(1-r^n)/(1-r)].
> > 7. Propagate with header mass to other blocks in loop.
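
For reference, the two formulas in step 6 can be written out as a small sketch. This is only an illustration in Python, not code from the patch: the function names are hypothetical, and it assumes 0 <= r < 1 so that the infinite series converges.

```python
def header_mass_closed_form(a, r):
    """Sum of the infinite series a + a*r + a*r^2 + ... = a / (1 - r).

    Hypothetical helper for illustration; assumes 0 <= r < 1.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("ratio must satisfy 0 <= r < 1 for the series to converge")
    return a / (1.0 - r)


def header_mass_truncated(a, r, n):
    """Sum of the first n terms: a * (1 - r**n) / (1 - r).

    Hypothetical helper for illustration; falls back to a * n when r == 1.
    """
    if r == 1.0:
        return a * n
    return a * (1.0 - r ** n) / (1.0 - r)
```

With a start term of 0.5 and a ratio of 0.5, the truncated sum approaches the closed form from below as n grows, which is the gap the TODO in step 6 is about.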
>
> What's the worst-case complexity?  How does it compare to the old
> algorithm?
>

N: number of blocks in the top-level irreducible CFG.
A: number of headers in the irreducible CFG, counted across all SCC-loop levels.

The main time difference between the old and new algorithms comes from
step 4 (computing the geometric series ratio), which, for each header,
propagates mass over all blocks of that header's loop.
That makes the new algorithm O(AN) in general and quadratic in the worst
case, when A == N. In the common case A = 2. The old algorithm is O(N).
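
To make the O(AN) count concrete, here is a toy sketch of step 4 in Python. It is not the patch's code: the function name, the graph encoding, and the example loop are all made up for illustration, and it assumes the loop body is acyclic once edges into headers are removed (so there are no nested SCC loops) and that `order` is a topological order of that acyclic body.

```python
def backedge_ratio(loop, order, headers, start):
    """Propagate full mass from `start`; return (ratio, blocks visited).

    loop:    dict mapping block -> list of (successor, probability)
    order:   blocks in topological order, ignoring edges into headers
    headers: set of header blocks of the irreducible loop
    """
    mass = {block: 0.0 for block in order}
    mass[start] = 1.0           # the chosen header starts with full mass
    ratio = 0.0                 # mass flowing back into any header
    visits = 0
    for block in order:
        visits += 1
        for succ, prob in loop[block]:
            flow = mass[block] * prob
            if succ in headers:
                ratio += flow   # backedge: counts toward the series ratio
            elif succ in mass:
                mass[succ] += flow
            # otherwise the edge leaves the loop and the mass exits
    return ratio, visits


# Hypothetical loop with two headers (h1, h2) and one body block (b).
toy_loop = {
    "h1": [("b", 1.0)],
    "b": [("h2", 0.75), ("exit", 0.25)],
    "h2": [("h1", 0.5), ("exit", 0.5)],
}
toy_order = ["h1", "b", "h2"]
toy_headers = {"h1", "h2"}

results = {h: backedge_ratio(toy_loop, toy_order, toy_headers, h)
           for h in sorted(toy_headers)}
total_visits = sum(visits for _, visits in results.values())
max_ratio = max(ratio for ratio, _ in results.values())
```

Each header triggers one pass over all N = 3 blocks, so with A = 2 headers the sketch does A * N = 6 block visits in total.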

> >
> > http://reviews.llvm.org/D19049
> >
> > Files:
> >  include/llvm/Analysis/BlockFrequencyInfoImpl.h
> >  lib/Analysis/BlockFrequencyInfoImpl.cpp
> >  test/Analysis/BlockFrequencyInfo/irreducible.ll
> >
> > <D19049.53516.patch>
>
>