[PATCH] blockfreq: Rewrite block frequency analysis

Thu Mar 13 23:04:20 PDT 2014

*ping*

If someone is planning to take a look pre-commit, let me know.
Otherwise, I’ll commit away on Monday (and you can post-commit
review).

On 2014 Mar 7, at 15:44, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> This patch series rewrites the shared implementation of BlockFrequencyInfo
> and MachineBlockFrequencyInfo entirely.
> 
>  - Patches 0001 and 0002 are simple cleanups.
> 
>  - Patch 0003 is a class rename (and file move).  Let me know if I should
>    just merge it with 0006.
> 
>  - Patches 0004 and 0005 introduce some supporting classes.
> 
>  - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
> 
>  - Patch 0007 is *not* included (it’s headed separately to llvmdev).
> 
> The old implementation had a fundamental flaw:  precision losses from
> nested loops (and very wide branches) compounded past loop exits (and
> convergence points).
> 
> The @nested_loops testcase at the end of
> test/Analysis/BlockFrequencyInfo/basic.ll is a motivating example.  This
> function has three nested loops, with branch weights in the loop headers
> of 1:4000 (exit:continue).  The old analysis gives nonsense:
> 
>    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
>    ---- Block Freqs ----
>     entry = 1.0
>     for.cond1.preheader = 1.00103
>     for.cond4.preheader = 5.5222
>     for.body6 = 18095.19995
>     for.inc8 = 4.52264
>     for.inc11 = 0.00109
>     for.end13 = 0.0
> 
> The new analysis gives correct results:
> 
>    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
>    block-frequency-info: nested_loops
>     - entry: float = 1.0, int = 8
>     - for.cond1.preheader: float = 4001.0, int = 32007
>     - for.cond4.preheader: float = 16008001.0, int = 128064007
>     - for.body6: float = 64048012001.0, int = 512384096007
>     - for.inc8: float = 16008001.0, int = 128064007
>     - for.inc11: float = 4001.0, int = 32007
>     - for.end13: float = 1.0, int = 8
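
The float column above can be checked by hand. Assuming each loop body runs 4001 times per entry (one exit for every 4000 continues) and each inner loop is entered once per iteration of its parent, the per-level frequencies are powers of 4001. A small Python sketch of that arithmetic, with a function name made up for illustration:

```python
from fractions import Fraction

def expected_frequencies(exit_weight, continue_weight, depth):
    """Frequencies for `depth` perfectly nested loops, outermost first.

    Assumes each loop is entered once per iteration of its parent.
    """
    # Probability of leaving a loop each time its exit branch is evaluated.
    exit_prob = Fraction(exit_weight, exit_weight + continue_weight)
    # A loop entered once runs 1/exit_prob times on average (geometric series).
    trips = 1 / exit_prob
    return [trips ** level for level in range(1, depth + 1)]

print([int(f) for f in expected_frequencies(1, 4000, 3)])
# prints [4001, 16008001, 64048012001]
```

These match the values reported for the three loop levels, with for.inc8 and for.inc11 sharing the frequency of their enclosing loops.
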
> 
> The new algorithm leverages BlockMass and PositiveFloat to maintain
> precision, separates "probability mass distribution" from "loop
> scaling", and uses dithering to eliminate probability mass loss.
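
The message doesn't spell out the dithering scheme. One plausible reading, sketched below in Python under the assumption that mass is a fixed-width integer, is to scale each edge against the mass and weight still remaining, so rounding error is carried forward to later edges instead of being dropped. `distribute_mass` is a hypothetical name, not the patch's API, and the patch may well do this differently.

```python
def distribute_mass(total_mass, weights):
    """Split an integer mass across edges in proportion to `weights`.

    Assumption: every weight is a positive integer. The shares always sum
    to exactly `total_mass`.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    remaining_mass = total_mass
    remaining_weight = sum(weights)
    shares = []
    for weight in weights:
        # Divide what is left by the weight that is left; the final edge
        # has weight == remaining_weight and so takes all remaining mass.
        share = remaining_mass * weight // remaining_weight
        shares.append(share)
        remaining_mass -= share
        remaining_weight -= weight
    return shares

print(distribute_mass(10, [1, 1, 1]))  # prints [3, 3, 4]
```

A naive `total_mass * weight // sum(weights)` per edge would give 3, 3, 3 here and silently lose one unit of mass at every such branch.
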
> 
> Additionally, the new algorithm has a complexity advantage over the old.
> The previous algorithm was quadratic in the worst case.  The new
> algorithm is still worst-case quadratic in the presence of irreducible
> control flow, but it's linear otherwise.
> 
> The key difference between the old algorithm and the new is that control
> flow within a loop is evaluated separately from control flow outside,
> limiting propagation of precision problems and allowing loop scale to be
> calculated independently of mass distribution.  Loops are visited
> bottom-up, their loop scales are calculated, and they are replaced by
> pseudo-nodes.  Mass is then distributed through the function, which is
> now a DAG.  Finally, loops are revisited top-down to multiply through
> the loop scales and the masses distributed to pseudo-nodes.
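
The three phases described above can be sketched in a few lines of Python. This is a toy model, not the patch's implementation: it uses exact fractions instead of BlockMass/PositiveFloat, takes the loop nest and topological orders as hand-written inputs instead of discovering them, and assumes reducible control flow where every loop can exit. The CFG is an assumed shape inferred from the block names in the output above, not copied from the test file.

```python
from fractions import Fraction

def distribute(order, head, edges):
    """Push mass 1 from `head` through `order` (topologically sorted).
    Returns (mass per node, mass on backedges to head, mass per exit target)."""
    inside = set(order)
    mass = {n: Fraction(0) for n in order}
    mass[head] = Fraction(1)
    back, exits = Fraction(0), {}
    for node in order:
        total = sum(w for _, w in edges[node])
        for succ, w in edges[node]:
            share = mass[node] * w / total
            if succ == head:
                back += share            # returns to the loop header
            elif succ in inside:
                mass[succ] += share      # stays in this region
            else:
                exits[succ] = exits.get(succ, Fraction(0)) + share
    return mass, back, exits

def block_frequencies(entry, edges, loops, top_order):
    edges = {n: list(s) for n, s in edges.items()}
    local = {}                           # header -> (masses in loop, loop scale)
    for head, order in loops:            # phase 1: bottom-up, innermost first
        mass, back, exits = distribute(order, head, edges)
        local[head] = (mass, 1 / (1 - back))
        edges[head] = list(exits.items())  # header now stands for the whole loop
    top_mass, _, _ = distribute(top_order, entry, edges)  # phase 2: the DAG
    freq = {}
    def expand(mass, factor):            # phase 3: top-down
        for node, m in mass.items():
            if node in local and mass is not local[node][0]:
                inner_mass, scale = local[node]
                expand(inner_mass, m * factor * scale)
            else:
                freq[node] = m * factor
    expand(top_mass, Fraction(1))
    return freq

edges = {
    "entry": [("for.cond1.preheader", 1)],
    "for.cond1.preheader": [("for.cond4.preheader", 1)],
    "for.cond4.preheader": [("for.body6", 1)],
    "for.body6": [("for.inc8", 1), ("for.body6", 4000)],
    "for.inc8": [("for.inc11", 1), ("for.cond4.preheader", 4000)],
    "for.inc11": [("for.end13", 1), ("for.cond1.preheader", 4000)],
    "for.end13": [],
}
loops = [  # (header, topological order with inner loops collapsed to headers)
    ("for.body6", ["for.body6"]),
    ("for.cond4.preheader", ["for.cond4.preheader", "for.body6", "for.inc8"]),
    ("for.cond1.preheader", ["for.cond1.preheader", "for.cond4.preheader", "for.inc11"]),
]
freq = block_frequencies("entry", edges, loops,
                         ["entry", "for.cond1.preheader", "for.end13"])
for block in edges:
    print(block, int(freq[block]))
```

Under these assumptions the sketch reproduces the float column of the new analysis. Each loop is evaluated once with mass 1 at its header, so rounding in an inner loop (if a fixed-width type were used) cannot leak into the blocks after its exit.
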
> 
> There are some remaining flaws.
> 
>  - Irreducible control flow isn't ignored, but it also isn't modelled
>    correctly.  The new algorithm should still behave better than the
>    old one (at least it evaluates irreducible edges), but mileage may
>    vary.
> 
>  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
>    the 64-bit integer precision used downstream.  If downstream users
>    of BlockFrequencyInfo are updated to use PositiveFloat (instead of
>    BlockFrequency), this limitation can be removed.  It's not clear if
>    that's practical.
> 
>  - BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
>    Loop/MachineLoop.  These are currently unsuitable because they use
>    quadratic storage and don't have the API needed to make this
>    algorithm efficient.  I looked into updating them, but downstream
>    users rely on the current API.
> 
>  - The "bias" calculation proposed recently on llvmdev is *not*
>    incorporated here.  A separate patch (0007) takes a stab at that;
>    I’ll post it to llvmdev soon.
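
To illustrate the loop-scale cap from the list above: the scales of nested loops multiply, so a few hot loops can overflow a 64-bit integer unless each scale is clamped. The Python sketch below uses made-up scales and hypothetical helper names. It only shows the arithmetic, not how BlockFrequency values are actually derived.

```python
MAX_LOOP_SCALE = 1 << 12  # 4096, the per-loop cap quoted above

def nest_scale(raw_scales, cap=None):
    """Multiply the scales of a loop nest, optionally clamping each one."""
    product = 1
    for scale in raw_scales:
        product *= scale if cap is None else min(scale, cap)
    return product

def fits_in_u64(value):
    """True if `value` is representable as an unsigned 64-bit integer."""
    return value < (1 << 64)

raw = [10**9, 10**9, 10**9]  # made-up scales for three hot nested loops
print(fits_in_u64(nest_scale(raw)))                  # prints False (10**27)
print(fits_in_u64(nest_scale(raw, MAX_LOOP_SCALE)))  # prints True (2**36)
```
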
> 
> I ran through the LNT benchmarks; there was some movement in both
> directions.

-------------- next part --------------
Non-text attachments (application/octet-stream) were scrubbed:

Name: 0001-blockfreq-Use-const-in-MachineBlockFrequencyInfo.patch
Size: 5502 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment.obj>

Name: 0002-blockfreq-Implement-Pass-releaseMemory.patch
Size: 7880 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0001.obj>

Name: 0003-blockfreq-Rename-class-to-BlockFrequencyInfoImpl.patch
Size: 29296 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0002.obj>

Name: 0004-blockfreq-Add-PositiveFloat-class.patch
Size: 62887 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0003.obj>

Name: 0005-blockfreq-Add-BlockMass.patch
Size: 17245 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0004.obj>

Name: 0006-blockfreq-Rewrite-BlockFrequencyInfoImpl.patch
Size: 84868 bytes
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0005.obj>