[PATCH] blockfreq: Rewrite block frequency analysis

Duncan P. N. Exon Smith dexonsmith at apple.com
Fri Mar 7 15:44:21 PST 2014

This patch series rewrites the shared implementation of BlockFrequencyInfo
and MachineBlockFrequencyInfo entirely.

  - Patches 0001 and 0002 are simple cleanups.

  - Patch 0003 is a class rename (and file move).  Let me know if I should
    just merge it with 0006.

  - Patches 0004 and 0005 introduce some supporting classes.

  - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.

  - Patch 0007 is *not* included (it’s headed separately to llvmdev).

The old implementation had a fundamental flaw:  precision losses from
nested loops (and very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is the motivating example.
This function has three nested loops, with branch weights in the loop
headers of 1:4000 (exit:continue).  The old analysis gives nonsense:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8
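
As a sanity check on the "float" column, here is a minimal sketch of the
expected values.  It assumes each loop is entered once per iteration of its
parent and that every header uses the 1:4000 weights; the helper names are
hypothetical and do not come from the patches.

```cpp
#include <cassert>
#include <cstdint>

// Expected executions of a loop header per loop entry, given the header's
// exit:continue branch weights.  Hypothetical helper, not part of the patch.
double expectedLoopScale(uint64_t ExitWeight, uint64_t ContinueWeight) {
  assert(ExitWeight > 0 && "loop must be able to exit");
  // P(exit) = Exit / (Exit + Continue); summing the geometric series of
  // "continue" outcomes gives 1 / P(exit) header executions per entry.
  return double(ExitWeight + ContinueWeight) / double(ExitWeight);
}

// Frequency of a block at nesting depth Depth (1 = outermost loop), relative
// to an entry frequency of 1.0, when every loop has the same weights.
double nestedHeaderFrequency(unsigned Depth, uint64_t ExitWeight,
                             uint64_t ContinueWeight) {
  double Freq = 1.0;
  for (unsigned I = 0; I < Depth; ++I)
    Freq *= expectedLoopScale(ExitWeight, ContinueWeight);
  return Freq;
}
```

With weights 1:4000 the per-loop scale is 4001, and depths 1, 2 and 3 give
4001, 4001^2 and 4001^3, which is consistent with the float values printed
by the new analysis.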

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.
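
To illustrate what loss-free distribution can look like, here is a small
sketch of one way to split an integer mass across successors.  It is an
assumption about the general technique, not the code in patch 0006: each
successor takes its share of the *remaining* mass relative to the
*remaining* weight, so rounding error is pushed onto later successors
instead of being dropped.  The function name and the 32-bit mass type are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split Mass across successors in proportion to Weights such that the parts
// always sum to exactly Mass.  Hypothetical helper; 32-bit mass is used here
// only so that the intermediate product fits in 64 bits.
std::vector<uint32_t> distributeMass(uint32_t Mass,
                                     const std::vector<uint32_t> &Weights) {
  uint64_t RemWeight = 0;
  for (uint32_t W : Weights)
    RemWeight += W;
  assert(RemWeight > 0 && "need at least one non-zero weight");

  uint64_t RemMass = Mass;
  std::vector<uint32_t> Parts;
  Parts.reserve(Weights.size());
  for (uint32_t W : Weights) {
    if (W == 0) { // zero-weight edges receive nothing
      Parts.push_back(0);
      continue;
    }
    // Share of what is left, rounded down.  Because W <= RemWeight, the
    // share never exceeds RemMass, and the last non-zero weight takes all
    // of the remainder.
    uint64_t Part = RemMass * W / RemWeight;
    Parts.push_back(static_cast<uint32_t>(Part));
    RemMass -= Part;
    RemWeight -= W;
  }
  return Parts;
}
```

For example, distributing a mass of 10 over weights {1, 1, 1} yields
{3, 3, 4}: a plain per-edge floor would give 3 + 3 + 3 and lose one unit.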

Additionally, the new algorithm has a complexity advantage over the old
one.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear otherwise.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo-nodes.
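
The per-loop step described above can be sketched as follows.  This is a
simplified illustration under stated assumptions, not the implementation in
patch 0006: it uses double instead of BlockMass/PositiveFloat, expects the
loop body in topological order with the header at index 0, and treats any
inner loop as already collapsed into a single pseudo-node.  All type and
function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int BackEdge = -1; // edge back to the loop header
constexpr int ExitEdge = -2; // edge leaving the loop

struct Edge {
  int Target;  // index of a later block, or BackEdge / ExitEdge
  double Prob; // branch probability of this edge
};
using Block = std::vector<Edge>;

struct LoopResult {
  std::vector<double> Mass; // mass per block for one visit of the header
  double Scale;             // expected header visits per loop entry
};

// Distribute a mass of 1.0 from the header through the loop body (a DAG once
// backedges are cut), then derive the loop scale from the backedge mass.
LoopResult analyzeLoop(const std::vector<Block> &Blocks) {
  assert(!Blocks.empty() && "loop needs a header");
  LoopResult R{std::vector<double>(Blocks.size(), 0.0), 0.0};
  R.Mass[0] = 1.0;
  double BackedgeMass = 0.0;
  for (std::size_t I = 0; I < Blocks.size(); ++I) {
    for (const Edge &E : Blocks[I]) {
      double M = R.Mass[I] * E.Prob;
      if (E.Target == BackEdge) {
        BackedgeMass += M;
      } else if (E.Target != ExitEdge) {
        std::size_t T = static_cast<std::size_t>(E.Target);
        assert(E.Target >= 0 && T > I && T < Blocks.size() &&
               "only forward edges inside the body");
        R.Mass[T] += M;
      }
    }
  }
  assert(BackedgeMass < 1.0 && "loop must be able to exit");
  // Mass that does not return to the header leaves the loop; the expected
  // number of header visits is the reciprocal of that exit mass.
  R.Scale = 1.0 / (1.0 - BackedgeMass);
  return R;
}
```

In this model, the frequency of a block is its mass multiplied by the
scales of all enclosing loops, which corresponds to the final top-down
pass.  For a single-block loop with a backedge probability of 4000/4001
the computed scale is 4001 only up to floating-point rounding, which hints
at why the patches introduce dedicated numeric types.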

There are some remaining flaws.

  - Irreducible control flow isn't ignored, but it also isn't modelled
    correctly.  The new algorithm should still behave better than the
    old one, since it at least evaluates irreducible edges, but mileage
    may vary.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.  If downstream users
    of BlockFrequencyInfo are updated to use PositiveFloat (instead of
    BlockFrequency), this limitation can be removed.  It's not clear if
    that's practical.

  - BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
    Loop/MachineLoop.  These are currently unsuitable because they use
    quadratic storage and don't have the API needed to make this
    algorithm efficient.  I looked into updating them, but downstream
    users rely on the current API.

  - The "bias" calculation proposed recently on llvmdev is *not*
    incorporated here.  A separate patch (0007) takes a stab at that;
    I’ll post it to llvmdev soon.
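
To make the 4096-per-loop limit from the list above concrete, the sketch
below multiplies capped loop scales through a loop nest using 64-bit
integers.  The saturating multiply is an assumption added for
illustration; the text above does not say how the patches handle products
that exceed 64 bits.  All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint64_t MaxLoopScale = 4096; // 2^12, the per-loop cap

// Multiply two values, clamping to UINT64_MAX instead of wrapping.
// Saturation is an assumption for this sketch.
uint64_t saturatingMul(uint64_t A, uint64_t B) {
  if (A != 0 && B > UINT64_MAX / A)
    return UINT64_MAX;
  return A * B;
}

// Combined scale of Depth nested loops that each request PerLoopScale,
// with every individual loop capped at MaxLoopScale.
uint64_t cappedNestedScale(unsigned Depth, uint64_t PerLoopScale) {
  uint64_t Scale = 1;
  for (unsigned I = 0; I < Depth; ++I)
    Scale = saturatingMul(Scale, std::min(PerLoopScale, MaxLoopScale));
  return Scale;
}
```

Each capped loop consumes at most 12 bits, so five nested loops at the cap
reach 2^60 and still fit, while a sixth would need 72 bits.  The three
1:4000 loops from @nested_loops stay below the cap (4001 < 4096) and
multiply to 4001^3 without clamping.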

I ran through the LNT benchmarks; there was some movement in both

Attachments (non-text, scrubbed by the list archive; all
application/octet-stream):

  - 0001-blockfreq-Use-const-in-MachineBlockFrequencyInfo.patch (5502 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment.obj>

  - 0002-blockfreq-Implement-Pass-releaseMemory.patch (7880 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0001.obj>

  - 0003-blockfreq-Rename-class-to-BlockFrequencyInfoImpl.patch (29296 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0002.obj>

  - 0004-blockfreq-Add-PositiveFloat-class.patch (62887 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0003.obj>

  - 0005-blockfreq-Add-BlockMass.patch (17245 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0004.obj>

  - 0006-blockfreq-Rewrite-BlockFrequencyInfoImpl.patch (84868 bytes)
    <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0005.obj>
