[PATCH] Phabricator review: blockfreq: Rewrite block frequency analysis

Wed Mar 19 14:04:51 PDT 2014

chandlerc added you to the CC list for the revision "Phabricator review: blockfreq: Rewrite block frequency analysis".

Hi chandlerc, atrick, bob.wilson,

This is patches 0003 through 0006 squashed together from the review on
llvm-commits.  I've copied the original email content (thread title was
"[PATCH] blockfreq: Rewrite block frequency analysis") here as reference.

This patch has not been updated to reflect comments received so far.

----

This patch series rewrites the shared implementation of BlockFrequencyInfo
and MachineBlockFrequencyInfo entirely.

 - Patches 0001 and 0002 are simple cleanups.

 - Patch 0003 is a class rename (and file move).  Let me know if I should
   just merge it with 0006.

 - Patches 0004 and 0005 introduce some supporting classes.

 - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.

 - Patch 0007 is *not* included (it’s headed separately to llvmdev).

The old implementation had a fundamental flaw:  precision losses from
nested loops (and very wide branches) compounded past loop exits (and
convergence points).

The `@nested_loops` testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives nonsense:

   Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
   ---- Block Freqs ----
    entry = 1.0
    for.cond1.preheader = 1.00103
    for.cond4.preheader = 5.5222
    for.body6 = 18095.19995
    for.inc8 = 4.52264
    for.inc11 = 0.00109
    for.end13 = 0.0

The new analysis gives correct results:

   Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
   block-frequency-info: nested_loops
    - entry: float = 1.0, int = 8
    - for.cond1.preheader: float = 4001.0, int = 32007
    - for.cond4.preheader: float = 16008001.0, int = 128064007
    - for.body6: float = 64048012001.0, int = 512384096007
    - for.inc8: float = 16008001.0, int = 128064007
    - for.inc11: float = 4001.0, int = 32007
    - for.end13: float = 1.0, int = 8

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.

Additionally, the new algorithm has a complexity advantage over the old.
The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear otherwise.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

 - Irreducible control flow isn't ignored, but it also isn't modelled
   correctly.  Nevertheless, the new algorithm should behave better
   than the old algorithm (at least it evaluates irreducible edges),
   but mileage may vary.

 - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
   the 64-bit integer precision used downstream.  If downstream users
   of BlockFrequencyInfo are updated to use PositiveFloat (instead of
   BlockFrequency), this limitation can be removed.  It's not clear if
   that's practical.

 - BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
   Loop/MachineLoop.  These are currently unsuitable because they use
   quadratic storage and don't have the API needed to make this
   algorithm efficient.  I looked into updating them, but downstream
   users rely on the current API.

 - The "bias" calculation proposed recently on llvmdev is *not*
   incorporated here.  A separate patch (0007) takes a stab at that;
   I’ll post it to llvmdev soon.

I ran through the LNT benchmarks; there was some movement in both
directions.

http://llvm-reviews.chandlerc.com/D3125

Files:
  include/llvm/Analysis/BlockFrequencyImpl.h
  include/llvm/Analysis/BlockFrequencyInfo.h
  include/llvm/Analysis/BlockFrequencyInfoImpl.h
  include/llvm/CodeGen/MachineBlockFrequencyInfo.h
  include/llvm/Support/BlockMass.h
  include/llvm/Support/PositiveFloat.h
  lib/Analysis/BlockFrequencyInfo.cpp
  lib/Analysis/BlockFrequencyInfoImpl.cpp
  lib/Analysis/CMakeLists.txt
  lib/CodeGen/MachineBlockFrequencyInfo.cpp
  lib/Support/BlockMass.cpp
  lib/Support/CMakeLists.txt
  lib/Support/PositiveFloat.cpp
  lib/Support/PrettyStackTrace.cpp
  test/Analysis/BlockFrequencyInfo/bad_input.ll
  test/Analysis/BlockFrequencyInfo/basic.ll
  test/Analysis/BlockFrequencyInfo/irreducible.ll
  test/Analysis/BlockFrequencyInfo/loop_with_branch.ll
  test/Analysis/BlockFrequencyInfo/nested_loop_with_branches.ll
  test/CodeGen/XCore/llvm-intrinsics.ll
  unittests/Support/BlockMassTest.cpp
  unittests/Support/CMakeLists.txt
  unittests/Support/PositiveFloatTest.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D3125.1.patch
Type: text/x-patch
Size: 160107 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140319/af9d15a6/attachment.bin>