[PATCH] blockfreq: Rewrite block frequency analysis
Duncan P. N. Exon Smith
dexonsmith at apple.com
Fri Mar 7 15:44:21 PST 2014
This patch series rewrites the shared implementation of BlockFrequencyInfo
and MachineBlockFrequencyInfo entirely.
- Patches 0001 and 0002 are simple cleanups.
- Patch 0003 is a class rename (and file move). Let me know if I should
just merge it with 0006.
- Patches 0004 and 0005 introduce some supporting classes.
- Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
- Patch 0007 is *not* included (it’s headed separately to llvmdev).
The old implementation had a fundamental flaw: precision losses from
nested loops (and very wide branches) compounded past loop exits (and
convergence points).
The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue). The old analysis gives nonsense:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
---- Block Freqs ----
entry = 1.0
for.cond1.preheader = 1.00103
for.cond4.preheader = 5.5222
for.body6 = 18095.19995
for.inc8 = 4.52264
for.inc11 = 0.00109
for.end13 = 0.0
The new analysis gives correct results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
block-frequency-info: nested_loops
- entry: float = 1.0, int = 8
- for.cond1.preheader: float = 4001.0, int = 32007
- for.cond4.preheader: float = 16008001.0, int = 128064007
- for.body6: float = 64048012001.0, int = 512384096007
- for.inc8: float = 16008001.0, int = 128064007
- for.inc11: float = 4001.0, int = 32007
- for.end13: float = 1.0, int = 8
The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.
Additionally, the new algorithm has a complexity advantage over the old.
The previous algorithm was quadratic in the worst case. The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear otherwise.
The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution. Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes. Mass is then distributed through the function, which is
now a DAG. Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.
There are some remaining flaws.
- Irreducible control flow isn't ignored, but it also isn't modelled
correctly. Nevertheless, the new algorithm should behave better
than the old algorithm (at least it evaluates irreducible edges),
but mileage may vary.
- Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
the 64-bit integer precision used downstream. If downstream users
of BlockFrequencyInfo are updated to use PositiveFloat (instead of
BlockFrequency), this limitation can be removed. It's not clear if
that's practical.
- BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
Loop/MachineLoop. These are currently unsuitable because they use
quadratic storage and don't have the API needed to make this
algorithm efficient. I looked into updating them, but downstream
users rely on the current API.
- The "bias" calculation proposed recently on llvmdev is *not*
incorporated here. A separate patch (0007) takes a stab at that;
I’ll post it to llvmdev soon.
I ran through the LNT benchmarks; there was some movement in both
directions.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-blockfreq-Use-const-in-MachineBlockFrequencyInfo.patch
Type: application/octet-stream
Size: 5502 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-blockfreq-Implement-Pass-releaseMemory.patch
Type: application/octet-stream
Size: 7880 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-blockfreq-Rename-class-to-BlockFrequencyInfoImpl.patch
Type: application/octet-stream
Size: 29296 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-blockfreq-Add-PositiveFloat-class.patch
Type: application/octet-stream
Size: 62887 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-blockfreq-Add-BlockMass.patch
Type: application/octet-stream
Size: 17245 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-blockfreq-Rewrite-BlockFrequencyInfoImpl.patch
Type: application/octet-stream
Size: 84868 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140307/778b1bf1/attachment-0005.obj>
More information about the llvm-commits
mailing list