[PATCH] blockfreq: Rewrite block frequency analysis
Duncan P. N. Exon Smith
dexonsmith at apple.com
Thu Mar 13 23:04:20 PDT 2014
*ping*
If someone is planning to take a look pre-commit, let me know.
Otherwise, I’ll commit away on Monday (and you can post-commit
review).
On 2014 Mar 7, at 15:44, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> This patch series rewrites the shared implementation of BlockFrequencyInfo
> and MachineBlockFrequencyInfo entirely.
>
> - Patches 0001 and 0002 are simple cleanups.
>
> - Patch 0003 is a class rename (and file move). Let me know if I should
> just merge it with 0006.
>
> - Patches 0004 and 0005 introduce some supporting classes.
>
> - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
>
> - Patch 0007 is *not* included (it’s headed separately to llvmdev).
>
> The old implementation had a fundamental flaw: precision losses from
> nested loops (and very wide branches) compounded past loop exits (and
> convergence points).
>
> The @nested_loops testcase at the end of
> test/Analysis/BlockFrequencyAnalysis/basic.ll is the motivating example.
> This function has three nested loops, with loop-header branch weights of
> 1:4000 (exit:continue). The old analysis gives nonsense:
>
> Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
> ---- Block Freqs ----
> entry = 1.0
> for.cond1.preheader = 1.00103
> for.cond4.preheader = 5.5222
> for.body6 = 18095.19995
> for.inc8 = 4.52264
> for.inc11 = 0.00109
> for.end13 = 0.0
>
> The new analysis gives correct results:
>
> Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
> block-frequency-info: nested_loops
> - entry: float = 1.0, int = 8
> - for.cond1.preheader: float = 4001.0, int = 32007
> - for.cond4.preheader: float = 16008001.0, int = 128064007
> - for.body6: float = 64048012001.0, int = 512384096007
> - for.inc8: float = 16008001.0, int = 128064007
> - for.inc11: float = 4001.0, int = 32007
> - for.end13: float = 1.0, int = 8
>
> The new algorithm leverages BlockMass and PositiveFloat to maintain
> precision, separates "probability mass distribution" from "loop
> scaling", and uses dithering to eliminate probability mass loss.
>
> Additionally, the new algorithm has a complexity advantage over the old.
> The previous algorithm was quadratic in the worst case. The new
> algorithm is still worst-case quadratic in the presence of irreducible
> control flow, but it's linear otherwise.
>
> The key difference between the old algorithm and the new is that control
> flow within a loop is evaluated separately from control flow outside,
> limiting propagation of precision problems and allowing loop scale to be
> calculated independently of mass distribution. Loops are visited
> bottom-up, their loop scales are calculated, and they are replaced by
> pseudo-nodes. Mass is then distributed through the function, which is
> now a DAG. Finally, loops are revisited top-down to multiply the loop
> scales through the masses distributed to the pseudo-nodes.
>
> There are some remaining flaws.
>
> - Irreducible control flow isn't ignored, but it also isn't modelled
> correctly. Still, the new algorithm should behave better than the
> old one (it at least evaluates irreducible edges), though mileage
> may vary.
>
> - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
> the 64-bit integer precision used downstream. If downstream users
> of BlockFrequencyInfo are updated to use PositiveFloat (instead of
> BlockFrequency), this limitation can be removed. It's not clear if
> that's practical.
>
> - BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
> Loop/MachineLoop. These are currently unsuitable because they use
> quadratic storage and don't have the API needed to make this
> algorithm efficient. I looked into updating them, but downstream
> users rely on the current API.
>
> - The "bias" calculation proposed recently on llvmdev is *not*
> incorporated here. A separate patch (0007) takes a stab at that;
> I’ll post it to llvmdev soon.
>
> I ran through the LNT benchmarks; there was some movement in both
> directions.
Attachments (scrubbed by the list archive):
- 0001-blockfreq-Use-const-in-MachineBlockFrequencyInfo.patch (5502 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment.obj>
- 0002-blockfreq-Implement-Pass-releaseMemory.patch (7880 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0001.obj>
- 0003-blockfreq-Rename-class-to-BlockFrequencyInfoImpl.patch (29296 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0002.obj>
- 0004-blockfreq-Add-PositiveFloat-class.patch (62887 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0003.obj>
- 0005-blockfreq-Add-BlockMass.patch (17245 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0004.obj>
- 0006-blockfreq-Rewrite-BlockFrequencyInfoImpl.patch (84868 bytes)
  <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140313/8ec84119/attachment-0005.obj>
Posted to the llvm-commits mailing list.