[PATCH] blockfreq: Rewrite block frequency analysis

Mon Mar 17 14:54:39 PDT 2014

Overall this looks fantastic and is a huge improvement over the current implementation.

On Mar 13, 2014, at 11:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> *ping*
> 
> If someone is planning to take a look pre-commit, let me know.
> Otherwise, I’ll commit away on Monday (and you can post-commit
> review).
> 
> On 2014 Mar 7, at 15:44, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> This patch series rewrites the shared implementation of BlockFrequencyInfo
>> and MachineBlockFrequencyInfo entirely.
>> 
>> - Patches 0001 and 0002 are simple cleanups.

These are fine.

>> 
>> - Patch 0003 is a class rename (and file move).  Let me know if I should
>>   just merge it with 0006.

I don’t see the need to have a separate patch (#3) to move the files when you’re just going to entirely replace the contents later (patch #6). It seems to me like you might as well just do that in one step. I.E., remove the old files and add the new ones in a single patch. (It would also make patch #6 easier to read!)

>> 
>> - Patches 0004 and 0005 introduce some supporting classes.

For patch 4, I’d like to understand better the tradeoffs in adding a new PositiveFloat data type. You have a comment in there saying that it should be a little faster than APFloat. Do you need to add PositiveFloat because of APFloat performance or is it more about the different behavior, e.g., the way that values are saturated? If performance is the main motivation, can you provide at least a rough idea of how much difference it makes? If APFloat performance is “good enough”, would it be worthwhile to just use that and avoid the maintenance cost and code size increase of adding PositiveFloat?

If we really do need to add PositiveFloat, I’d like to see some comments in the PositiveFloatBase class. Some of the functions there are obvious (e.g., countLeadingZeros32) but others really are not (e.g., extractLg).

In patch 5, you’ve got a typo in a comment: “…teh original mass”.

>> 
>> - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.

Another comment typo here: “...data necessary to represent represent a loop…”

Could you use SmallVectors instead of std::vectors for Freqs, Working, and PackagedLoops in BlockFrequencyInfoImplBase? I’m sure it won’t matter for even medium-sized inputs, but for a small function with minimal control flow it could make a difference.

>> 
>> - Patch 0007 is *not* included (it’s headed separately to llvmdev).
>> 
>> The old implementation had a fundamental flaw:  precision losses from
>> nested loops (and very wide branches) compounded past loop exits (and
>> convergence points).
>> 
>> The @nested_loops testcase at the end of
>> test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
>> function has three nested loops, with branch weights in the loop headers
>> of 1:4000 (exit:continue).  The old analysis gives nonsense:
>> 
>>   Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
>>   ---- Block Freqs ----
>>    entry = 1.0
>>    for.cond1.preheader = 1.00103
>>    for.cond4.preheader = 5.5222
>>    for.body6 = 18095.19995
>>    for.inc8 = 4.52264
>>    for.inc11 = 0.00109
>>    for.end13 = 0.0
>> 
>> The new analysis gives correct results:
>> 
>>   Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
>>   block-frequency-info: nested_loops
>>    - entry: float = 1.0, int = 8
>>    - for.cond1.preheader: float = 4001.0, int = 32007
>>    - for.cond4.preheader: float = 16008001.0, int = 128064007
>>    - for.body6: float = 64048012001.0, int = 512384096007
>>    - for.inc8: float = 16008001.0, int = 128064007
>>    - for.inc11: float = 4001.0, int = 32007
>>    - for.end13: float = 1.0, int = 8
>> 
>> The new algorithm leverages BlockMass and PositiveFloat to maintain
>> precision, separates "probability mass distribution" from "loop
>> scaling", and uses dithering to eliminate probability mass loss.
>> 
>> Additionally, the new algorithm has a complexity advantage over the old.
>> The previous algorithm was quadratic in the worst case.  The new
>> algorithm is still worst-case quadratic in the presence of irreducible
>> control flow, but it's linear otherwise.
>> 
>> The key difference between the old algorithm and the new is that control
>> flow within a loop is evaluated separately from control flow outside,
>> limiting propagation of precision problems and allowing loop scale to be
>> calculated independently of mass distribution.  Loops are visited
>> bottom-up, their loop scales are calculated, and they are replaced by
>> pseudo-nodes.  Mass is then distributed through the function, which is
>> now a DAG.  Finally, loops are revisited top-down to multiply through
>> the loop scales and the masses distributed to pseudo nodes.
>> 
>> There are some remaining flaws.
>> 
>> - Irreducible control flow isn't ignored, but it also isn't modelled
>>   correctly.  Nevertheless, the new algorithm should behave better
>>   than the old algorithm (at least it evaluates irreducible edges),
>>   but mileage may vary.
>> 
>> - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
>>   the 64-bit integer precision used downstream.  If downstream users
>>   of BlockFrequencyInfo are updated to use PositiveFloat (instead of
>>   BlockFrequency), this limitation can be removed.  It's not clear if
>>   that's practical.
>> 
>> - BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and
>>   Loop/MachineLoop.  These are currently unsuitable because they use
>>   quadratic storage and don't have the API needed to make this
>>   algorithm efficient.  I looked into updating them, but downstream
>>   users rely on the current API.
>> 
>> - The "bias" calculation proposed recently on llvmdev is *not*
>>   incorporated here.  A separate patch (0007) takes a stab at that;
>>   I’ll post it to llvmdev soon.
>> 
>> I ran through the LNT benchmarks; there was some movement in both
>> directions.
> 
> <0001-blockfreq-Use-const-in-MachineBlockFrequencyInfo.patch><0002-blockfreq-Implement-Pass-releaseMemory.patch><0003-blockfreq-Rename-class-to-BlockFrequencyInfoImpl.patch><0004-blockfreq-Add-PositiveFloat-class.patch><0005-blockfreq-Add-BlockMass.patch><0006-blockfreq-Rewrite-BlockFrequencyInfoImpl.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits