[PATCH] blockfreq: Rewrite block frequency analysis

Wed Mar 19 00:22:14 PDT 2014

On Mar 18, 2014, at 11:53 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> I spent some time this evening looking again at whether any sort of
> float is necessary for the algorithm.  If we’re willing to use
> approximate loop scales (i.e., powers of two), we can (essentially)
> avoid the use of floats.

I think that’s fine with scales above 8. But some passes might want to know the difference between 3 and 5 likely iterations.

> E.g.,
> 
>    entry -> header
>    header -> exit (1/5)
>    header -> a (1/5)
>    header -> b (3/5)
>    a -> header
>    b -> header
> 
> In that graph, the loop scale should be 5.  If we use 4 (a power of
> two) instead, it’s possible to do the math without implementing a
> floating point divide.  Then we have block frequencies of
> (correct => calculated):
> 
>  - entry:  1.0 => 1.0
>  - header: 5.0 => 4.0
>  - a:      1.0 => 0.8
>  - b:      3.0 => 2.4
>  - exit:   1.0 => 1.0
> 
> An earlier iteration used this sort of fuzzy math.  I hit a problem
> and thought the soft-float solution was an easy out.  It’s solvable
> though.
> 
> ===
> 
> In more detail:  the loop scale can be stored as its lg, a value by
> which to shift the mass.  I originally kept the loop scales
> separate from the masses, and multiplied the masses by each other
> and the loop scales by each other when unwrapping loops (actually,
> added the stored values for loop scales, since they’re stored as
> lg).  One of my goals was to expose the mass and loop scale
> downstream.  However, the masses zeroed out when bootstrapping
> clang with LTO, and the register allocator had trouble with the
> resulting block frequencies.  Apparently, a lot gets inlined in
> bin/opt, leading to lots of loops and lots of branching.
> 
> I can fix it by combining the scales and masses dynamically; as the
> masses gain zeros in the upper bits, I can shift them left and take
> away from the loop scales.
> 
> My observation was that this is halfway to a soft-float
> implementation, so I just made myself a soft-float that was easy to
> test in isolation.
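A minimal C++ sketch of the power-of-two approximation in the example graph above. It assumes a hypothetical fixed-point convention (1.0 stored as `1 << 20`) and hypothetical helper names (`approxLgLoopScale`, `scaleByProb`, `computeExample`) that are not LLVM APIs. It rounds the loop scale 1/(1/5) = 5 down to 4 using only shifts and compares. It then reproduces the "calculated" column (4.0, 0.8, 2.4, 1.0), up to truncation in the last fractional bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point convention: a mass of 1.0 is stored as Unit.
constexpr uint64_t Unit = uint64_t(1) << 20;

// Loop scale = 1 / ExitProb, with ExitProb = Num / Den. Round it down to a
// power of two and return its lg: the largest K with (Num << K) <= Den.
// Only shifts and compares are used; there is no floating-point divide.
unsigned approxLgLoopScale(uint32_t Num, uint32_t Den) {
  assert(Num != 0 && Num <= Den && "loop needs a nonzero exit probability");
  unsigned K = 0;
  // (Num << K) <= Den < 2^32 holds before each iteration, so shifting by
  // one more bit stays well inside 64 bits.
  while ((uint64_t(Num) << (K + 1)) <= Den)
    ++K;
  return K;
}

// Multiply a fixed-point frequency by the branch probability Num / Den.
// The caller keeps Freq * Num below 2^64; the integer divide truncates.
uint64_t scaleByProb(uint64_t Freq, uint32_t Num, uint32_t Den) {
  assert(Den != 0 && Num <= Den);
  return Freq * Num / Den;
}

struct ExampleFreqs {
  uint64_t Entry, Header, A, B, Exit;
};

// The example loop: header exits with 1/5, goes to a with 1/5 and to b
// with 3/5; both a and b branch back to header.
ExampleFreqs computeExample() {
  ExampleFreqs F;
  F.Entry = Unit;                              // 1.0
  unsigned LgScale = approxLgLoopScale(1, 5);  // 2, i.e. scale 4 instead of 5
  F.Header = F.Entry << LgScale;               // 4.0 instead of 5.0
  F.A = scaleByProb(F.Header, 1, 5);           // ~0.8 instead of 1.0
  F.B = scaleByProb(F.Header, 3, 5);           // ~2.4 instead of 3.0
  F.Exit = F.Entry;  // all mass entering the loop eventually leaves: 1.0
  return F;
}
```

Dividing each field by `Unit` gives approximately 1.0, 4.0, 0.8, 2.4 and 1.0. Every block inside the loop is underweighted by the same factor, 4/5. Rounding down also maps loop scales 4 through 7 to the same value, 4, so small trip counts lose resolution. This is the concern raised in the reply.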

Right, I would rather have an explicit PositiveFloat, with some APFloat factoring, than an implicit floating point implementation buried within BlockFrequency. That code is already overwhelmed by numeric gymnastics.

-Andy

> 
> A better idea might have been to send an RFC at that point ;).
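Combining the loop-scale shifts and masses dynamically, as Duncan describes, amounts to a small positive soft-float: a 64-bit significand and a base-2 exponent. The explicit type Andy asks for is the same idea. Below is a minimal sketch using a hypothetical `ScaledMass` type, which is not LLVM's `BlockFrequency` or `APFloat` API. It assumes branch probabilities arrive as 32-bit `Num / Den` pairs. A power-of-two loop scale only changes the exponent. Multiplying by a probability moves the significand's leading zeros into the exponent, so repeated branching does not drive the mass to zero the way a plain fixed-point integer does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch, not an LLVM API: a positive value Digits * 2^Scale.
struct ScaledMass {
  uint64_t Digits;
  int Scale;

  // Shift leading zeros of Digits into Scale so that the top bit is set.
  void normalize() {
    if (Digits == 0)
      return;
    while (!(Digits >> 63)) {
      Digits <<= 1;
      --Scale;
    }
  }

  // Multiply by a loop scale of 2^LgLoopScale: only the exponent changes.
  void scaleByLoop(int LgLoopScale) { Scale += LgLoopScale; }

  // Multiply by the branch probability Num / Den (Num <= Den, Den != 0).
  void scaleByProb(uint32_t Num, uint32_t Den) {
    assert(Den != 0 && Num <= Den);
    normalize();
    // Drop as many low bits as Num has significant bits, so that
    // Hi * Num < 2^(64 - NumBits) * 2^NumBits = 2^64 cannot overflow.
    unsigned NumBits = 0;
    for (uint32_t N = Num; N != 0; N >>= 1)
      ++NumBits;
    uint64_t Hi = Digits >> NumBits;
    Digits = Hi * Num / Den;
    Scale += static_cast<int>(NumBits);
    normalize();
  }

  double toDouble() const {
    return std::ldexp(static_cast<double>(Digits), Scale);
  }
};

// For comparison: a plain fixed-point mass with 20 fractional bits.
uint64_t fixedPointAfterRepeatedProb(unsigned Steps, uint32_t Num,
                                     uint32_t Den) {
  uint64_t Mass = uint64_t(1) << 20;  // 1.0
  for (unsigned I = 0; I < Steps; ++I)
    Mass = Mass * Num / Den;
  return Mass;
}

// The same repeated branching, using the scaled representation.
double scaledAfterRepeatedProb(unsigned Steps, uint32_t Num, uint32_t Den) {
  ScaledMass M{1, 0};  // 1.0
  for (unsigned I = 0; I < Steps; ++I)
    M.scaleByProb(Num, Den);
  return M.toDouble();
}
```

With a branch probability of 1/1000, `fixedPointAfterRepeatedProb` reaches 0 after three steps. After ten steps, `scaledAfterRepeatedProb` is still within rounding error of 1e-30. This only illustrates the underflow mechanism. It does not reproduce the LTO bootstrap behaviour reported in the thread.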
