[PATCH] blockfreq: Rewrite block frequency analysis
atrick at apple.com
Wed Mar 19 00:22:14 PDT 2014
On Mar 18, 2014, at 11:53 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> I spent some time this evening looking again at whether any sort of
> float is necessary for the algorithm. If we’re willing to use
> approximate loop scales (i.e., powers of two), we can (essentially)
> avoid the use of floats.
I think that’s fine with scales above 8. But some pass might want to know the difference between 3 and 5 likely iterations.
> entry -> header
> header -> exit (1/5)
> header -> a (1/5)
> header -> b (3/5)
> a -> header
> b -> header
> In that graph, the loop scale should be 5. If we use 4 (a power of
> two) instead, it’s possible to do the math without implementing a
> floating point divide. Then we have block frequencies of
> (correct => calculated):
> - entry: 1.0 => 1.0
> - header: 5.0 => 4.0
> - a: 1.0 => 0.8
> - b: 3.0 => 2.4
> - exit: 1.0 => 1.0
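For reference, the arithmetic in the quoted example can be reproduced with integer operations only. This is a minimal sketch, not code from the patch: the names floorLog2 and approxFreq, and the choice of 2^32 as the fixed-point value for a mass of 1.0, are assumptions made for illustration. The conversion to double at the end is only there to make the result readable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: floor(log2(Scale)), i.e. the shift amount for the
// largest power of two that does not exceed the exact loop scale.
unsigned floorLog2(uint64_t Scale) {
  assert(Scale >= 1 && "loop scale must be at least 1");
  unsigned Lg = 0;
  while (Scale >>= 1)
    ++Lg;
  return Lg;
}

// Hypothetical helper: frequency of a block inside the loop, where Num/Den
// is the branch probability from the header to that block and LgScale is
// the lg of the (approximate) loop scale.
double approxFreq(uint64_t Num, uint64_t Den, unsigned LgScale) {
  assert(Den != 0 && Num <= Den && LgScale < 32);
  const uint64_t FullMass = UINT64_C(1) << 32; // assumed encoding of mass 1.0
  uint64_t Mass = FullMass / Den * Num;        // integer divide, truncating
  uint64_t Scaled = Mass << LgScale;           // multiply by 2^LgScale
  // Floating point is used for display only, not in the computation above.
  return static_cast<double>(Scaled) / static_cast<double>(FullMass);
}
```

With an exact loop scale of 5, floorLog2(5) gives a shift of 2 (a scale of 4). approxFreq(1, 1, 2) is then 4.0 for the header, approxFreq(1, 5, 2) is close to 0.8 for a, and approxFreq(3, 5, 2) is close to 2.4 for b, matching the "calculated" column above up to truncation.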
> An earlier iteration used this sort of fuzzy math. I hit a problem
> and thought the soft-float solution was an easy out. It’s solvable
> In more detail: the loop scale can be stored as its lg, a value by
> which to shift the mass. I originally kept the loop scales
> separate from the masses, and multiplied the masses by each other
> and the loop scales by each other when unwrapping loops (actually,
> added the stored values for loop scales, since they’re stored as
> lg). One of my goals was to expose the mass and loop scale
> downstream. However, the masses zeroed out when bootstrapping
> clang with LTO, and the register allocator had trouble with the
> resulting block frequencies. Apparently, a lot gets inlined in
> bin/opt, leading to lots of loops and lots of branching.
> I can fix it by combining the scales and masses dynamically; as the
> masses gain zeros in the upper bits, I can shift them left and take
> away from the loop scales.
> My observation was that this is halfway to a soft-float
> implementation, so I just made myself a soft-float that was easy to
> test in isolation.
Right, I would rather have an explicit PositiveFloat, with some APFloat factoring, than an implicit floating point implementation buried within BlockFrequency. That code is already overwhelmed by numeric gymnastics.
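To make the "halfway to a soft-float" point concrete, here is a rough sketch of the dynamic combining described in the quoted text: whenever the mass has zeros in its upper bits, shift it left and subtract the shift from the lg of the scale. The ScaledMass type, its 32-bit mass, and the normalize and multiply functions are all hypothetical names for illustration, not the PositiveFloat from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: value = (Mass / 2^32) * 2^LgScale.
struct ScaledMass {
  uint32_t Mass;   // fixed-point fraction
  int32_t LgScale; // lg of the scale, i.e. a shift amount
};

// Shift zeros out of the upper bits of the mass and take the same amount
// away from the scale, so the represented value is unchanged.
ScaledMass normalize(ScaledMass X) {
  if (X.Mass == 0)
    return X; // a true zero has nothing to shift
  while ((X.Mass & UINT32_C(0x80000000)) == 0) {
    X.Mass <<= 1;
    --X.LgScale;
  }
  assert((X.Mass & UINT32_C(0x80000000)) != 0 && "top bit must be set");
  return X;
}

// Multiply masses by each other and add the stored lg scales, then
// renormalize so repeated products do not truncate the mass to zero.
ScaledMass multiply(ScaledMass A, ScaledMass B) {
  uint64_t Product = static_cast<uint64_t>(A.Mass) * B.Mass;
  ScaledMass R;
  R.Mass = static_cast<uint32_t>(Product >> 32); // keep the high 32 bits
  R.LgScale = A.LgScale + B.LgScale;
  return normalize(R);
}
```

Under these assumptions, a mass of 2^22 (the value 1/1024) multiplied by itself three times would truncate to zero if the scales were kept separate and never adjusted. With normalize applied after each product, the mass keeps its top bit set and the small magnitude is carried in LgScale instead, which is essentially a mantissa and an exponent.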
> A better idea might have been to send an RFC at that point ;).