[PATCH] blockfreq: Rewrite block frequency analysis

Mon Mar 17 16:22:40 PDT 2014

On Mar 17, 2014, at 2:54 PM, Bob Wilson <bob.wilson at apple.com> wrote:

> Overall this looks fantastic and is a huge improvement over the current implementation.

Thanks for the review!

After doing some benchmarking I’ll post a patch incorporating your feedback
(and possibly with PositiveFloat hollowed out).

> On Mar 13, 2014, at 11:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>>> - Patches 0004 and 0005 introduce some supporting classes.
> 
> For patch 4, I’d like to understand better the tradeoffs in adding a new PositiveFloat data type. You have a comment in there saying that it should be a little faster than APFloat. Do you need to add PositiveFloat because of APFloat performance or is it more about the different behavior, e.g., the way that values are saturated? If performance is the main motivation, can you provide at least a rough idea of how much difference it makes? If APFloat performance is “good enough”, would it be worthwhile to just use that and avoid the maintenance cost and code size increase of adding PositiveFloat?

In my original implementation, composition from digits and exponents was
vital.  Loop scale was represented by its lg, tracked using an int16_t, and
kept separate from block mass until very late in the algorithm.  In the
final patch, this composition is much less important.

The remaining advantages are:

  - The separation of digits and exponents is still somewhat useful in tests,
    in printing debug info, and during debug sessions.  Not compelling on its
    own, though.

  - It has convenient operators.  Not compelling, since a wrapper class can
    also supply those.

  - Saturation behaviour avoids special numbers.  Only slightly compelling,
    since a wrapper class can add this behaviour when overloading operators.

  - More optimizable, since it isn’t modelling hardware behaviour.  This is
    compelling, but not quite tested.

Since the only compelling advantage is performance, I’ll do some
benchmarking.

> In patch 5, you’ve got a typo in a comment: “…teh original mass”.
> 
>>> 
>>> - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
> 
> Another comment typo here: “...data necessary to represent represent a loop…”
> 
> Could you use SmallVectors instead of std::vectors for Freqs, Working, and PackagedLoops in BlockFrequencyInfoImplBase? I’m sure it won’t matter for even medium-sized inputs, but for a small function with minimal control flow it could make a difference.

Sure thing.  Any suggestions for the number of basic blocks to optimize for?
Does SmallVector<T,16> sound right?