[PATCH] blockfreq: Rewrite block frequency analysis

Mon Mar 17 20:33:05 PDT 2014

On 2014 Mar 17, at 18:50, Chandler Carruth <chandlerc at google.com> wrote:

> 
> On Fri, Mar 7, 2014 at 3:44 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> This patch series rewrites the shared implementation of BlockFrequencyInfo
> and MachineBlockFrequencyInfo entirely.
> 
>   - Patches 0001 and 0002 are simple cleanups.
> 
> These LGTM, however I would generally prefer the formatting of clang-format (after PR17999 is fixed perhaps) to your current formatting. Especially, I really dislike the breaking after the class name in a the out-of-line nested-name-specifier of method definitions. With the formatting issues addressed, please go ahead and apply them.

Yup, I meant to clang-format, but forgot.  Will do.

>   - Patch 0003 is a class rename (and file move).  Let me know if I should
>     just merge it with 0006.
> 
> I don't really understand what the move accomplishes. Why is "InfoImp" better that "Impl"? What difference are you trying to make here? I don't think you should use the suffix of "Imp" in any case as "Impl" is more widely used in LLVM.

Not sure where you’re seeing InfoImp — is there a typo in a comment or
commit message?  The move is to BlockFrequencyInfoImpl.

BlockFrequencyInfoImpl is better than BlockFrequencyImpl because it’s
*not* an implementation of BlockFrequency (which also exists), but an
implementation of BlockFrequencyInfo.

> I hate the suffix of "Info" as it conveys no meaning at all. Perhaps there is a fundamentally better name?

I think Analysis is a better suffix from that standpoint, but it’s also
longer.  Info is also used in BranchProbabilityInfo, so if we’re going to
change one we should change both.

> 
>   - Patches 0004 and 0005 introduce some supporting classes.
> 
>   - Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
> 
> I share Bob's concern about PositiveFloat -- I'm worried about introducing a second quite complex floating point class with subtle and untest differences from APFloat. This is especially true if one cannot be implemented in terms of the others.

As I mentioned in response to Bob’s review, I’ll see if there are
performance reasons to use this.  It really is much simpler than APFloat,
but a wrapper around APFloat is even simpler.

> Either BlockMass should replace BlockFrequency and be used by downstream consumers of the analysis pass, or it should be an in internal class used by the implementation of the BFI analysis and not in a public Support header. It's not clear to me which direction you're aiming at?

My first finished draft used BlockMass more extensively, with a goal to
expose the final mass (header mass times node mass) and loop scale to
downstream users.  Unfortunately, the final mass is sometimes less than
2^-64, so this turned out to be impractical.

The only benefit of having BlockMass separate now is so that it can be
unit-tested, since it has some non-trivial operations.  Is this a
worthwhile goal?

> I'm also very concerned about not re-using LoopInfo. I saw your description, but the fact that you weren't able to use LoopInfo for this seems like a critical failure in its design, and not something we should paper over. I haven't yet gotten through enough of the patch to fully understand the problem though.

Just to summarize here my reasons for not using LoopInfo, in rough order
of priority:

 1. LoopInfo has no concept of a Loop's immediate members.  Every Loop
    stores its own immediate members as well as the immediate members of
    all the loops inside of it.  Iterating through immediate members
    requires non-trivial changes to LoopInfo or non-trivial auxiliary
    data structures.  (My algorithm *only* needed this auxiliary data.)

 2. Changing LoopInfo’s storage to be based on immediate members
    seemed to require a breaking change to the API.  I had a quick look
    at updating downstream users, but I thought some might rely on the
    current behaviour.  I think it's worthwhile to come back to this
    and update LoopInfo, but it seemed out of scope.

 3. LoopInfo has O(V*V) storage.  It’s not easy to claim my algorithm
    is O(V+E) if it uses the current LoopInfo ;).

 4. It wasn’t clear exactly how LoopInfo looks with irreducible control
    flow.  I had a (low priority) goal to do something reasonable there,
    and it wasn’t clear what was possible.

It’s worth noting that the old algorithm also eschewed LoopInfo, so this
isn’t a regression.

> Essentially, I would like to review all of the last three more carefully, but honestly they are *huge* patches. Would you be OK posting them in Phabricator which makes reviewing particularly sizable new bodies of code somewhat easier?

I don’t mind.