[PATCH] D16381: Infrastructure to allow use of PGO in inliner

Tue Mar 8 10:41:16 PST 2016

On Tue, Mar 8, 2016 at 4:20 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> chandlerc added a comment.
>
> First and foremost, sorry about snapping earlier. I shouldn't have done
> that, I was frustrated and not communicating very effectively. Thanks to
> Sean and Hal and others who wrote constructive and helpful emails to get
> this back on the rails. Secondly, sorry that I've neglected this patch for
> so long. I kept prioritizing working on the actual pass manager stuff over
> it, and I should have at least written this up.
>

Apology accepted. Chandler, as a long time contributor to LLVM project, new
contributors will see you as an example to follow. I don't see the
following match what people expect to see: Ignoring review requests,
showing disagreement by keeping silence, implying people not respecting
your authority in your area, or dictating who can/can not do this or that.
Reviews takes a lot of time and energy so you should encourage more people
participate in it by helping people gain more experience and feel more
confident to approve and get things done.   In this particular case, if you
had raised any concerns in the long thread, the approval would have *not*
been given without consensus being made. Even after the commit, if you had
listed any concrete reasons why the patch should be reverted first for
further discussion, it would have been done without a blink, but you came
out with a blame first, followed by a strong demand with no reason (as if
this is my domain ....)

Sorry for the digression. Now finally to the actual review.

>
> So I do have some serious concerns with the approach. I think it is
> probably a nice way to prototype things and understand their impact (which
> is very important!) but there are several aspects of the design that seem
> to be the wrong approach. I've detailed them below.
>
> I've said a few times before, but I continue to think that while some of
> the issues I've raised below could be addressed, that may not be the best
> use of time.

Sorry, I am not following. What issues are you referring to? Minor style
issues or fundamental issues that Easwaran's approach would not work or
 there is a better approach?

> I think it would probably be more effective to work on helping with the
> pass manager effort in order to make that effort finish more rapidly. There
> is a large amount of work that is essentially unblocked right now to help
> port function passes over. All of this is very easily parallelized.
>

We have said it before that this work is designed to be orthogonal to pass
manager work -- aka, it decouples the dependency on the pass manager work
so that people can make further progress (we have a lot of performance
related projects depending on this feature, and I believe many other
members in the community have the same dependency) while pass manager work
can be in parallel.  You can not simply deny other people's contributions
using pass manager as an excuse -- if pass manager can not get in sooner, I
think the burden should be mostly on you (by acommodating more and more
features added in the old world). On the other hand,  if you need help with
pass manager migration, I would love to help with it.  You can help this by
publishing more formal design document on the new pass manager and
migration guidelines.

> Once that is done, I would work on designing a good interprocedural
> profile analysis interface with a good update API, and then teach the
> inliner to use that.

This is a good plan, but how is that in conflict with this patch?  In fact
I think this patch will actually help us getting a good API design -- as
the design in this patch is from people who are actually familiar with PGO
and how to use profile data. We don't want to design APIs from a vacuum.  I
also think Easwaran would probably also the right person to do the design
in the new pass manager (and with your help).

> I think that the process of porting the inliner to the new pass manager
> will end up being a fairly substantial change (essentially a re-write)
> which will make integrating this much cleaner and easier.
>

This is totally orthogonal. I also think you should send out a design
review on this change as soon as possible.

>
> I'm somewhat worried that in its current form, this change would actually
> make it *harder* to port the inliner and more work to introduce the correct
> long-term design.
>

Please be specific -- please do not block a major feature/progress by a
vague worry.

I don't see how it can make it 'harder' to port -- if there were any such
issues, it would be that it helped set up some baseline behavior (with test
cases) so that new passmanager re-implementation won't regress them. In my
opinion, that is absolutely a good thing to have.  The test cases are
written using specifications required by the PGO clients -- not depending
on any internal implementation so they are totally reusable.

>
> Below are detailed comments about the design issues, but also quite a few
> comments about serious API, code, and implementation issues with the patch
> as it currently stands. I think it still needs substantial work to go into
> the tree, even if the high level design concern above is addressed.
>
>
> ================
> Comment at: llvm/trunk/include/llvm/Analysis/InlineCost.h:42-55
> @@ -40,1 +41,16 @@
>
> +/// \brief Block frequency analysis for multiple functions.
> +/// This class mimics block frequency analysis on CGSCC level. Block
> frequency
> +/// info is computed on demand and cached unless they are invalidated.
> +class BlockFrequencyAnalysis {
> +private:
> +  DenseMap<Function *, BlockFrequencyInfo *> BFM;
> +
> +public:
> +  ~BlockFrequencyAnalysis();
> +  /// \brief Returns BlockFrequencyInfo for a function.
> +  BlockFrequencyInfo *getBlockFrequencyInfo(Function *);
> +  /// \brief Invalidates block frequency info for a function.
> +  void invalidateBlockFrequencyInfo(Function *);
> +};
> +
> ----------------
> This is pretty much re-implementing a tiny part of the new pass manager
> inside the inline cost analysis. That really feels like the wrong approach
> to me. Notably, this is even named exactly the same as we would want to
> name a port of BlockFrequencyInfo to the new pass manager.
>
> I don't agree with this approach. Fundamentally, this is not the right
> long-term design for how the inliner should access profile information, and
> it actually obstructs getting the right design in place.
>
>
These comments do not seem helpful.  What is the right long term design? If
you have it, we can discuss it. If you don't even have any concrete design,
then what are talking about here? Even there is a long term design, how is
this getting in its way?  What is the actual issue? Note that this patch is
designed to be a short/mid term solution until the new passmanager is
ready.  Given that how profile is going to be used by clients are well
specified and compatible across different implementations, I don't see
major problems.  If you have it, please show us with concrete examples and
we can discuss them case by case.

I will skip the rest as they are easily addressable.

thanks,

David

> ================
> Comment at: llvm/trunk/include/llvm/Analysis/InlineCost.h:151-152
> @@ -132,1 +150,4 @@
> +
> +/// \brief Return estimated count of the block \p BB.
> +Optional<uint64_t> getBlockCount(BasicBlock *BB, BlockFrequencyAnalysis
> *BFA);
>  }
> ----------------
> This seems like a surprising API to expose at the top level from the
> inline cost analysis.
>
> ================
> Comment at: llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h:30-37
> @@ -28,1 +29,10 @@
>
> +// Functor invoked when a block is cloned during inlining.
> +typedef std::function<void(const BasicBlock *, const BasicBlock *)>
> +    BlockCloningFunctor;
> +// Functor invoked when a function is inlined inside the basic block
> +// containing the call.
> +typedef std::function<void(BasicBlock *, Function *)>
> FunctionCloningFunctor;
> +// Functor invoked when a function gets deleted during inlining.
> +typedef std::function<void(Function *)> FunctionDeletedFunctor;
> +
> ----------------
> These functor typedefs are a bit confusing. First off, std::function is
> very slow. Is this overhead going to cause a compile time issue?
>
> The second two of these typedefs aren't used, please don't leave dead code
> in the patch.
>
> Lastly, having them at the global scope seems like a bad design. It seems
> like they should be part of some API instead of just being global?
>
> ================
> Comment at: llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h:82
> @@ -71,1 +81,3 @@
>    bool shouldInline(CallSite CS);
> +  /// Set the BFI of \p Dst to be the same as \p Src.
> +  void copyBlockFrequency(BasicBlock *Src, BasicBlock *Dst);
> ----------------
> I would keep vertical space between declarations. It makes the comments
> much easier to read.
>
> ================
> Comment at: llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h:102
> @@ +101,3 @@
> +  std::unique_ptr<BlockFrequencyAnalysis> BFA;
> +  /// Are we using profile guided optimization?
> +  bool HasProfileData;
> ----------------
> I would say "Indicates whether we are using profile guided optimization."
> in the comment.
>
> However, I'm surprised we need both a bool and a unique_ptr -- wouldn't
> the nullness of the pointer indicate the same thing as the bool?
>
> ================
> Comment at: llvm/trunk/include/llvm/Transforms/Utils/Cloning.h:51-52
> @@ -50,1 +50,4 @@
>
> +typedef std::function<void(const BasicBlock *, const BasicBlock *)>
> +    BlockCloningFunctor;
> +
> ----------------
> This is in essence an ODR violation -- two headers are defining the same
> typedef.
>
> ================
> Comment at: llvm/trunk/include/llvm/Transforms/Utils/Cloning.h:164
> @@ -161,1 +163,3 @@
> +                               ClonedCodeInfo *CodeInfo = nullptr,
> +                               BlockCloningFunctor Ftor = nullptr);
>
> ----------------
> Initializing a functor from nullptr is pretty surprising. I would just use
> `BlockCloningFunctor Ftor = BlockCloningFunctor()` instead.
>
> However, I think this is an indication of a larger design issue. I think
> the fact that you need to thread this cloning functor between very
> disparate layers of the code isn't good. I feel like the cloning layer
> should really be taught to directly support updating profile information
> when cloning. This of course may not be terrible reasonable to implement
> currently, as you have the inliner essentially maintaining a mini form of
> the pass manager internally and need to thread some way to update it
> through these layers. If that's the problem, I think it really is
> indicating that this needs to wait until we have the pass manager fixed so
> that we can layer this logic more cleanly.
>
> ================
> Comment at: llvm/trunk/lib/Analysis/InlineCost.cpp:585-588
> @@ -574,1 +584,6 @@
>
> +// Adjust the threshold based on callsite hotness. Currently this is a
> nop.
> +int CallAnalyzer::getAdjustedThreshold(int Threshold,
> +                                       Optional<uint64_t> CallSiteCount
> +                                       __attribute__((unused))) {
> +  // FIXME: The new threshold should be computed from the given Threshold
> and
> ----------------
> It's a bit odd that this isn't a doxygen comment. Also that this is marked
> as "unused". What's the plan here?
>
> ================
> Comment at: llvm/trunk/lib/Analysis/InlineCost.cpp:618-620
> @@ -598,2 +617,5 @@
>    }
> +  Optional<uint64_t> CallSiteCount =
> +      llvm::getBlockCount(CS.getInstruction()->getParent(), BFA);
> +  Threshold = getAdjustedThreshold(Threshold, CallSiteCount);
>
> ----------------
> Separating this out into two functions doesn't make a lot of sense here.
> You now have two parts of the code handling profile based inlining
> adjustments.
>
> I think you should keep all of the profile based adjustments inline here,
> and integrate it with the function count logic above.
>
> ================
> Comment at: llvm/trunk/lib/Analysis/InlineCost.cpp:1570-1576
> @@ +1569,9 @@
> +  Function *F = BB->getParent();
> +  Optional<uint64_t> EntryCount = F->getEntryCount();
> +  if (!EntryCount)
> +    return None;
> +  BlockFrequencyInfo *BFI = BFA->getBlockFrequencyInfo(F);
> +  uint64_t BBFreq = BFI->getBlockFreq(BB).getFrequency();
> +  uint64_t FunctionEntryFreq = BFI->getEntryFreq();
> +  uint64_t BBCount = EntryCount.getValue() * BBFreq / FunctionEntryFreq;
> +  return BBCount;
> ----------------
> So, inside this routine, you're actually implementing a proper
> interprocedural profile analysis. I really think this needs to be extracted
> to provide a high level analysis API around interprocedurally weighted
> profile information.
>
> ================
> Comment at: llvm/trunk/lib/Analysis/InlineCost.cpp:1591-1595
> @@ +1590,7 @@
> +    return Iter->second;
> +  // We need to create a BlockFrequencyInfo object for F and store it.
> +  DominatorTree DT;
> +  DT.recalculate(*F);
> +  LoopInfo LI(DT);
> +  BranchProbabilityInfo BPI(*F, LI);
> +  BlockFrequencyInfo *BFI = new BlockFrequencyInfo(*F, BPI, LI);
> ----------------
> While it happens that BPI doesn't keep any references or pointers back
> into the dominator tree or loop info, the APIs of these analyses actually
> would make that OK to do... Which makes this code a rather subtle mini
> implementation of the core logic of the pass manager but inside the inline
> cost analysis. That worries me some. At the very least it would need a lot
> of documentation to make it clear exactly what was going on and why this
> was a safe thing to do.
>
> ================
> Comment at: llvm/trunk/lib/Transforms/IPO/Inliner.cpp:376-380
> @@ +375,7 @@
> +      BFA->getBlockFrequencyInfo(CallBB->getParent());
> +  // Find the number of times OrigBB is executed per invocation of the
> callee
> +  // and multiply by the number of times callee is executed in the caller.
> +  // Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq.
> +  uint64_t CallSiteFreq = CallerBFI->getBlockFreq(CallBB).getFrequency();
> +  uint64_t CalleeEntryFreq = CalleeBFI->getEntryFreq();
> +  // Frequency of OrigBB in the callee.
> ----------------
> You're dividing by CalleeEntryFreq because the original value was scaled
> up by that. This essentially strips some precision off. It would seem
> better to instead directly access the unscaled block frequency for the old
> BB, and then scale it correctly for the post-inlined state.
>
> ================
> Comment at: llvm/trunk/lib/Transforms/IPO/Inliner.cpp:383-384
> @@ +382,4 @@
> +  BlockFrequency OrigBBFreq = CalleeBFI->getBlockFreq(OrigBB);
> +  CallerBFI->setBlockFreq(NewBB, (double)(OrigBBFreq.getFrequency()) /
> +                                     CalleeEntryFreq * CallSiteFreq);
> +}
> ----------------
> We have worked very hard throughout the PGO analysis infrastructure to not
> rely on actual hardware floating point. We have a broad collection of
> carefully written math utilities for this. One part of factoring this logic
> into a real IPO profile update routine would be to use the same facilities
> and infrastructure that the existing profile (and profile update) analyses
> use for this purpose.
>
> ================
> Comment at: llvm/trunk/lib/Transforms/IPO/Inliner.cpp:570
> @@ +569,3 @@
> +        if (HasProfileData)
> +          BCF = std::bind(&Inliner::updateBlockFreq, this, CS, _1, _2);
> +        InlineFunctionInfo InlineInfo(&CG, ACT, BCF);
> ----------------
> Please use lambdas instead of std::bind -- they make the code much more
> readable as it is easier to understand how they behave. It also avoids the
> pain of using placeholder values.
>
> ================
> Comment at: llvm/trunk/lib/Transforms/IPO/Inliner.cpp:759
> @@ +758,3 @@
> +    Function *F = CG.removeFunctionFromModule(CGN);
> +    invalidateBFI(F);
> +    delete F;
> ----------------
> I'm really concerned by the need to manually invalidate BFI in so many
> places. I think this is going to prove to be a very error prone pattern to
> maintain long term.
>
>
> Repository:
>   rL LLVM
>
> http://reviews.llvm.org/D16381
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160308/bd68615b/attachment.html>