[PATCH] D28331: Improve PGO support for the new inliner

Wed Jan 4 19:05:25 PST 2017

chandlerc added a comment.

Some quick initial comments, thanks!

================
Comment at: include/llvm/Analysis/InlineCost.h:178
               std::function<AssumptionCache &(Function &)> &GetAssumptionCache,
+              std::function<BlockFrequencyInfo &(Function &)> *GetBFI,
               ProfileSummaryInfo *PSI);
----------------
I would use llvm's function_ref rather than std::function -- it is much lighter weight and the caller can ensure that the function object outlives the call.

Also, I would prefer `Optional<function_ref<...>>` over `function_ref<...>*` as the pointer seems a more error prone API.

================
Comment at: include/llvm/Transforms/Utils/Cloning.h:187
+          nullptr,
+      std::function<BlockFrequencyInfo &(Function &)> *GetBFI = nullptr)
+      : CG(cg), GetAssumptionCache(GetAssumptionCache), GetBFI(GetBFI) {}
----------------
My brain is probably just being a bit dense, but is there no way to just give an (optional) BFI (or pair of BFIs) directly? This doesn't need to query recursively the way inline cost does?

================
Comment at: lib/Analysis/InlineCost.cpp:649-654
   bool HotCallsite = false;
   uint64_t TotalWeight;
   if (PSI && CS.getInstruction()->extractProfTotalWeight(TotalWeight) &&
       PSI->isHotCount(TotalWeight)) {
     HotCallsite = true;
   }
----------------
This code is now very confusing combined with the below code. What is it that determines whether a callsite is hot?

================
Comment at: lib/Analysis/InlineCost.cpp:657-662
   // Listen to the inlinehint attribute or profile based hotness information
   // when it would increase the threshold and the caller does not need to
   // minimize its size.
-  bool InlineHint = Callee.hasFnAttribute(Attribute::InlineHint) ||
-                    (PSI && PSI->isFunctionEntryHot(&Callee));
+  bool InlineHint =
+      Callee.hasFnAttribute(Attribute::InlineHint) ||
+      (!UseCallsiteHotness && PSI && PSI->isFunctionEntryHot(&Callee));
----------------
It seems problematic to couple these hints in this way, and I think that results in the need to use a predicate here.

What about first a small refactoring patch that just separates the PSI-based threshold adjustment from InlineHint, and then this code can add a preferred path instead of the PSI-exclusive one when we have BFI available?

================
Comment at: lib/Analysis/InlineCost.cpp:675
     Threshold = MinIfValid(Threshold, Params.ColdThreshold);
+  if (UseCallsiteHotness && PSI && CallBB) {
+    auto &CallerBFI = (*GetBFI)(*Caller);
----------------
Why would CallBB ever be null?

================
Comment at: lib/Analysis/InlineCost.cpp:681
+            << "     Callsite count =  " << CallsiteCount.getValue()
+            << (PSI->isHotCount(CallsiteCount.getValue()) ? " (HOT)\n" : "\n"));
+      if (PSI->isHotCount(CallsiteCount.getValue()))
----------------
You don't print cold here, just hot?

================
Comment at: lib/Transforms/Utils/InlineFunction.cpp:1413-1424
+    // Use 128 bits APInt to avoid overflow.
+    APInt CallSiteFreq(128,
+                       CallerBFI.getBlockFreq(CallSiteBlock).getFrequency());
+    APInt CalleeEntryFreq(128, CalleeBFI.getEntryFreq());
+    APInt BBFreq(128, CalleeBFI.getBlockFreq(Entry.first).getFrequency());
+
+    // Multiply first by CallSiteFreq and then divide by CalleeEntryFreq
----------------
I feel like much of this logic should really live in BFI next to the other logic for updating and maintaining these values rather than buried in the inliner.

In particular, it seems like there should be generic logic for taking a collection of basic blocks with existing frequencies and scaling them based on some other block's frequencies.

The same core logic seems like it should be usable by things like loop unrolling, loop splitting, etc where they clone a region of blocks that have existing *relative* frequencies but are suddenly below some dominating predicate that needs to scale those frequencies.

Then the only thing happening here is the cloning of (raw, unscaled) frequencies from the callee into the inlined blocks in the caller. That cloning logic makes sense here as it has everything to do with the inlining step and nothing to do with the particular values being cloned.

================
Comment at: lib/Transforms/Utils/InlineFunction.cpp:1444-1449
+  // Since CallSiteCount is an estimate, it could exceed the original callee
+  // count and has to be set to 0.
+  if (CallSiteCount.getValue() > CalleeCount.getValue())
+    Callee->setEntryCount(0);
+  else
+    Callee->setEntryCount(CalleeCount.getValue() - CallSiteCount.getValue());
----------------
This is why I intensely dislike `uint64_t`... If these were `int64_t`, then this would just be `std::max(0, ...)`.

Anyways, not something to change in this patch.

================
Comment at: lib/Transforms/Utils/InlineFunction.cpp:1590-1591
   ClonedCodeInfo InlinedFunctionInfo;
+  InlinedFunctionInfo.ClonedBlocks.reset(
+      new DenseMap<const BasicBlock *, BasicBlock *>());
   Function::iterator FirstNewBlock;
----------------
Why do we need a dedicated cloned blocks map rather than re-using the VMap below? That is the map the cloning routine uses to map between basic blocks IIRC, so it already seems like it would have the information we need.

https://reviews.llvm.org/D28331