[PATCH] D28331: Improve PGO support for the new inliner

Wed Jan 4 20:28:28 PST 2017

davidxl added inline comments.

================
Comment at: include/llvm/Transforms/Utils/Cloning.h:187
+          nullptr,
+      std::function<BlockFrequencyInfo &(Function &)> *GetBFI = nullptr)
+      : CG(cg), GetAssumptionCache(GetAssumptionCache), GetBFI(GetBFI) {}
----------------
chandlerc wrote:
> My brain is probably just being a bit dense, but is there no way to just give an (optional) BFI (or pair of BFIs) directly? This doesn't need to query recursively the way inline cost does?
This sounds right. IFI probably just needs a pair of BFIs.

================
Comment at: lib/Analysis/InlineCost.cpp:649-654
   bool HotCallsite = false;
   uint64_t TotalWeight;
   if (PSI && CS.getInstruction()->extractProfTotalWeight(TotalWeight) &&
       PSI->isHotCount(TotalWeight)) {
     HotCallsite = true;
   }
----------------
chandlerc wrote:
> This code is now very confusing combined with the below code. What is it that determines whether a callsite is hot?
What the following code says is that if callsite hotness info is available, do not use callee-entry based hotness hint.

This needs to be documented. 

================
Comment at: lib/Transforms/Utils/InlineFunction.cpp:1413-1424
+    // Use 128 bits APInt to avoid overflow.
+    APInt CallSiteFreq(128,
+                       CallerBFI.getBlockFreq(CallSiteBlock).getFrequency());
+    APInt CalleeEntryFreq(128, CalleeBFI.getEntryFreq());
+    APInt BBFreq(128, CalleeBFI.getBlockFreq(Entry.first).getFrequency());
+
+    // Multiply first by CallSiteFreq and then divide by CalleeEntryFreq
----------------
chandlerc wrote:
> I feel like much of this logic should really live in BFI next to the other logic for updating and maintaining these values rather than buried in the inliner.
> 
> In particular, it seems like there should be generic logic for taking a collection of basic blocks with existing frequencies and scaling them based on some other block's frequencies.
> 
> The same core logic seems like it should be usable by things like loop unrolling, loop splitting, etc where they clone a region of blocks that have existing *relative* frequencies but are suddenly below some dominating predicate that needs to scale those frequencies.
> 
> Then the only thing happening here is the cloning of (raw, unscaled) frequencies from the callee into the inlined blocks in the caller. That cloning logic makes sense here as it has everything to do with the inlining step and nothing to do with the particular values being cloned.
I agree this probably belongs to BFI. Besides IFI is not even used here.

https://reviews.llvm.org/D28331