[PATCH] D28331: Improve PGO support for the new inliner

Easwaran Raman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 19 16:29:00 PST 2017


eraman added inline comments.


================
Comment at: lib/Analysis/InlineCost.cpp:52-62
+static cl::opt<int>
+    ColdCallSiteThreshold("inline-cold-callsite-threshold", cl::Hidden,
+                          cl::init(45),
+                          cl::desc("Threshold for inlining cold callsites"));
+
 // We introduce this threshold to help performance of instrumentation based
 // PGO before we actually hook up inliner with analysis passes such as BPI and
----------------
chandlerc wrote:
> Is there any way to avoid introducing yet another tunable here? It feels weird to have two cold tunables, and more weird for them to be different.
> 
> Also, I would expect the threshold here to be *very* similar to the -Oz inlining threshold. Should they be the same? Or at least documented together?
Agreed.  We likely want to have one threshold for Oz, cold callsite and cold callee. I'll get the numbers and send the threshold change separately.


================
Comment at: lib/Transforms/Utils/InlineFunction.cpp:1408-1409
+  SmallPtrSet<BasicBlock *, 16> ClonedBBs;
+  for (auto &OrigBB : *Callee) {
+    BasicBlock *ClonedBB = cast_or_null<BasicBlock>(VMap[&OrigBB]);
+    if (!ClonedBB)
----------------
chandlerc wrote:
> I was hoping we could do still better by walking the inlined region in the caller so we would *only* touch cloned blocks. However, there isn't a map from cloned BB to original BB, which is why you can't do that. And why you wrote the code you did. Sorry for the confusion.
> 
> Unfortunately, I think walking the entire map is better than walking the entire callee as the map only contains the inlined part. Also, in all of the other code here we walk the map to find relatively sparse and infrequent things so it seems fine. Like I said, sorry for sending you down a wrong path, I hadn't thought about the lack of clone -> original BB.
But the size of the value map is O(|instructions in the cloned region|) and not O(|BBs in the cloned region|). My thinking was O(|instructions in the cloned region|) likely to be greater than O(|BBs in the original callee|) and that's why you preferred this. 


https://reviews.llvm.org/D28331





More information about the llvm-commits mailing list