[PATCH] D24226: [PM] Provide an initial, minimal port of the inliner to the new pass manager.

Fri Dec 16 02:22:04 PST 2016

chandlerc added a comment.

Patch updated and responses below. Some discussion here so will let you give a final OK before I land. Also would like to get an OK from Easwaran if possible although he seemed pretty happy on the last iteration.

================
Comment at: lib/Transforms/IPO/Inliner.cpp:865
+        // To check this we also need to nuke any dead constant uses (perhaps
+        // made dead by this operation on other functions).
+        Callee.removeDeadConstantUsers();
----------------
sanjoy wrote:
> This bit needs a unit test -- nothing in `make check` failed when I commented this out.
Yea, deleting the analogous line above also doesn't cause anything to fail.

:: sigh ::

================
Comment at: lib/Transforms/IPO/Inliner.cpp:783
+  assert(InitialC.size() > 0 && "Cannot handle an empty SCC!");
+  Module &M = *InitialC.begin()->getFunction().getParent();
+  ProfileSummaryInfo *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
----------------
sanjoy wrote:
> Super minor, but it seems natural to have a `SCC::getModule()`.  Maybe in a later change?
Will do.

================
Comment at: lib/Transforms/IPO/Inliner.cpp:808
+  // also need to use an updatable pointer for the SCC as a consequence.
+  SmallVector<LazyCallGraph::Node *, 16> Nodes;
+  for (auto &N : InitialC)
----------------
sanjoy wrote:
> Why not `SmallVector<LazyCallGraph::Node *, 16> Nodes(InitialC.begin(), InitialC.end());`?
Because it doesn't compile -- we need pointers not references.

Eventially with a range constructor and an adaptor this will work, but today the current code seemed like the least typing. =[

================
Comment at: lib/Transforms/IPO/Inliner.cpp:853
+    while (!Calls.empty()) {
+      CallSite CS = Calls.pop_back_val();
+      Function &Callee = *CS.getCalledFunction();
----------------
sanjoy wrote:
> How about s/`Calls`/`CallsInCurrentNode`?  Given the nested loops, it is a little hard to keep track of whether `Calls` has all the call sites in the SCC or just the ones in the current node.
I'm not sure about this... Seems a long name when we don't have two of them. I've cleaned up the comment a bit.

Also, the per-node loop is the outer loop. The only reason this isn't scoped to the per-node region is to avoid re-allocation.

================
Comment at: lib/Transforms/IPO/Inliner.cpp:868
+        if (Function *NewCallee = CS.getCalledFunction())
+          if (!NewCallee->isDeclaration())
+            Calls.push_back(CS);
----------------
sanjoy wrote:
> (This is minor, please don't hold this patch over resolving this.  We can add this (or not) in a later change.)
> 
> I'd rather add to `Calls` if either `CS` is indirect OR is direct to a function with a body, since a later inline can make that call direct.
> 
> That way, we get this case:
> 
> ```
> fnptr_t f() { return printf; }
> void g(fnptr_t val) {
>   val("");
> }
> void main() {
>   fnptr_t val = f();
>   g(val);
> }
> ```
> 
> If we're lucky enough to first inline `g` and then `f`, we want to consider inlining the call to `val` that came in from `g` (that is now directly, after inlining `f`).
> 
> Of course, we have to be somewhat smarter to always get this case, but the probability is higher if we reconsider indirect calls.
Yeah, this is really similar to the devirtualization cases already. I think I'll do this in a follow-up to make it more focused and get more testing (and because I have a few other things I'd like to get moving).

Generally, I want to pursue a strategy of optimistic iteration here rather than exhaustive. Because we're going to end up with an iteration construct around all of this to handle cross-pass cases anyways, and we can rely on that handling hard to spot cases. The focus here should be to quickly find as my opportunities as we can without too much waste.

For example, the current inliner does another scan of the entire function every time anything gets inlined. This basically doubles the cost as we do a full run of computing inline cost and not inlining anything. Then we pop out to the devirt iteration and in some cases re-run it ... again....

I have a good idea of how to handle the majority of easy cases here though by looking at the users of inlined calls when they are inlined. But it'll require a bit of work and threading things through as well as more test cases so i'd like to circle back to it.

================
Comment at: lib/Transforms/IPO/Inliner.cpp:915
+      if (InlinedCallee->hasLocalLinkage() && InlinedCallee->use_empty())
+        DeadFunctions.insert(InlinedCallee);
+    InlinedCallees.clear();
----------------
sanjoy wrote:
> We'll only ever try to insert a function once into `DeadFunctions`, right?  Since if we're trying to insert `InlinedCallee` into `DeadFunctions` then it has no uses, and we could not possible try to inline through a call to it again later, so it should never appear again in `InlinedCallees`.
> 
> If that ^ is correct, perhaps we can add an assert here (and we could even use a `SmallVector` instead of `SmallPtrSet`, but I'd especially like to have the assert in that case).
All of this stemmed from an intermediate state where I tried to actually delete the function here.

Now that it is fully deferred, I can directly put it in the queue above when we discover it. This avoids the duplicated predicate and this extra loop, etc.

I've switched to a vector and added the assert because yea, we can't inline the same call twice and have it become dead both times. =] At least, not unless there is some really cool reincarnation going on here...

https://reviews.llvm.org/D24226