[PATCH] D33946: [InlineCost] Find identical loads in the callee

Chandler Carruth via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 19 18:01:53 PDT 2017


chandlerc requested changes to this revision.
chandlerc added a comment.
This revision now requires changes to proceed.

Sorry for the awful delay when I got pulled into stuff, should be able to stay with the (small) stuff left in the review.



================
Comment at: lib/Analysis/InlineCost.cpp:1469-1470
       SingleBB = false;
+      if (EnableCSELoad)
+        disableCSELoad();
     }
----------------
haicheng wrote:
> chandlerc wrote:
> > Why?
> > 
> > One thing the inline cost analysis really tries to do is "straighten" CFGs that will end up folded when inlined, so it would be really good to allow simple CFGs to survive this much like my example above...
> > 
> > Especially with the initial policy of completely disabling the savings if we see a store *anywhere*.
> I observed some regressions when lifting this check.  When two repeated loads are far away from each other, LLVM may not be able to reduce the number of loads.  For example, LoadPre can add one load when it removes one.
> 
> I think it is more safe to support arbitrary CFG with the help of DominatorTree.  Only the repeated loads dominated by other identical loads can be simplified.  I think we need DominatorTree anyway if we try to address FIXME #1 you listed above because we need to iterate blocks in  the reverse post order.  So, I am going to add the support of multi-block function as another item in FIXME.
> 
> Your example above is still supported as long as SROA requirement is gone because InlineCost can figure out that the callee has single block after inlining.
Yeah, this makes sense. Especially the fact that the multiple block check handles the trivial cases for us makes it easy. THanks for the explanation!


================
Comment at: lib/Analysis/InlineCost.cpp:1008
 
+  // If the data are already loaded from this address and there is no reachable
+  // stores/calls or all reachable stores can be SROA'd in the function, this
----------------
'data are' -> 'data is'.


================
Comment at: lib/Analysis/InlineCost.cpp:1009
+  // If the data are already loaded from this address and there is no reachable
+  // stores/calls or all reachable stores can be SROA'd in the function, this
+  // load is likely to be redundant and can be eliminated.
----------------
I would say something more like:

'If the data is already leaded from this address and hasn't be clobbered by any stores or calls, this load ...'


================
Comment at: lib/Analysis/InlineCost.cpp:1011-1012
+  // load is likely to be redundant and can be eliminated.
+  if (EnableLoadElimination) {
+    if (!LoadAddrSet.insert(I.getPointerOperand()).second) {
+      LoadEliminationCost += InlineConstants::InstrCost;
----------------
No need for two `if`s, short circuit is sufficient.


================
Comment at: lib/Analysis/InlineCost.cpp:1033
 
+  // The store can potentially clobber the repeated loads and prevent them from
+  // elimination.
----------------
'clobber the repeated loads and prevent them from elimination' -> 'clobber loads and prevent repeated loads from being eliminated'


================
Comment at: lib/Analysis/InlineCost.cpp:1106-1107
+  Function *Fun = CS.getCalledFunction();
+  if (!Fun || !canConstantFoldCallTo(CS, Fun))
+    disableLoadElimination();
   if (CS.hasFnAttr(Attribute::ReturnsTwice) &&
----------------
I don't think we should use `canConstantFoldCallTo` here, because `simplifyCallSite` will already do that below along with a bunch of other things.

I think you really want to sink this down and handle it in a more detailed way. As a specific example: you should handle debug info intrinsics correctly (ignore them) and you should probably make some effort to handle lifetime markers because it seems like those can be handled trivially.


================
Comment at: test/Transforms/Inline/redundant-loads.ll:8
+; CHECK      Analyzing call of inner1... (caller:outer1)
+; CHECK      LoadEliminationCost: 5
+define void @outer1(i32* %a) {
----------------
I really dislike basing tests on debug prints. It makes them quite brittle. Typically for the inliner we craft a function that is too expensive to inline w/o the specific threshold applying, and then check that it does or doesn't get inlined in each case.


Repository:
  rL LLVM

https://reviews.llvm.org/D33946





More information about the llvm-commits mailing list