[PATCH] D33946: [InlineCost] Find identical loads in single block function

Tue Jun 13 12:50:49 PDT 2017

chandlerc requested changes to this revision.
chandlerc added a comment.
This revision now requires changes to proceed.

Really cool!

After reading all of this and writing the comments below, I'd switch the term 'CSE' everywhere for 'Elimination'. I know its longer, but I think it is easier for people to read and not get confused about "but what is the subexpression?". This is all really just load elimination.

================
Comment at: lib/Analysis/InlineCost.cpp:145-148
+  /// After SROA finds Stores that can be simplified, the identical loads
+  /// clobbered by these Stores can also be removed.  The simple CSE analysis
+  /// only supports single block callee and quits if there are Stores cannot be
+  /// simplified or function calls in the callee.
----------------
I wouldn't describe this (or limit it) to anything to do with SROA.

It should just model the CSE-ing of repeated loads that is expected to happen whenever we simplify away the stores that would otherwise cause them to be loads.

I wouldn't disable it when we don't have SROA-candidates as I can see it being valuable in plenty of other cases. As a trivial example, if a predicate causes all of the stores to memory to be dead, we won't traverse those blocks, and we should track the CSE simplification to any redundant loads. It'd be good to have a test case for this as well. As a concrete example of where I think this might well come up:

  int foo(bool predicate) {
    if (some_global_variable && predicate) {
      function_that_writes_to_global();
      // lots more code
    }
    return some_global_variable;
  }

================
Comment at: lib/Analysis/InlineCost.cpp:161
                           int InstructionCost);
+  void disableCSELoad();
   bool isGEPFree(GetElementPtrInst &GEP);
----------------
I would call this `disableLoadCSE` as that reads better I think.

================
Comment at: lib/Analysis/InlineCost.cpp:265
   unsigned SROACostSavingsLost;
+  unsigned CSELoadCostSavings;

----------------
Similarly, I would call this `LoadCSECostSavings`.

================
Comment at: lib/Analysis/InlineCost.cpp:842

+  if(EnableCSELoad) {
+    if (!LoadAddrSet.insert(I.getPointerOperand()).second) {
----------------
clang-format please.

================
Comment at: lib/Analysis/InlineCost.cpp:843
+  if(EnableCSELoad) {
+    if (!LoadAddrSet.insert(I.getPointerOperand()).second) {
+      CSELoadCost += InlineConstants::InstrCost;
----------------
Some comment about what is going on here would be good as this isn't a function call with docs.

================
Comment at: lib/Analysis/InlineCost.cpp:864-865
   }
-
+  if (EnableCSELoad)
+    disableCSELoad();
   return false;
----------------
I would add a comment about this potentially clobbering the load.

I would also add a FIXME about making this more powerful. I see at least two possible ways:

1) We can probably keep an initial set of CSE-able loads substracted from the cost even when we finally see a store. We just need to disable *further* accumulation of CSE savings. This isn't trivial if you follow my suggestion below (which I think is more important) to support multiple basic blocks. In that case, you'll need to use a bit more care to only accumulate the safe CSE savings. But it still seems worth doing even with that complexity in a follow-up patch so I'd document it with a FIXME here.

2) We should probably at some point thread MemorySSA for the callee into this and then use that to actually compute *really* precise savings. But that's even more complex than #1 so definitely for a future patch. =]

================
Comment at: lib/Analysis/InlineCost.cpp:1469-1470
       SingleBB = false;
+      if (EnableCSELoad)
+        disableCSELoad();
     }
----------------
Why?

One thing the inline cost analysis really tries to do is "straighten" CFGs that will end up folded when inlined, so it would be really good to allow simple CFGs to survive this much like my example above...

Especially with the initial policy of completely disabling the savings if we see a store *anywhere*.

================
Comment at: test/Transforms/Inline/redundant-loads.ll:16-24
+; Every data in matrix a and b are loaded three times.
+; typedef struct { double m[3][3]; } matrix;
+; void matrix_multiply( matrix *a, matrix *b, matrix *c ){
+;   int i,j,k;
+;     for(i=0; i<3; i++)
+;       for(j=0; j<3; j++)
+;          for(k=0; k<3; k++)
----------------
While motivating C code is nice, please use a much simpler test case.

Notably, you can set the thresholds artificially low and use a very small, focused test for it.

Also, you should include various *negative* tests for the ways in which this costs savings can be disabled. These tests *should* cause inlining if you accumulate the load elimination cost savings, but *shouldn't* because of a store, call, or other thing that blocks it.

Repository:
  rL LLVM

https://reviews.llvm.org/D33946