[PATCH] D18827: Rework/enhance stack coloring data flow analysis.

Than McIntosh via llvm-commits <llvm-commits at lists.llvm.org>
Mon Apr 11 08:15:44 PDT 2016


thanm added a comment.

Hi Wei,

Thanks for the review. I responded to your comments in line.

You are quite right: even after this patch, clang/llvm will still do worse than GCC. The reason has to do with the way the GCC code computes interferences between variables.

The GCC implementation uses an interference graph (node == var, edge == coloring conflict between vars), and it adds edges only at DEFs: statements where there could conceivably be some sort of memory write. In contrast, the LLVM algorithm works by building live intervals and checking whether the intervals overlap at any point (no attention is paid to DEF vs. non-DEF). This second strategy is simpler but is definitely less precise in some cases.

Let me expand on this a bit. Going back to the if/then example:

  void bar(int (*ar)[128]);  /* declaration added so the example compiles */

  int foo(int x) {
    int ar1[128]; int ar2[128];
    if (x & 0x99) {
      bar(&ar1); return x-1;
    } else {
      bar(&ar2); return x+1;
    }
  }

Here is what the IR looks like at the point where the analysis is done (it's pretty much the same in both GCC and LLVM at an abstract level):

         BB#1:
  I0:    call bar(&ar1)
  I1:    %vreg1 = %vregx - 1
  I2:    jmp BB3
  
         BB#2:
  I3:    call bar(&ar2)
  I4:    %vreg2 = %vregx + 1
         <fall through>
     
         BB#3:
  I5:    %vregr = phi(%vreg1, %vreg2)
  I6:    LIFETIME_END <ar1>
  I7:    LIFETIME_END <ar2>
  I8:    <return %vregr>

As mentioned previously, the GCC algorithm uses an interference graph -- for each variable X it keeps a bit vector of the other variables that X interferes with. At each BB, it computes the live-in set for the BB by unioning together the live-out sets of the pred BBs. It then walks the BB and looks for any instruction that could write memory (essentially anything other than a phi or debug instruction). At each such instruction, it adds conflicts between every pair of variables in the work set.
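To make this concrete, here is a minimal, self-contained C++ sketch of the GCC-style scheme (all the types and names here are invented for illustration -- this is not the actual GCC source -- and updates to the live set at lifetime markers are elided):

  #include <set>
  #include <vector>

  struct Inst { bool MayWriteMemory; };
  struct Block {
    std::vector<Inst> Insts;
    std::set<int> LiveOut;                 // ids of vars live on exit
    std::vector<Block *> Preds;
  };

  // Interference[v] = ids of the vars that conflict with v; the vector
  // is assumed to be pre-sized to the number of variables.
  void buildInterference(std::vector<Block> &Blocks,
                         std::vector<std::set<int>> &Interference) {
    for (Block &B : Blocks) {
      // Live-in = union of the predecessors' live-out sets.
      std::set<int> Live;
      for (Block *P : B.Preds)
        Live.insert(P->LiveOut.begin(), P->LiveOut.end());
      // Conflicts are added only at instructions that may write memory;
      // a BB with no such instruction (like BB#3 above) adds no edges.
      for (Inst &I : B.Insts)
        if (I.MayWriteMemory)
          for (int V1 : Live)
            for (int V2 : Live)
              if (V1 != V2)
                Interference[V1].insert(V2);
    }
  }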

At the point where GCC processes BB#3 above, both "ar1" and "ar2" are in the work set. However, as it walks through the BB it never sees any instruction that could write memory, so it never adds conflicts between the vars.

The LLVM implementation doesn't use an interference graph; instead it builds live intervals for each variable and then checks whether the intervals overlap, which is less precise. For the code above, we wind up with the following live intervals:

  ar1: [1-2], [5-6]
  ar2: [3-7]

The existing stack-coloring code examines these two intervals, sees that they overlap at inst 5, and therefore considers the two variables as interfering and gives them separate stack allocations.
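To illustrate, the interval approach essentially boils down to a pairwise segment-overlap test like the following sketch (simplified; the real pass uses LLVM's LiveInterval machinery, not this struct):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  struct Seg { int Begin, End; };             // closed slot range [Begin, End]

  bool overlaps(const std::vector<Seg> &A, const std::vector<Seg> &B) {
    for (const Seg &SA : A)
      for (const Seg &SB : B)
        if (std::max(SA.Begin, SB.Begin) <= std::min(SA.End, SB.End))
          return true;                        // the segments share a slot
    return false;
  }

  int main() {
    std::vector<Seg> Ar1 = {{1, 2}, {5, 6}};  // ar1: [1-2], [5-6]
    std::vector<Seg> Ar2 = {{3, 7}};          // ar2: [3-7]
    std::printf("interfere: %d\n", overlaps(Ar1, Ar2)); // prints 1 (inst 5)
  }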

[This situation is somewhat related to the so-called "X graph problem" in register allocation; you may have heard of it.]

The patch I wrote sticks with the current method of building the (less precise) interval graph, so it doesn't catch as many legal variable overlaps. The main reason I took this route is that the existing implementation has an option ("protect-from-escaped-allocas") that uses the interval information to identify "stray" uses of a stack location outside of the LIFETIME_START/LIFETIME_END interval of the variable -- moving to an interference graph would have made it difficult to retain this functionality. A second reason is that, since I am an LLVM newbie, I thought it would be best to start with a relatively modest patch, as opposed to completely throwing out the old implementation and putting in my own.

Given that the new implementation is a bit better but not ideal, I think we can revisit it at some point and decide whether we want to throw out intervals and move to an interference graph.

Thanks, Than


================
Comment at: lib/CodeGen/StackColoring.cpp:435
@@ -361,2 +434,3 @@
 
+      // Update block LiveOut set, noting whether it has changed
       if (LocalLiveOut.test(BlockInfo.LiveOut)) {
----------------
wmi wrote:
> Since we only do dataflow iteration in the forward direction now, if there is no change in LiveIn, there shouldn't be any change in LiveOut. So can we move BlockInfo.LiveOut |= LocalLiveOut into the above if?
I think there would be a problem with that on the first dataflow iteration for blocks containing a GEN (first use) of some variable "x". In that case we wind up adding "x" to the live-out set, so we need to update the changed flag even though LiveIn didn't change. On all subsequent iterations, what you say is true.
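To spell out the first-iteration case, here is a hypothetical, self-contained sketch of a forward liveness loop of this general shape (names invented; this is not the patch itself). The GEN bits can grow LiveOut on iteration 1 even when LiveIn is unchanged, which is why the LiveOut update must also set the changed flag:

  #include <bitset>
  #include <vector>

  constexpr int NumSlots = 64;
  using BitVec = std::bitset<NumSlots>;

  struct BlockData {
    BitVec LiveIn, LiveOut, Gen, Kill;
    std::vector<int> Preds;                  // indices of predecessor blocks
  };

  void dataflow(std::vector<BlockData> &Blocks) {
    bool Changed = true;
    while (Changed) {
      Changed = false;
      for (BlockData &BD : Blocks) {
        BitVec LiveIn;
        for (int P : BD.Preds)
          LiveIn |= Blocks[P].LiveOut;       // live-in = U live-out(preds)
        if (LiveIn != BD.LiveIn) {
          BD.LiveIn = LiveIn;
          Changed = true;
        }
        BitVec LocalLiveOut = (BD.LiveIn | BD.Gen) & ~BD.Kill;
        if ((LocalLiveOut & ~BD.LiveOut).any()) { // bits not yet in LiveOut?
          BD.LiveOut |= LocalLiveOut;
          Changed = true;                    // fires on iteration 1 via Gen
        }
      }
    }
  }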

================
Comment at: lib/CodeGen/StackColoring.cpp:809
@@ -716,2 +808,3 @@
   // Calculate the liveness of each block.
-  calculateLocalLiveness();
+  unsigned NumIters = calculateLocalLiveness();
+  DEBUG(dbgs() << "Dataflow iterations: " << NumIters << "\n");
----------------
wmi wrote:
> There may be an unused-variable warning when NDEBUG is defined.
Hmm. Good point. I will rework the code to handle this.
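For what it's worth, one common way to handle this (just a sketch of one option, not necessarily what the reworked code will do) is to keep the variable and explicitly mark it as used, since DEBUG() compiles away when NDEBUG is defined:

  unsigned NumIters = calculateLocalLiveness();
  (void)NumIters; // only read inside DEBUG(), which is a no-op under NDEBUG
  DEBUG(dbgs() << "Dataflow iterations: " << NumIters << "\n");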

================
Comment at: test/CodeGen/X86/StackColoring.ll:445
@@ +444,3 @@
+  store i32 %x, i32* %x.addr, align 4
+  %0 = bitcast [128 x i32]* %itar1 to i8*
+  call void @llvm.lifetime.start(i64 512, i8* %0) #3
----------------
wmi wrote:
> %0, %1, %2 are prone to changing. We can run "opt -instnamer" to rename %0, %1, %2, ..., so the test will be more stable.
Thanks-- I hadn't realized that we had such a tool. I'll update the test.
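For reference, if I understand the pass correctly, the renaming would be done with something like:

  opt -instnamer -S input.ll -o output.ll

which rewrites anonymous values such as %0, %1, %2 into named ones, so later changes to the test don't shift the numbering.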


http://reviews.llvm.org/D18827




