[PATCH] Switch lowering: use profile info to build weight-balanced binary search trees

Wed Apr 29 18:00:24 PDT 2015

In http://reviews.llvm.org/D9318#163566, @djasper wrote:

> Just to clarify. I think this is definitely an excellent step and good to go in independent of whether my analysis is correct or not :-).
>
> So, ship it :-).

Thanks very much for the review!

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7997
@@ +7996,3 @@
+
+  // Move LastLeft and FirstRight towards eachother from opposite directions to
+  // find a partitioning of the clusters which balances the weight on both
----------------
djasper wrote:
> "each other"?
Done.

================
Comment at: test/CodeGen/Generic/MachineBranchProb.ll:61
@@ +60,3 @@
+; CHECK-NOT: Successors
+; CHECK: Successors according to CFG: BB#8(13) BB#9(20)
+}
----------------
djasper wrote:
> hans wrote:
> > djasper wrote:
> > > Maybe add a comment that this division is because the left half is 1+10+1+1 = 13, the right half is 10+10=20 and we are directly checking against the pivot element so that that one weight doesn't contribute to either side. (If I am correct).
> > The first weight in the metadata is for the default case. The weights on the left and right are: 10+1+1+1=13 vs 10+10=20.
> > 
> > I'll add comments making this easier to read.
> > 
> > The pivot element is part of the right side and does contribute to that weight. (This is different from gcc which I think always does both 'je' and 'jg' on pivot elements, re-using the condition code from the 'cmp'.)
> Right. Why doesn't LLVM do that? Intuitively, it sounds like a good idea. (But that's for a follow-up patch if that).
It's not a given win. It reduces the number of cmps, but introduces more branches. It probably varies by architecture how fast this "three-way branch" is.

For the benchmark attached in this review, Clang's -O3 (without profile info) was faster than gcc's -O3, maybe because of this.

================
Comment at: test/CodeGen/Generic/MachineBranchProb.ll:58
@@ +57,3 @@
+; Check that we set branch weights on the pivot cmp instruction correctly.
+; Cases {0,10,20,30} go on the left with weight 13; cases {40,50} go on the
+; right with weight 20.
----------------
djasper wrote:
> This is actually an interesting case. As I understand it, LLVM will now do?
> 
>   this_case = (x < 40
>                    ? x < 20 ? x == 0 ? A : x == 10 ? B : DEFAULT
>                             : x == 20 ? C : x == 30 ? D : DEFAULT
>                    : x == 40 ? E : x == 50 ? F : DEFAULT);
> 
> (I hope it is somewhat readable what I mean, case A represents 0, case B 10,... - I just didn't want to re-use the numbers as that would be even more confusing).
> 
> Now, I hope I counted correctly, this gives me the following numbers.
> 
>   No of comparisons:
>   A:    3 (Weight: 10, Weighted: 30)
>   B:    4 (Weight:  1, Weighted:  4)
>   C:    3 (Weight:  1, Weighted:  3)
>   D:    4 (Weight:  1, Weighted:  4)
>   E:    2 (Weight: 10, Weighted: 20)
>   F:    3 (Weight: 10, Weighted: 30)
>   ---
>   SUM: 19 (            Weighted: 91)
> 
> However, if we split this equally, we get to do a linear scan on both sides:
> 
>   this_case = (x < 30 ? x == 0 ? A : x == 10 ? B : x == 20 ? C : DEFAULT
>                       : x == 40 ? E : x == 50 ? F : x == 30 ? D : DEFAULT);
> 
>   No of comparisons:
>   A:    2 (Weight: 10, Weighted: 20)
>   B:    3 (Weight:  1, Weighted:  3)
>   C:    4 (Weight:  1, Weighted:  4)
>   D:    4 (Weight:  1, Weighted:  4)
>   E:    2 (Weight: 10, Weighted: 20)
>   F:    3 (Weight: 10, Weighted: 30)
>   ---
>   SUM: 18 (            Weighted: 81)
> 
> Which seems beneficial in total. I think the rule might be something like: Do never create a split with less than three elements on one side unless the smallest weight on that side is larger than all the weights on the other side. But that might not be sufficient.
It's going to be
```
this_case = (x < 40
                 ? x < 30 ? x == 0 ? A : default
                          : x == 10 ? C : x == 20 ? C : x == 30 ? D : DEFAULT
                 : x == 40 ? E : x == 50 ? F : DEFAULT);
```
But the numbers add upp the same way.

This is interesting. It's something about our leaves being different than in a typical binary tree in that we can do up to three comparisons in them.

Like you, I'm also thinking we should probably be careful creating a split with less than three on one side + some conditions. But this will require some thinking.

As discussed on the IRC channel, for switches where some cases are completely dominating, hoisting them out is probably a good idea. What I've been doing when playing with benchmarks is turning:
```
switch (x) {
  cases ...
}
```
into
```
switch (x) {
  hot_case 1:
  hot_case 2:
  ...
}
switch (x) {
  cold_cases here
}
```
And that's been really beneficial for switches where a few cases dominate.

I like the current approach for handling switches in general though, and I think the hoisting would be a separate thing that takes place before.

================
Comment at: test/CodeGen/Generic/MachineBranchProb.ll:70
@@ +69,3 @@
+  i32 1,
+  ; Case 10, 20, 30:
+  i32 10, i32 1, i32 1,
----------------
djasper wrote:
> I think this is Case 0, 10, 20
Oops. Fixed.

================
Comment at: test/CodeGen/X86/switch.ll:471
@@ +470,3 @@
+
+!3 = !{!"branch_weights", i32 1, i32 10, i32 0, i32 0, i32 0, i32 0, i32 10}
+
----------------
djasper wrote:
> hans wrote:
> > djasper wrote:
> > > Could you describe somewhere (patch description, comment) what weight=0 means and how it affects the balancing? From this test, I would assume that we just do balanced trees in this case. However, the test below suggests otherwise.
> > The problem is that the pivot finding code cannot handle 0-weight nodes as they don't affect the weight-balance. E.g. a tree with weights {10,0,0,0,0,0,0} to the left and {10} on the right would be considered balanced because the sub-trees have equal weight.
> > 
> > Luckily, BranchProbability::getEdgeWeight() will never return 0. I just figured this was important enough to have a test for.
> > 
> > I'll add a comment in the code.
> But what's the added benefit over the test below?
Oh, the test below shouldn't have 0-weights. I'll remove that.

http://reviews.llvm.org/D9318

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/