[PATCH] D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition.

Balaram Makam via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 12 13:56:42 PDT 2016


bmakam added a comment.

In http://reviews.llvm.org/D21299#481076, @t.p.northover wrote:

> Yuck, I hate heuristics like this. It's not even particularly clear that "range size" correlates well with probability in real code, let alone with what any given branch predictor thinks of that probability.


With all due respect, I think there are some facts that need clarification. First, for targets with cheap jump instructions, LLVM currently splits a conditional branch like:

  %0 = icmp ne i32 %a, 0
  %1 = icmp eq i32 %b, 2
  %or.cond = and i1 %0, %1
  br i1 %or.cond, label %TrueBB, label %FalseBB

into multiple branch instructions like:

  BB1:
  %0 = icmp ne i32 %a, 0
  br i1 %0, label %SplitBB, label %FalseBB
  SplitBB:
  %1 = icmp eq i32 %b, 2
  br i1 %1, label %TrueBB, label %FalseBB

This requires creating SplitBB after BB1, and we currently update the branch weights under the liberal assumption that

  FalseProb for BB1 == TrueProb for BB1 * FalseProb for SplitBB.

This is very fragile because it assumes that the source order correlates well with the probability in real code, which is not true, as seen in the mcf case here. Furthermore, codegen does not always end up with the source order, because earlier transformations such as Reassociate and JumpThreading sometimes commute the binary operands, resulting in big performance swings for reasons unrelated to the original change, as seen here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141117/244920.html
This patch addresses that issue by ranking the commutative binary operands based on their range sizes, so that codegen is consistent and performance does not fluctuate with minor changes in earlier optimization passes, while at the same time improving performance overall.
Second, it is very hard to correlate anything with probability in real code. Even when using PGO for SPEC, we generate the profile with the train input and use it for the ref input, although for most benchmarks the train input does not correlate with the ref input. FWIW, this heuristic does not assign any branch probabilities based on range size; it only ranks the commutative binary operands on the generic assumption that, given two conditions a != 0 and b == 2, a != 0 is more likely to hold than b == 2, because over all the values 'a' and 'b' can take, a value is more likely to be unequal to some constant than equal to some other constant. Conservatively, I scaled the comparison logarithmically so that small differences in range sizes are ignored. One situation where this assumption might not hold is when 'a' or 'b' are enums or macros, although I haven't checked whether that is the cause of some of the regressions we see.
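To make that concrete, here is a rough standalone C++ sketch of the kind of log-scaled range-size ranking I mean (the names, the exact scaling, and the two-predicate simplification are mine for illustration, not what the patch literally does):

  #include <cmath>

  // Illustrative only: which of two icmp predicates matches more of the
  // 2^Bits possible values of its operand?
  enum class Pred { EQ, NE };

  // log2 of the number of values in [0, 2^Bits) satisfying the predicate:
  // 0 for EQ (exactly one match), roughly Bits for NE (all but one match).
  double log2RangeSize(Pred P, unsigned Bits) {
    double Total = std::ldexp(1.0, Bits); // 2^Bits
    double Matching = (P == Pred::EQ) ? 1.0 : Total - 1.0;
    return std::log2(Matching);
  }

  // Put the compare with the larger (log-scaled, hence coarsened) range
  // first, so the order in which the conditions are tested no longer
  // depends on what earlier passes happened to produce.
  bool shouldSwapOperands(Pred First, Pred Second, unsigned Bits) {
    return std::floor(log2RangeSize(Second, Bits)) >
           std::floor(log2RangeSize(First, Bits));
  }

For the example above, shouldSwapOperands(Pred::EQ, Pred::NE, 32) is true, so 'b == 2' would be placed behind 'a != 0' regardless of which one happened to come first in the IR.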

> When you remove the 'mcf' test that this patch was specifically written to target, it turns into a net regression on geomean even on Kryo. I'm not entirely sure how fair that is (I avoided statistics like the plague), but isn't using separate data to derive and test code/hypotheses generally considered a good thing?


Honestly, the regressions were minor (except for perlbench, which was -2.5%) compared to the gains, so I did not spend time looking into the reason for them. I am sorry that you find this heuristic distasteful; one alternative I can think of to fix this issue is to canonicalize the icmp ne to the LHS of an AND expression (or to the RHS of an OR expression), which might be less yuck but is still strange.
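For the AND side, a minimal sketch of that alternative (the helper name and its placement before the branch-condition split are my assumptions, not part of this patch) could look like:

  #include "llvm/IR/InstrTypes.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: before splitting the branch condition, move an icmp ne operand
  // of an 'and' to the LHS, so the compare that gets tested first is fixed
  // by a canonical rule rather than by source order.  The 'or' case would
  // mirror this with the RHS.
  static void canonicalizeCondition(BinaryOperator *BO) {
    if (BO->getOpcode() != Instruction::And)
      return;
    auto *L = dyn_cast<ICmpInst>(BO->getOperand(0));
    auto *R = dyn_cast<ICmpInst>(BO->getOperand(1));
    if (L && R && R->getPredicate() == ICmpInst::ICMP_NE &&
        L->getPredicate() != ICmpInst::ICMP_NE)
      BO->swapOperands(); // 'and' is commutative, so this is safe
  }

That would at least make the order deterministic across pass-ordering changes, though it still encodes the same assumption about icmp ne being the more likely condition.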


http://reviews.llvm.org/D21299




