[LLVMdev] Introducing a branch optimization and prediction pass

Török Edwin edwintorok at gmail.com
Sat Mar 29 06:02:35 PDT 2008


Hi,

I would like to transform unpredictable conditional branches with a
small body into cmov instructions to reduce branch miss penalty.
LLVM generates cmov/setcc instructions when SelectInst is used. The idea
is to transform regular branches into selects, when that is possible and
profitable.
However this needs branch prediction info, and it seems LLVM doesn't
provide that AFAICT.
Could gcc's branch predictor be  used here, or are there plans for an
llvm-specific branch predictor?

This has performance benefits if the branch is otherwise unpredictable,
for example:

for(i=0;i<len;i++) {
  if(isalnum(buf[i]))
    *p++ = buf[i];
}

Assume the input data is binary data, you can't know in advance if the
branch is taken or not. Either prediction will lead to a miss penalty X%
of the time.
The conditional write can be transformed using a dummy variable [2].

But, if the branch's body is big, or if is statically/dynamically
predictable this optimization must not be applied.
This optimization needs the following:
* determine if the transformation can be applied:
    * if the body has call/return/asm it cannot be applied
    * if volatile data is accesed within the loop
    * any other situations?
* determine if branch is statically/dynamically predictable:
    * statically: if compiler can predict branch, don't apply this
optimization
    * dynamically: can branch be predicted based on execution history
(i.e. can CPU predict it?)
* how to determine if a branch's body is small?
    if execution_cost_of(body) +
execution_cost_of(cmov)/branch_miss_probability < branch_miss_penalty,
    or simply: execution_cost_of(body) < branch_miss_penalty
    If it is bigger, then it is faster if we take the miss penalty (and
don't execute the body)
    But miss-penalties, and execution_costs are target specific, so use
TargetData here (and teach it about penalties), or
    use some heuristic based on number of instruction in basic block?
* if we cannot transform all instructions from the basic block into
instr+cmov, then it is not worth applying this optimization (since we'll
have a branch anyway).
 Does Codegen guarantee to always transform Select into CMOV/SETCC (or
equivalent), on targets that support it?
* if target doesn't support conditional moves, we must not apply this
transformation
* memory ordering/multi-thread safety:
if(pthread_mutex_trylock(...)) { *p++ = 4;}
Transforming this into a cmov should be safe, because we are using a
dummy variable on the stack as destination [2] if condition is false.
What if we have memory reads, those might not be always be safe to move
out of the branch's body (because the condition could be required to
ensure the access is valid),
or it could actually decrease performance in case the guarding condition
is to prevent L2 cache misses, example:
if(fast_filter_table[value] == match && slow_filter_table[value] ==
match) { ... do something ... }

If we unconditionally read from slow_filter_table it could actually
reduce the performance (assume slow_filter_table is huge), and might not
be legal because we violate
the short-circuit evaluation.

(BTW, the branch predictor should say it is predictable, and we wouldn't
apply the transformation at all, but we cannot rely on the branch
predictor to make
short-circuit eval guarantees).

Thoughts?  

P.S.: is somebody working on something similar, or are there plans to
implement something similar?

[1] "Intel® 64 and IA-32 Architectures Optimization Reference Manual",
section 3.4.1.1 Eliminating Branches:

"Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the SETCC
and CMOV instructions to eliminate unpredictable conditional branches where
possible. Do not do this for predictable branches. Do not use these
instructions to
eliminate all unpredictable conditional branches (because using these
instructions
will incur execution overhead due to the requirement for executing both
paths of a
conditional branch)."

[2] "Software Optimization Guide for AMD64 Processors", section 6.3
"Branches That Depend on Random Data": "Avoid conditional branches that
depend on random data, as these branches are difficult to predict.",
Examples/Conditional Write

Best regards,
--Edwin



More information about the llvm-dev mailing list