[LLVMdev] Introducing a branch optimization and prediction pass

Sun Mar 30 20:28:09 PDT 2008

On Mar 29, 2008, at 6:02 AM, Török Edwin wrote:

> Hi,
>
> I would like to transform unpredictable conditional branches with a
> small body into cmov instructions to reduce branch miss penalty.
> LLVM generates cmov/setcc instructions when SelectInst is used. The  
> idea
> is to transform regular branches into selects, when that is possible  
> and
> profitable.

LLVM is already aggressively turning branches into selects as far as I  
can see.

>
> However this needs branch prediction info, and it seems LLVM doesn't
> provide that AFAICT.
> Could gcc's branch predictor be  used here, or are there plans for an
> llvm-specific branch predictor?

Sure, either the FE or some earlier profiling pass should provide some  
branch predication info.

>
>
> This has performance benefits if the branch is otherwise  
> unpredictable,
> for example:
>
> for(i=0;i<len;i++) {
>  if(isalnum(buf[i]))
>    *p++ = buf[i];
> }
>
> Assume the input data is binary data, you can't know in advance if the
> branch is taken or not. Either prediction will lead to a miss  
> penalty X%
> of the time.
> The conditional write can be transformed using a dummy variable [2].

Ok. You want to speculatively execute the code and only writes back  
the result if the predicate is true, right?

>
>
> But, if the branch's body is big, or if is statically/dynamically
> predictable this optimization must not be applied.
> This optimization needs the following:
> * determine if the transformation can be applied:
>    * if the body has call/return/asm it cannot be applied
>
>    * if volatile data is accesed within the loop
>    * any other situations?

If there are any instructions with side effects then this cannot be  
done.

>
> * determine if branch is statically/dynamically predictable:
>    * statically: if compiler can predict branch, don't apply this
> optimization
>    * dynamically: can branch be predicted based on execution history
> (i.e. can CPU predict it?)

Based on profiling data from previous runs? Or dynamic profiling in  
JIT situation?

>
> * how to determine if a branch's body is small?
>    if execution_cost_of(body) +
> execution_cost_of(cmov)/branch_miss_probability < branch_miss_penalty,
>    or simply: execution_cost_of(body) < branch_miss_penalty
>    If it is bigger, then it is faster if we take the miss penalty (and
> don't execute the body)
>    But miss-penalties, and execution_costs are target specific, so use
> TargetData here (and teach it about penalties), or
>    use some heuristic based on number of instruction in basic block?

Sounds reasonable. But it's potentially more complicated than this for  
a target like x86 that have very few registers. The speculatively  
executed code will clobber registers so it has significant cost if the  
result of computation is not written back.

>
> * if we cannot transform all instructions from the basic block into
> instr+cmov, then it is not worth applying this optimization (since  
> we'll
> have a branch anyway).

Sure. It might be worthwhile to extend the if-converter to do this.  
The first step is to turn it on for x86 to convert blocks which are  
entirely move's.

>
> Does Codegen guarantee to always transform Select into CMOV/SETCC (or
> equivalent), on targets that support it?

Right.

>
> * if target doesn't support conditional moves, we must not apply this
> transformation

Correct.

>
> * memory ordering/multi-thread safety:
> if(pthread_mutex_trylock(...)) { *p++ = 4;}
> Transforming this into a cmov should be safe, because we are using a
> dummy variable on the stack as destination [2] if condition is false.
> What if we have memory reads, those might not be always be safe to  
> move
> out of the branch's body (because the condition could be required to
> ensure the access is valid),
> or it could actually decrease performance in case the guarding  
> condition
> is to prevent L2 cache misses, example:
> if(fast_filter_table[value] == match && slow_filter_table[value] ==
> match) { ... do something ... }

That's something to worry about later. :-)

>
>
> If we unconditionally read from slow_filter_table it could actually
> reduce the performance (assume slow_filter_table is huge), and might  
> not
> be legal because we violate
> the short-circuit evaluation.

Right, speculative loads would require much more analysis to ensure  
safety. This might be something you don't want to do in step 1.

>
>
> (BTW, the branch predictor should say it is predictable, and we  
> wouldn't
> apply the transformation at all, but we cannot rely on the branch
> predictor to make
> short-circuit eval guarantees).
>
> Thoughts?

Adding branch prediction info is obviously useful even if speculation  
is not performed. There are always CFG optimization passes (any  
perhaps isel passes) that can make use of them. I am not sure whether  
speculation can actually have a positive impact on x86. It's something  
that requires some experimentation if you are interested.

Evan

>
>
> P.S.: is somebody working on something similar, or are there plans to
> implement something similar?
>
> [1] "Intel® 64 and IA-32 Architectures Optimization Reference Manual",
> section 3.4.1.1 Eliminating Branches:
>
> "Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the  
> SETCC
> and CMOV instructions to eliminate unpredictable conditional  
> branches where
> possible. Do not do this for predictable branches. Do not use these
> instructions to
> eliminate all unpredictable conditional branches (because using these
> instructions
> will incur execution overhead due to the requirement for executing  
> both
> paths of a
> conditional branch)."
>
> [2] "Software Optimization Guide for AMD64 Processors", section 6.3
> "Branches That Depend on Random Data": "Avoid conditional branches  
> that
> depend on random data, as these branches are difficult to predict.",
> Examples/Conditional Write
>
> Best regards,
> --Edwin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev