[LLVMdev] Introducing a branch optimization and prediction pass
Evan Cheng
evan.cheng at apple.com
Sun Mar 30 20:28:09 PDT 2008
On Mar 29, 2008, at 6:02 AM, Török Edwin wrote:
> Hi,
>
> I would like to transform unpredictable conditional branches with a
> small body into cmov instructions to reduce branch miss penalty.
> LLVM generates cmov/setcc instructions when SelectInst is used. The
> idea
> is to transform regular branches into selects, when that is possible
> and
> profitable.
LLVM is already aggressively turning branches into selects as far as I
can see.
>
> However this needs branch prediction info, and it seems LLVM doesn't
> provide that AFAICT.
> Could gcc's branch predictor be used here, or are there plans for an
> llvm-specific branch predictor?
Sure, either the FE or some earlier profiling pass should provide some
branch predication info.
>
>
> This has performance benefits if the branch is otherwise
> unpredictable,
> for example:
>
> for(i=0;i<len;i++) {
> if(isalnum(buf[i]))
> *p++ = buf[i];
> }
>
> Assume the input data is binary data, you can't know in advance if the
> branch is taken or not. Either prediction will lead to a miss
> penalty X%
> of the time.
> The conditional write can be transformed using a dummy variable [2].
Ok. You want to speculatively execute the code and only writes back
the result if the predicate is true, right?
>
>
> But, if the branch's body is big, or if is statically/dynamically
> predictable this optimization must not be applied.
> This optimization needs the following:
> * determine if the transformation can be applied:
> * if the body has call/return/asm it cannot be applied
>
> * if volatile data is accesed within the loop
> * any other situations?
If there are any instructions with side effects then this cannot be
done.
>
> * determine if branch is statically/dynamically predictable:
> * statically: if compiler can predict branch, don't apply this
> optimization
> * dynamically: can branch be predicted based on execution history
> (i.e. can CPU predict it?)
Based on profiling data from previous runs? Or dynamic profiling in
JIT situation?
>
> * how to determine if a branch's body is small?
> if execution_cost_of(body) +
> execution_cost_of(cmov)/branch_miss_probability < branch_miss_penalty,
> or simply: execution_cost_of(body) < branch_miss_penalty
> If it is bigger, then it is faster if we take the miss penalty (and
> don't execute the body)
> But miss-penalties, and execution_costs are target specific, so use
> TargetData here (and teach it about penalties), or
> use some heuristic based on number of instruction in basic block?
Sounds reasonable. But it's potentially more complicated than this for
a target like x86 that have very few registers. The speculatively
executed code will clobber registers so it has significant cost if the
result of computation is not written back.
>
> * if we cannot transform all instructions from the basic block into
> instr+cmov, then it is not worth applying this optimization (since
> we'll
> have a branch anyway).
Sure. It might be worthwhile to extend the if-converter to do this.
The first step is to turn it on for x86 to convert blocks which are
entirely move's.
>
> Does Codegen guarantee to always transform Select into CMOV/SETCC (or
> equivalent), on targets that support it?
Right.
>
> * if target doesn't support conditional moves, we must not apply this
> transformation
Correct.
>
> * memory ordering/multi-thread safety:
> if(pthread_mutex_trylock(...)) { *p++ = 4;}
> Transforming this into a cmov should be safe, because we are using a
> dummy variable on the stack as destination [2] if condition is false.
> What if we have memory reads, those might not be always be safe to
> move
> out of the branch's body (because the condition could be required to
> ensure the access is valid),
> or it could actually decrease performance in case the guarding
> condition
> is to prevent L2 cache misses, example:
> if(fast_filter_table[value] == match && slow_filter_table[value] ==
> match) { ... do something ... }
That's something to worry about later. :-)
>
>
> If we unconditionally read from slow_filter_table it could actually
> reduce the performance (assume slow_filter_table is huge), and might
> not
> be legal because we violate
> the short-circuit evaluation.
Right, speculative loads would require much more analysis to ensure
safety. This might be something you don't want to do in step 1.
>
>
> (BTW, the branch predictor should say it is predictable, and we
> wouldn't
> apply the transformation at all, but we cannot rely on the branch
> predictor to make
> short-circuit eval guarantees).
>
> Thoughts?
Adding branch prediction info is obviously useful even if speculation
is not performed. There are always CFG optimization passes (any
perhaps isel passes) that can make use of them. I am not sure whether
speculation can actually have a positive impact on x86. It's something
that requires some experimentation if you are interested.
Evan
>
>
> P.S.: is somebody working on something similar, or are there plans to
> implement something similar?
>
> [1] "Intel® 64 and IA-32 Architectures Optimization Reference Manual",
> section 3.4.1.1 Eliminating Branches:
>
> "Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the
> SETCC
> and CMOV instructions to eliminate unpredictable conditional
> branches where
> possible. Do not do this for predictable branches. Do not use these
> instructions to
> eliminate all unpredictable conditional branches (because using these
> instructions
> will incur execution overhead due to the requirement for executing
> both
> paths of a
> conditional branch)."
>
> [2] "Software Optimization Guide for AMD64 Processors", section 6.3
> "Branches That Depend on Random Data": "Avoid conditional branches
> that
> depend on random data, as these branches are difficult to predict.",
> Examples/Conditional Write
>
> Best regards,
> --Edwin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
More information about the llvm-dev
mailing list