[PATCH] [SimplifyCFG] threshold for folding branches with common destination
Owen Anderson
resistor at mac.com
Mon Sep 29 11:11:10 PDT 2014
Could this be made a pass parameter rather than (or in addition to) a command line option?
—Owen
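Owen's suggestion can be read as follows: the threshold lives on each pass instance, and the command-line value only supplies the default. The sketch below illustrates that reading. Everything in it is a hypothetical toy (gBonusInstThresholdFlag, ToySimplifyCFGPass, canFold, and the string opcodes). It does not use LLVM's real option or pass APIs, and its cost model only counts instructions, as the patch description does.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a command-line option. Its default of 1 mirrors
// the patch's default threshold; this is not LLVM's cl::opt machinery.
static unsigned gBonusInstThresholdFlag = 1;

// Hypothetical toy pass: the threshold is a constructor parameter that falls
// back to the global "flag" value when no explicit value is given.
class ToySimplifyCFGPass {
 public:
  explicit ToySimplifyCFGPass(std::optional<unsigned> threshold = std::nullopt)
      : threshold_(threshold.value_or(gBonusInstThresholdFlag)) {}

  // Toy cost model: every instruction in the block other than the compare
  // feeding the branch and the branch itself counts as a bonus instruction.
  // Folding is allowed only if that count does not exceed the threshold.
  bool canFold(const std::vector<std::string>& blockInsts) const {
    unsigned bonus = 0;
    for (const std::string& op : blockInsts)
      if (op != "icmp" && op != "br") ++bonus;
    return bonus <= threshold_;
  }

  unsigned threshold() const { return threshold_; }

 private:
  unsigned threshold_;
};
```

With the default, the toy rejects the second condition block of the example ("or", "xor", "icmp", "br") because that block has two bonus instructions. ToySimplifyCFGPass(2) accepts it. A per-instance parameter of this kind would let one pipeline (a GPU one, say) use a larger threshold without changing the default for other users of the same binary.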
> On Sep 29, 2014, at 10:51 AM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi nadav, resistor, eliben, meheff,
>
> This patch adds a threshold that controls the number of bonus instructions
> allowed when folding branches with a common destination. The original code
> allows at most one bonus instruction; with this patch, users can raise the
> threshold to allow more. The default threshold is still 1, so the code
> behaves as before when users do not specify it.
>
> The motivation for this change is that tuning this threshold improves the
> performance of some CUDA programs in our internal code base by up to 25%.
> Branch instructions are generally very expensive in GPU programs, so it is
> sometimes worth trading extra arithmetic for straighter control flow. Here is
> a reduced example:
>
> __global__ void foo(int a, int b, int c, int d, int e, int n,
>                     const int *input, int *output) {
>   int sum = 0;
>   for (int i = 0; i < n; ++i)
>     sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
>   *output = sum;
> }
>
> The conditional expression in the loop body compiles to two branches, "if
> ((i ^ a) > b)" and "if (((i | c) ^ d) > e)", which share a common destination.
> With the default threshold, SimplifyCFG cannot fold them, because computing
> the condition of the second branch, "((i | c) ^ d) > e", requires two bonus
> instructions (the "or" and the "xor"). With the threshold raised, SimplifyCFG
> can fold the two branches so that the loop body contains only one branch,
> making the code conceptually look like:
>
>   sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i];
>
> Increasing the threshold significantly improves the performance of this
> example. In the configuration where both conditions are guaranteed to be
> true, raising the threshold from 1 to 2 improves performance by 18.24%. Even
> in the configuration where the first condition is false and the second is
> true, which favors short-circuit evaluation, raising the threshold from 1 to 2
> still improves performance by 4.35%.
>
> We are still looking for a good threshold, and perhaps a better cost model
> than just counting bonus instructions. However, given the numbers above, we
> think it is at least worth adding a threshold to enable more experiments and
> tuning. Let me know what you think. Thanks!
>
> http://reviews.llvm.org/D5529
>
> Files:
> lib/Transforms/Utils/SimplifyCFG.cpp
> test/Transforms/SimplifyCFG/branch-fold-threshold.ll
> <D5529.14181.patch>
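The fold in the reduced example can be checked on the host by transcribing the CUDA loop into plain C++. Both functions below assume a host-side rewrite of the kernel, with the hypothetical names sumShortCircuit and sumFolded and a std::vector standing in for input and n. One keeps the original "&&". The other uses the single bitwise "&" condition from the folded form. They should produce the same sum for any inputs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side transcription of the kernel body from the example.
// Original form: "&&" short-circuits, so the second comparison is evaluated
// only when the first one is true.
int sumShortCircuit(int a, int b, int c, int d, int e,
                    const std::vector<int>& input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i)
    sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
  return sum;
}

// Folded form: both comparisons are always evaluated and combined with a
// bitwise "&", leaving a single condition per iteration.
int sumFolded(int a, int b, int c, int d, int e,
              const std::vector<int>& input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i)
    sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i];
  return sum;
}
```

Neither comparison has side effects or can trap, so evaluating both unconditionally gives the same result. The cost is the extra "or"/"xor" work on iterations where the short circuit would have skipped it, and the bonus-instruction threshold controls exactly that trade-off.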
More information about the llvm-commits mailing list