[LLVMdev] [LSR] hoisting loop invariants in reverse order

Daniel Berlin dberlin at dberlin.org
Mon May 18 10:10:56 PDT 2015


On Fri, May 8, 2015 at 10:22 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi,
>
> I was tracking down a performance regression and noticed that
> LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of
> indvars) in the reverse order of how they appear in the loop.
It has to to get maximized hoisting.

>
> This reverse order creates troubles for the StraightLineStrengthReduce pass
> I recently add. While I understand ultimately SLSR should be able to sort
> independent candidates in an optimal order, does it make sense to
> canonicalizing the order of LSR-hoisted loop invariants back to the
> "natural" order? IMO, the optimized code should in general resemble the
> original code unless intentionally changed otherwise.

This is likely a mistake, unless i'm missing something.
I suspect the insertion point is set to the default after, so that it
ends up reversing the order, instead of before, so that it ends up in
the original order.

>
> More specifically, I ran LSR on
>
> void foo(float *input, int a, int b, int c, int n) {
>   for (int node_x = 0; node_x < n; ++node_x) {
>     int pixel_idx = (a + node_x) * b; // {a*b, +, b}
>     bar(pixel_idx * c); // {a*b*c, +, b*c}
>     bar((pixel_idx + 1) * c); // {(a*b+1)*c, +, b*c}
>     bar((pixel_idx + 2) * c); // {(a*b+2)*c, +, b*c}
>     bar((pixel_idx + 3) * c); // {(a*b+3)*c, +, b*c}
>   }
> }
>
> and LSR produced
>
> void foo(float *input, int a, int b, int c, int n) {
>   int arg3 = (a * b + 3) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg1 = (a * b + 1) * c;
>   int arg0 = a * b * c;
>   for (int node_x = 0; node_x < n; ++node_x) {
>     bar(arg0);
>     bar(arg1);
>     bar(arg2);
>     bar(arg3);
>     arg0 += b * c;
>     arg1 += b * c;
>     arg2 += b * c;
>     arg3 += b * c;
>   }
> }
> (with obvious redundant operations, i.e. a * b and b * c, combined). Note
> that the order of arg0~3 is reversed.
>
> Reversing the order of arg0~3 is not intentional. The user list of pixel_idx
> happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so
> LSR simply follows this order when collecting the LSRFixups.
>
> This creates troubles for SLSR. Given the current order of arg0~arg3
>
>   int arg3 = (a * b + 3) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg1 = (a * b + 1) * c;
>   int arg0 = a * b * c;
>
> SLSR optimizes them to
>
>   int arg3 = (a * b + 3) * c;
>   int arg2 = arg3 - c;
>   int arg1 = arg2 - c;
>   int arg0 = arg1 - c;
>   // 2 muls and 4 adds/subs
>
> In contrast, if arg0~3 were in the order of
>
>   int arg0 = a * b * c;
>   int arg1 = (a * b + 1) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg3 = (a * b + 3) * c;
>
> SLSR would optimize them to
>
>   int arg0 = a * b * c;
>   int arg1 = arg0 + c;
>   int arg2 = arg1 + c;
>   int arg3 = arg2 + c;
>   // 2 muls and 3 adds/subs. 1 add/sub less than with the reversed order
>
> I have a proof-of-concept patch
> (http://reviews.llvm.org/differential/diff/25402/) that has
> CollectFixupsAndInitialFormulae to sort initial formulae in a dominance
> order (i.e. if A.getUser() dominates B.getUser(), then we put A before B).

Is there a reason to not just put it back in the original order?
IE is dominance order better?



More information about the llvm-dev mailing list