[LLVMdev] [LSR] hoisting loop invariants in reverse order

Mon May 18 10:13:37 PDT 2015

> On May 8, 2015, at 10:22 PM, Jingyue Wu <jingyue at google.com> wrote:
> 
> Hi, 
> 
> I was tracking down a performance regression and noticed that LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of indvars) in the reverse order of how they appear in the loop. 
> 
> This reverse order creates troubles for the StraightLineStrengthReduce pass I recently add. While I understand ultimately SLSR should be able to sort independent candidates in an optimal order, does it make sense to canonicalizing the order of LSR-hoisted loop invariants back to the "natural" order? IMO, the optimized code should in general resemble the original code unless intentionally changed otherwise. 

I dislike passes that shuffle code for no reason, so regardless of SLSR I agree that LSR should preserve the order of IV users if it’s not too hard. I’m guessing this is just LSR sloppiness.

Andy

> 
> More specifically, I ran LSR on
> 
> void foo(float *input, int a, int b, int c, int n) {
>   for (int node_x = 0; node_x < n; ++node_x) {
>     int pixel_idx = (a + node_x) * b; // {a*b, +, b}
>     bar(pixel_idx * c); // {a*b*c, +, b*c}
>     bar((pixel_idx + 1) * c); // {(a*b+1)*c, +, b*c}
>     bar((pixel_idx + 2) * c); // {(a*b+2)*c, +, b*c}
>     bar((pixel_idx + 3) * c); // {(a*b+3)*c, +, b*c}
>   }
> }
> 
> and LSR produced
> 
> void foo(float *input, int a, int b, int c, int n) {
>   int arg3 = (a * b + 3) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg1 = (a * b + 1) * c;
>   int arg0 = a * b * c;
>   for (int node_x = 0; node_x < n; ++node_x) {
>     bar(arg0);
>     bar(arg1);
>     bar(arg2);
>     bar(arg3);
>     arg0 += b * c;
>     arg1 += b * c;
>     arg2 += b * c;
>     arg3 += b * c;
>   }
> }
> (with obvious redundant operations, i.e. a * b and b * c, combined). Note that the order of arg0~3 is reversed. 
> 
> Reversing the order of arg0~3 is not intentional. The user list of pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so LSR simply follows this order when collecting the LSRFixups. 
> 
> This creates troubles for SLSR. Given the current order of arg0~arg3
> 
>   int arg3 = (a * b + 3) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg1 = (a * b + 1) * c;
>   int arg0 = a * b * c;
> 
> SLSR optimizes them to
> 
>   int arg3 = (a * b + 3) * c;
>   int arg2 = arg3 - c;
>   int arg1 = arg2 - c;
>   int arg0 = arg1 - c;
>   // 2 muls and 4 adds/subs
> 
> In contrast, if arg0~3 were in the order of 
> 
>   int arg0 = a * b * c;
>   int arg1 = (a * b + 1) * c;
>   int arg2 = (a * b + 2) * c;
>   int arg3 = (a * b + 3) * c;
> 
> SLSR would optimize them to
> 
>   int arg0 = a * b * c;
>   int arg1 = arg0 + c;
>   int arg2 = arg1 + c;
>   int arg3 = arg2 + c;
>   // 2 muls and 3 adds/subs. 1 add/sub less than with the reversed order
> 
> I have a proof-of-concept patch (http://reviews.llvm.org/differential/diff/25402/ <http://reviews.llvm.org/differential/diff/25402/>) that has CollectFixupsAndInitialFormulae to sort initial formulae in a dominance order (i.e. if A.getUser() dominates B.getUser(), then we put A before B). It breaks some tests that are too sensitive to order changes; besides that, I don't see significant issues. 
> 
> Jingyue

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150518/17f2a6d1/attachment.html>