[LLVMdev] [LSR] hoisting loop invariants in reverse order
Andrew Trick
atrick at apple.com
Mon May 18 10:13:37 PDT 2015
> On May 8, 2015, at 10:22 PM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi,
>
> I was tracking down a performance regression and noticed that LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of indvars) in the reverse order of how they appear in the loop.
>
> This reverse order creates troubles for the StraightLineStrengthReduce pass I recently add. While I understand ultimately SLSR should be able to sort independent candidates in an optimal order, does it make sense to canonicalizing the order of LSR-hoisted loop invariants back to the "natural" order? IMO, the optimized code should in general resemble the original code unless intentionally changed otherwise.
I dislike passes that shuffle code for no reason, so regardless of SLSR I agree that LSR should preserve the order of IV users if it’s not too hard. I’m guessing this is just LSR sloppiness.
Andy
>
> More specifically, I ran LSR on
>
> void foo(float *input, int a, int b, int c, int n) {
> for (int node_x = 0; node_x < n; ++node_x) {
> int pixel_idx = (a + node_x) * b; // {a*b, +, b}
> bar(pixel_idx * c); // {a*b*c, +, b*c}
> bar((pixel_idx + 1) * c); // {(a*b+1)*c, +, b*c}
> bar((pixel_idx + 2) * c); // {(a*b+2)*c, +, b*c}
> bar((pixel_idx + 3) * c); // {(a*b+3)*c, +, b*c}
> }
> }
>
> and LSR produced
>
> void foo(float *input, int a, int b, int c, int n) {
> int arg3 = (a * b + 3) * c;
> int arg2 = (a * b + 2) * c;
> int arg1 = (a * b + 1) * c;
> int arg0 = a * b * c;
> for (int node_x = 0; node_x < n; ++node_x) {
> bar(arg0);
> bar(arg1);
> bar(arg2);
> bar(arg3);
> arg0 += b * c;
> arg1 += b * c;
> arg2 += b * c;
> arg3 += b * c;
> }
> }
> (with obvious redundant operations, i.e. a * b and b * c, combined). Note that the order of arg0~3 is reversed.
>
> Reversing the order of arg0~3 is not intentional. The user list of pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so LSR simply follows this order when collecting the LSRFixups.
>
> This creates troubles for SLSR. Given the current order of arg0~arg3
>
> int arg3 = (a * b + 3) * c;
> int arg2 = (a * b + 2) * c;
> int arg1 = (a * b + 1) * c;
> int arg0 = a * b * c;
>
> SLSR optimizes them to
>
> int arg3 = (a * b + 3) * c;
> int arg2 = arg3 - c;
> int arg1 = arg2 - c;
> int arg0 = arg1 - c;
> // 2 muls and 4 adds/subs
>
> In contrast, if arg0~3 were in the order of
>
> int arg0 = a * b * c;
> int arg1 = (a * b + 1) * c;
> int arg2 = (a * b + 2) * c;
> int arg3 = (a * b + 3) * c;
>
> SLSR would optimize them to
>
> int arg0 = a * b * c;
> int arg1 = arg0 + c;
> int arg2 = arg1 + c;
> int arg3 = arg2 + c;
> // 2 muls and 3 adds/subs. 1 add/sub less than with the reversed order
>
> I have a proof-of-concept patch (http://reviews.llvm.org/differential/diff/25402/ <http://reviews.llvm.org/differential/diff/25402/>) that has CollectFixupsAndInitialFormulae to sort initial formulae in a dominance order (i.e. if A.getUser() dominates B.getUser(), then we put A before B). It breaks some tests that are too sensitive to order changes; besides that, I don't see significant issues.
>
> Jingyue
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150518/17f2a6d1/attachment.html>
More information about the llvm-dev
mailing list