[LLVMdev] [LSR] hoisting loop invariants in reverse order
Jingyue Wu
jingyue at google.com
Mon May 18 10:04:39 PDT 2015
Thoughts?
On Fri, May 8, 2015 at 10:22 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi,
>
> I was tracking down a performance regression and noticed that
> LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of
> indvars) in the reverse order of how they appear in the loop.
>
> This reverse order creates troubles for the StraightLineStrengthReduce
> pass I recently add. While I understand ultimately SLSR should be able to
> sort independent candidates in an optimal order, does it make sense to
> canonicalizing the order of LSR-hoisted loop invariants back to the
> "natural" order? IMO, the optimized code should in general resemble the
> original code unless intentionally changed otherwise.
>
> More specifically, I ran LSR on
>
> void foo(float *input, int a, int b, int c, int n) {
> for (int node_x = 0; node_x < n; ++node_x) {
> int pixel_idx = (a + node_x) * b; // {a*b, +, b}
> bar(pixel_idx * c); // {a*b*c, +, b*c}
> bar((pixel_idx + 1) * c); // {(a*b+1)*c, +, b*c}
> bar((pixel_idx + 2) * c); // {(a*b+2)*c, +, b*c}
> bar((pixel_idx + 3) * c); // {(a*b+3)*c, +, b*c}
> }
> }
>
> and LSR produced
>
> void foo(float *input, int a, int b, int c, int n) {
> * int arg3 = (a * b + 3) * c;*
> * int arg2 = (a * b + 2) * c;*
> * int arg1 = (a * b + 1) * c;*
> * int arg0 = a * b * c;*
> for (int node_x = 0; node_x < n; ++node_x) {
> bar(arg0);
> bar(arg1);
> bar(arg2);
> bar(arg3);
> arg0 += b * c;
> arg1 += b * c;
> arg2 += b * c;
> arg3 += b * c;
> }
> }
> (with obvious redundant operations, i.e. a * b and b * c, combined). Note
> that the order of arg0~3 is reversed.
>
> Reversing the order of arg0~3 is not intentional. The user list of
> pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this
> order, so LSR simply follows this order when collecting the LSRFixups.
>
> This creates troubles for SLSR. Given the current order of arg0~arg3
>
> int arg3 = (a * b + 3) * c;
> int arg2 = (a * b + 2) * c;
> int arg1 = (a * b + 1) * c;
> int arg0 = a * b * c;
>
> SLSR optimizes them to
>
> int arg3 = (a * b + 3) * c;
> int arg2 = arg3 - c;
> int arg1 = arg2 - c;
> int arg0 = arg1 - c;
> // 2 muls and 4 adds/subs
>
> In contrast, if arg0~3 were in the order of
>
> int arg0 = a * b * c;
> int arg1 = (a * b + 1) * c;
> int arg2 = (a * b + 2) * c;
> int arg3 = (a * b + 3) * c;
>
> SLSR would optimize them to
>
> int arg0 = a * b * c;
> int arg1 = arg0 + c;
> int arg2 = arg1 + c;
> int arg3 = arg2 + c;
> // 2 muls and 3 adds/subs. 1 add/sub less than with the reversed order
>
> I have a proof-of-concept patch (
> http://reviews.llvm.org/differential/diff/25402/) that has
> CollectFixupsAndInitialFormulae to sort initial formulae in a dominance
> order (i.e. if A.getUser() dominates B.getUser(), then we put A before B).
> It breaks some tests that are too sensitive to order changes; besides that,
> I don't see significant issues.
>
> Jingyue
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150518/9261a76c/attachment.html>
More information about the llvm-dev
mailing list