[LLVMdev] [LSR] hoisting loop invariants in reverse order

Jingyue Wu jingyue at google.com
Fri May 8 22:22:32 PDT 2015


Hi,

I was tracking down a performance regression and noticed that
LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of
indvars) in the reverse order of how they appear in the loop.

This reverse order creates troubles for the StraightLineStrengthReduce pass
I recently add. While I understand ultimately SLSR should be able to sort
independent candidates in an optimal order, does it make sense to
canonicalizing the order of LSR-hoisted loop invariants back to the
"natural" order? IMO, the optimized code should in general resemble the
original code unless intentionally changed otherwise.

More specifically, I ran LSR on

void foo(float *input, int a, int b, int c, int n) {
  for (int node_x = 0; node_x < n; ++node_x) {
    int pixel_idx = (a + node_x) * b; // {a*b, +, b}
    bar(pixel_idx * c); // {a*b*c, +, b*c}
    bar((pixel_idx + 1) * c); // {(a*b+1)*c, +, b*c}
    bar((pixel_idx + 2) * c); // {(a*b+2)*c, +, b*c}
    bar((pixel_idx + 3) * c); // {(a*b+3)*c, +, b*c}
  }
}

and LSR produced

void foo(float *input, int a, int b, int c, int n) {
*  int arg3 = (a * b + 3) * c;*
*  int arg2 = (a * b + 2) * c;*
*  int arg1 = (a * b + 1) * c;*
*  int arg0 = a * b * c;*
  for (int node_x = 0; node_x < n; ++node_x) {
    bar(arg0);
    bar(arg1);
    bar(arg2);
    bar(arg3);
    arg0 += b * c;
    arg1 += b * c;
    arg2 += b * c;
    arg3 += b * c;
  }
}
(with obvious redundant operations, i.e. a * b and b * c, combined). Note
that the order of arg0~3 is reversed.

Reversing the order of arg0~3 is not intentional. The user list of
pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this
order, so LSR simply follows this order when collecting the LSRFixups.

This creates troubles for SLSR. Given the current order of arg0~arg3

  int arg3 = (a * b + 3) * c;
  int arg2 = (a * b + 2) * c;
  int arg1 = (a * b + 1) * c;
  int arg0 = a * b * c;

SLSR optimizes them to

  int arg3 = (a * b + 3) * c;
  int arg2 = arg3 - c;
  int arg1 = arg2 - c;
  int arg0 = arg1 - c;
  // 2 muls and 4 adds/subs

In contrast, if arg0~3 were in the order of

  int arg0 = a * b * c;
  int arg1 = (a * b + 1) * c;
  int arg2 = (a * b + 2) * c;
  int arg3 = (a * b + 3) * c;

SLSR would optimize them to

  int arg0 = a * b * c;
  int arg1 = arg0 + c;
  int arg2 = arg1 + c;
  int arg3 = arg2 + c;
  // 2 muls and 3 adds/subs. 1 add/sub less than with the reversed order

I have a proof-of-concept patch (
http://reviews.llvm.org/differential/diff/25402/) that has
CollectFixupsAndInitialFormulae to sort initial formulae in a dominance
order (i.e. if A.getUser() dominates B.getUser(), then we put A before B).
It breaks some tests that are too sensitive to order changes; besides that,
I don't see significant issues.

Jingyue
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150508/112a0bef/attachment.html>


More information about the llvm-dev mailing list