[PATCH] D19659: [X86] Enable RRL part of the LEA optimization pass for -O2

Wed May 4 08:43:40 PDT 2016

aturetsk added a comment.

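The RRL part of the LEA pass adds a reasonable amount of compile time:
about 5.4 seconds on a stress test that compiles in about 58 seconds with
the pass disabled, versus about 14.8 seconds for the full pass.
Here are the measurements.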

-Os, the LEA pass is completely disabled:

  real    0m57.797s
  user    0m57.448s
  sys     0m0.337s

-Os, only the RRL part of the LEA pass is enabled:

  real    1m3.238s
  user    1m2.868s
  sys     0m0.352s

-Os, the LEA pass is fully enabled:

  real    1m12.568s
  user    1m12.193s
  sys     0m0.354s
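
For reference, the relative overhead can be worked out from the 'real'
times above. This is only a small sketch; the helper names are
illustrative and not part of the patch or of gen.py:

```python
def to_seconds(minutes, seconds):
    # Convert the "XmY.YYYs" format printed by `time` into seconds.
    return 60.0 * minutes + seconds

def overhead_percent(baseline, measured):
    # Slowdown of `measured` relative to `baseline`, in percent.
    return 100.0 * (measured - baseline) / baseline

# 'real' times copied from the three measurements above.
disabled = to_seconds(0, 57.797)
rrl_only = to_seconds(1, 3.238)
full = to_seconds(1, 12.568)

rrl_overhead = overhead_percent(disabled, rrl_only)
full_overhead = overhead_percent(disabled, full)

print('RRL only: +%.1f%%' % rrl_overhead)   # RRL only: +9.4%
print('full:     +%.1f%%' % full_overhead)  # full:     +25.6%
```

These are single runs on a test built to stress the pass, so the
percentages should be read as rough figures.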

The test was generated by the following script:

  $ python gen.py 5000 > test.c
  $ cat gen.py

  import sys

  def foo(n):
    print 'struct { int a, b, c; } arr[1000000];'
    print ''
    print 'int foo(int x) {'
    print '  int r = 0;'
    for i in range(n):
      print '  r += arr[x + %d].a + arr[x + %d].b + arr[x + %d].c;' % (i, i, i)
    print '  switch (r) {'
    print '  case 1:'
    for i in range(n):
      print '    arr[x + %d].b = 111;' % i
      print '    arr[x + %d].c = 111;' % i
    print '    break;'
    print '  case 2:'
    for i in range(n):
      print '    arr[x + %d].b = 222;' % i
      print '    arr[x + %d].c = 222;' % i
    print '    break;'
    print '  default:'
    for i in range(n):
      # Make the LEAs irreplaceable so that the LEA pass cannot remove any of
      # them. Otherwise the reduced number of instructions would speed up the
      # other passes and hide the compile-time cost of the LEA pass itself.
      print '    arr[x + %d].b = (int) &arr[x + %d].b;' % (i, i)
      print '    arr[x + %d].c = (int) &arr[x + %d].c;' % (i, i)
    print '    break;'
    print '  }'
    print '  return r;'
    print '}'

  if __name__ == '__main__':
    foo(int(sys.argv[1]))

The run command:

  $ time ./bin/clang -Os -S test.c

http://reviews.llvm.org/D19659