[PATCH] D39906: [InstCombine] Allowing GEP Instructions with loop Invariant operands to combine

Fri Nov 10 09:11:47 PST 2017

DIVYA added a comment.

This patch reduces the number of instructions LLVM generates for AArch64 in longest_match(), one of the hottest functions in the zlib-ng library.
Valgrind profile output for longest_match() in match.c, after compiling with gcc for x86_64 at -O3:

  .    .    .           .          .    .          .    .    .          unsigned char *win = s->window;
   .    .    .           .          .    .          .    .    .          int cont = 1;
   .    .    .           .          .    .          .    .    .          do {

330,962,412    1    1           0          0    0          0    0    0              match = win + cur_match;
330,962,412    0    0 165,481,206 63,723,323    0          0    0    0              if (likely(*(uint16_t*)(match+best_len-1) != scan_end)) {
623,919,768    0    0 155,979,942 86,247,452    0          0    0    0                  if ((cur_match = prev[cur_match & wmask]) > limit
308,955,535    0    0   5,387,447      1,860    0          0    0    0                      && --chain_length != 0) {

Assembly code before applying the patch.
The loop contains two adds:
.LBB0_7:                                // %do.body37
                                        //   in Loop: Header=BB0_8 Depth=2
  add     x26, x8, x20
  add     x10, x26, x9
  ldurh   w10, [x10, #-1]
  cmp     w10, w28, uxth
  b.eq    .LBB0_10

Assembly code after applying the patch.
The loop now contains a single add:
.LBB0_8:                                // %if.then49
                                        //   Parent Loop BB0_5 Depth=1
                                        // =>  This Inner Loop Header: Depth=2
  add     x26, x8, x20
  ldrh    w10, [x26, x9]
  cmp     w10, w28, uxth
  b.ne    .LBB0_8

.LBB0_11:
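
For readers who prefer a source-level view, here is a minimal C sketch of the address arithmetic in this loop. It is not the patch's InstCombine code and not zlib-ng's longest_match(): the function names, the positions array standing in for the hash-chain walk, and the load_u16 helper are all assumptions made for illustration. The first function forms win + cur_match and then adds best_len - 1 on every iteration; the second computes the loop-invariant best_len - 1 once, which is the shape that lets the load use a base-plus-register address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 16 bits without assuming alignment. */
static uint16_t load_u16(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Two address adds per iteration: (win + cur_match), then (+ best_len - 1).
   positions[] is an assumed stand-in for the chain of cur_match values. */
static size_t count_end_hits_two_adds(const unsigned char *win,
                                      const uint32_t *positions, size_t n,
                                      size_t best_len, uint16_t scan_end) {
    size_t hits = 0;
    assert(best_len >= 1); /* best_len - 1 must not wrap around */
    for (size_t i = 0; i < n; i++) {
        const unsigned char *match = win + positions[i];
        if (load_u16(match + best_len - 1) == scan_end)
            hits++;
    }
    return hits;
}

/* Same result, with the loop-invariant offset computed once outside the
   loop, leaving one add (win + cur_match) plus an indexed load inside. */
static size_t count_end_hits_hoisted(const unsigned char *win,
                                     const uint32_t *positions, size_t n,
                                     size_t best_len, uint16_t scan_end) {
    size_t hits = 0;
    assert(best_len >= 1);
    const size_t off = best_len - 1; /* invariant across iterations */
    for (size_t i = 0; i < n; i++) {
        const unsigned char *match = win + positions[i];
        if (load_u16(&match[off]) == scan_end)
            hits++;
    }
    return hits;
}
```

Both functions return the same count for any valid input; they differ only in where the invariant offset is computed. An optimizing compiler may generate identical code for both, so this is only an illustration of the reassociation, not a suggested source change.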

https://reviews.llvm.org/D39906