[PATCH] Fix a performance problem in gep(gep ...) merging

Mon Apr 20 21:41:26 PDT 2015

I did some performance testing for the reassociategep patch proposed
by Mark. The testing result was mixed -- reassociategep patch was
better for some benchmarks and disabing gep merging was better for
some other benchmarks, and combining the two patches worked the best.
We did some perf analysis based on some testcases extracted from those
benchmarks with the largest perf difference between the two patches:

* The testcase 1.cc in https://llvm.org/bugs/show_bug.cgi?id=23163
showed reassociategep patch is needed:

For the testcase 1.cc, there are some sext generated for array address
computation. Existing CSE/LICM
cannot find the common expression separated by "sext". A special
reassociation is required.

loop:
... = sext(a + 1) + c  // a and c are loop invariant.
...
... = sext(a - 1) + c
end loop.

CSE/LICM cannot find that "sext(a) + c" is a common expression and a
loop invariant expression. reassociategep does the reassociation so
CSE/LICM can optimize it.

When I changed the testcase a little bit to remove the sext needed,
disabling gep merging patch generated the same code with
reassociategep patch.

<   int c = p1.m_fn1(), x;
---
>   long c = p1.m_fn1(), x;

* The testcase 2.cc attached showed why disabling gep merging is needed.
The command compiling 2.cc is:
~/workarea/llvm-r234390/build/bin/clang++ -O2 -std=c++11
-fno-omit-frame-pointer -fno-strict-aliasing -funsigned-char -S 2.cc
-o 2.s

For the kernel loop, the IR before Codegen Prepare using the
reassociategep patch:

for.body10:
  ...
  %add.ptr.sum = add nsw i64 %mul14, %idx.ext
  %add.ptr16 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr.sum
  %15 = load i32, i32* %add.ptr16, align 4
  ...
  %add.ptr16.sum = add nsw i64 %add.ptr.sum, %idxprom
  %arrayidx19 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr16.sum
  %17 = load i32, i32* %arrayidx19, align 4
  ...
  %add.ptr16.sum82 = add nsw i64 %mul22, %add.ptr.sum
  %add.ptr24 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr16.sum82
  %19 = load i32, i32* %add.ptr24, align 4
  %add.ptr24.sum = add nsw i64 %add.ptr16.sum82, %idxprom
  %arrayidx28 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr24.sum
  %20 = load i32, i32* %arrayidx28, align 4
  ...
  br i1 %exitcond, label %for.cond8.for.end_crit_edge, label %for.body10

The IR before Codegen Prepare using the disabling gep patch:

for.body10:
  ...
  %add.ptr = getelementptr inbounds i32, i32* %p1, i64 %idx.ext
  ...
  %add.ptr16 = getelementptr inbounds i32, i32* %add.ptr, i64 %mul14
  %15 = load i32, i32* %add.ptr16, align 4
  ...
  %arrayidx19 = getelementptr inbounds i32, i32* %add.ptr16, i64 %idxprom
  %17 = load i32, i32* %arrayidx19, align 4
  ...
  %add.ptr24 = getelementptr inbounds i32, i32* %add.ptr16, i64 %mul22
  %19 = load i32, i32* %add.ptr24, align 4
  %arrayidx28 = getelementptr inbounds i32, i32* %add.ptr24, i64 %idxprom
  %20 = load i32, i32* %arrayidx28, align 4
  ...
  br i1 %exitcond, label %for.cond8.for.end_crit_edge, label %for.body10

gep merging generated several add insns which cannot be removed by
reassociategep patch. It is difficult for reassociation to remove
various  redundencies introduced by aggressive gep merging.

I saw Mark already posted his patch for review:
http://reviews.llvm.org/D9136. I will update the patch in next mail to
fix some testcases.

Thanks,
Wei.