[PATCH] Fix a performance problem in gep(gep ...) merging

Mon Apr 20 21:44:28 PDT 2015

Forgot to attach the testcase 2.cc.

On Mon, Apr 20, 2015 at 9:41 PM, Wei Mi <wmi at google.com> wrote:
> I did some performance testing for the reassociategep patch proposed
> by Mark. The testing result was mixed -- reassociategep patch was
> better for some benchmarks and disabing gep merging was better for
> some other benchmarks, and combining the two patches worked the best.
> We did some perf analysis based on some testcases extracted from those
> benchmarks with the largest perf difference between the two patches:
>
> * The testcase 1.cc in https://llvm.org/bugs/show_bug.cgi?id=23163
> showed reassociategep patch is needed:
>
> For the testcase 1.cc, there are some sext generated for array address
> computation. Existing CSE/LICM
> cannot find the common expression separated by "sext". A special
> reassociation is required.
>
> loop:
> ... = sext(a + 1) + c  // a and c are loop invariant.
> ...
> ... = sext(a - 1) + c
> end loop.
>
> CSE/LICM cannot find that "sext(a) + c" is a common expression and a
> loop invariant expression. reassociategep does the reassociation so
> CSE/LICM can optimize it.
>
> When I changed the testcase a little bit to remove the sext needed,
> disabling gep merging patch generated the same code with
> reassociategep patch.
>
> <   int c = p1.m_fn1(), x;
> ---
>>   long c = p1.m_fn1(), x;
>
> * The testcase 2.cc attached showed why disabling gep merging is needed.
> The command compiling 2.cc is:
> ~/workarea/llvm-r234390/build/bin/clang++ -O2 -std=c++11
> -fno-omit-frame-pointer -fno-strict-aliasing -funsigned-char -S 2.cc
> -o 2.s
>
> For the kernel loop, the IR before Codegen Prepare using the
> reassociategep patch:
>
> for.body10:
>   ...
>   %add.ptr.sum = add nsw i64 %mul14, %idx.ext
>   %add.ptr16 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr.sum
>   %15 = load i32, i32* %add.ptr16, align 4
>   ...
>   %add.ptr16.sum = add nsw i64 %add.ptr.sum, %idxprom
>   %arrayidx19 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr16.sum
>   %17 = load i32, i32* %arrayidx19, align 4
>   ...
>   %add.ptr16.sum82 = add nsw i64 %mul22, %add.ptr.sum
>   %add.ptr24 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr16.sum82
>   %19 = load i32, i32* %add.ptr24, align 4
>   %add.ptr24.sum = add nsw i64 %add.ptr16.sum82, %idxprom
>   %arrayidx28 = getelementptr inbounds i32, i32* %p1, i64 %add.ptr24.sum
>   %20 = load i32, i32* %arrayidx28, align 4
>   ...
>   br i1 %exitcond, label %for.cond8.for.end_crit_edge, label %for.body10
>
> The IR before Codegen Prepare using the disabling gep patch:
>
> for.body10:
>   ...
>   %add.ptr = getelementptr inbounds i32, i32* %p1, i64 %idx.ext
>   ...
>   %add.ptr16 = getelementptr inbounds i32, i32* %add.ptr, i64 %mul14
>   %15 = load i32, i32* %add.ptr16, align 4
>   ...
>   %arrayidx19 = getelementptr inbounds i32, i32* %add.ptr16, i64 %idxprom
>   %17 = load i32, i32* %arrayidx19, align 4
>   ...
>   %add.ptr24 = getelementptr inbounds i32, i32* %add.ptr16, i64 %mul22
>   %19 = load i32, i32* %add.ptr24, align 4
>   %arrayidx28 = getelementptr inbounds i32, i32* %add.ptr24, i64 %idxprom
>   %20 = load i32, i32* %arrayidx28, align 4
>   ...
>   br i1 %exitcond, label %for.cond8.for.end_crit_edge, label %for.body10
>
> gep merging generated several add insns which cannot be removed by
> reassociategep patch. It is difficult for reassociation to remove
> various  redundencies introduced by aggressive gep merging.
>
> I saw Mark already posted his patch for review:
> http://reviews.llvm.org/D9136. I will update the patch in next mail to
> fix some testcases.
>
> Thanks,
> Wei.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.cc
Type: text/x-c++src
Size: 574 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150420/8a31d341/attachment.cc>