[llvm-dev] memcmp code fragment

Fri May 19 13:14:27 PDT 2017

Thanks Hans.

I did look at LoopIdiomRecognize and also Loop rerolling - but like you
said, they only work on loop. And having a control flow makes it a bit
complicated. I have test cases that help in X86 and Aarch64. I will put up
my patch for review and you guys can have a look.

Another thing, I noticed is that there is no __builtin_memcmp in llvm.
Maybe that will be good to have as well.

Sirish

On Fri, May 19, 2017 at 3:06 PM, Hans Wennborg <hans at chromium.org> wrote:

> On Fri, May 19, 2017 at 12:46 PM, Sirish Pande via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Hi,
> >
> > Look at the following code:
> >
> > Look at the following C code  seqence:
> >
> > unsigned char mainGtU ( unsigned int i1,
> >                unsigned int i2,
> >                unsigned char* block)
> > {
> >    unsigned char c1, c2;
> >    c1 = block[i1]; c2 = block[i2];
> >    if (c1 != c2) return (c1 > c2);
> >    i1++; i2++;
> >
> >    c1 = block[i1]; c2 = block[i2];
> >    if (c1 != c2) return (c1 > c2);
> >    i1++; i2++;
> >
> > ..
> > ..
> > <repeat 12 times>
> >
> > In LLVM IR it will be following:
> >
> > define i8 @mainGtU(i32 %i1, i32 %i2, i8* readonly %block, i16* nocapture
> > readnone %quadrant, i32 %nblock, i32* nocapture readnone %budget)
> > local_unnamed_addr #0 {
> > entry:
> >   %idxprom = zext i32 %i1 to i64
> >   %arrayidx = getelementptr inbounds i8, i8* %block, i64 %idxprom
> >   %0 = load i8, i8* %arrayidx, align 1
> >   %idxprom1 = zext i32 %i2 to i64
> >   %arrayidx2 = getelementptr inbounds i8, i8* %block, i64 %idxprom1
> >   %1 = load i8, i8* %arrayidx2, align 1
> >   %cmp = icmp eq i8 %0, %1
> >   br i1 %cmp, label %if.end, label %if.then
> >
> > if.then:                                          ; preds = %entry
> >   %cmp7 = icmp ugt i8 %0, %1
> >   br label %return
> >
> > if.end:                                           ; preds = %entry
> >   %inc = add i32 %i1, 1
> >   %inc10 = add i32 %i2, 1
> >   %idxprom11 = zext i32 %inc to i64
> >   %arrayidx12 = getelementptr inbounds i8, i8* %block, i64 %idxprom11
> >   %2 = load i8, i8* %arrayidx12, align 1
> >   %idxprom13 = zext i32 %inc10 to i64
> >   %arrayidx14 = getelementptr inbounds i8, i8* %block, i64 %idxprom13
> >   %3 = load i8, i8* %arrayidx14, align 1
> >   %cmp17 = icmp eq i8 %2, %3
> >   br i1 %cmp17, label %if.end25, label %if.then19
> >
> > if.then19:                                        ; preds = %if.end
> >   %cmp22 = icmp ugt i8 %2, %3
> >   br label %return
> >
> > ..
> > ..
> > <repeats 12 times>
> >
> > This code sequence can be collapsed into call to  memcmp and we can get
> rid
> > of basic blocks. I have written a small peephole optimization for
> squenece
> > of instructions that identifies
> > branch termiantor, compare, load, gep etc and converts them to a call to
> > memcmp. This small pass gave me improvement of 67% on SPEC2000 bzip2 on
> X86.
> >
> > Is there a better idea, other than small peephole pass on IR to optimize
> > this code?
>
> There is LoopIdiomRecognize which does transformations like this, but
> only for loops, not unrolled code like your example.
>
> It would be very cool if we could somehow make that pass also
> recognize unrolled patterns, both for memcmp, and other operations.
>
> I don't have any specific ideas for how to do that, but the
> improvement you saw suggests it might be very worthwhile :-)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170519/57a3fdd4/attachment.html>