[llvm-dev] memcmp code fragment

Fri May 19 13:17:09 PDT 2017

Some newb questions:
 Is this unrolled loop a pattern that loop reroll gets? If not, then can/could/should it be enhanced to get it? Does loop idiom run after reroll?

 Loop idiom currently doesn’t have a pattern match for memcmp — there is no llvm.memcmp intrinsic. But, this looks like a good motivating example to create one...

-Daniel

> On May 19, 2017, at 3:06 PM, Hans Wennborg via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Fri, May 19, 2017 at 12:46 PM, Sirish Pande via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>> 
>> Look at the following code:
>> 
>> Look at the following C code  seqence:
>> 
>> unsigned char mainGtU ( unsigned int i1,
>>               unsigned int i2,
>>               unsigned char* block)
>> {
>>   unsigned char c1, c2;
>>   c1 = block[i1]; c2 = block[i2];
>>   if (c1 != c2) return (c1 > c2);
>>   i1++; i2++;
>> 
>>   c1 = block[i1]; c2 = block[i2];
>>   if (c1 != c2) return (c1 > c2);
>>   i1++; i2++;
>> 
>> ..
>> ..
>> <repeat 12 times>
>> 
>> In LLVM IR it will be following:
>> 
>> define i8 @mainGtU(i32 %i1, i32 %i2, i8* readonly %block, i16* nocapture
>> readnone %quadrant, i32 %nblock, i32* nocapture readnone %budget)
>> local_unnamed_addr #0 {
>> entry:
>>  %idxprom = zext i32 %i1 to i64
>>  %arrayidx = getelementptr inbounds i8, i8* %block, i64 %idxprom
>>  %0 = load i8, i8* %arrayidx, align 1
>>  %idxprom1 = zext i32 %i2 to i64
>>  %arrayidx2 = getelementptr inbounds i8, i8* %block, i64 %idxprom1
>>  %1 = load i8, i8* %arrayidx2, align 1
>>  %cmp = icmp eq i8 %0, %1
>>  br i1 %cmp, label %if.end, label %if.then
>> 
>> if.then:                                          ; preds = %entry
>>  %cmp7 = icmp ugt i8 %0, %1
>>  br label %return
>> 
>> if.end:                                           ; preds = %entry
>>  %inc = add i32 %i1, 1
>>  %inc10 = add i32 %i2, 1
>>  %idxprom11 = zext i32 %inc to i64
>>  %arrayidx12 = getelementptr inbounds i8, i8* %block, i64 %idxprom11
>>  %2 = load i8, i8* %arrayidx12, align 1
>>  %idxprom13 = zext i32 %inc10 to i64
>>  %arrayidx14 = getelementptr inbounds i8, i8* %block, i64 %idxprom13
>>  %3 = load i8, i8* %arrayidx14, align 1
>>  %cmp17 = icmp eq i8 %2, %3
>>  br i1 %cmp17, label %if.end25, label %if.then19
>> 
>> if.then19:                                        ; preds = %if.end
>>  %cmp22 = icmp ugt i8 %2, %3
>>  br label %return
>> 
>> ..
>> ..
>> <repeats 12 times>
>> 
>> This code sequence can be collapsed into call to  memcmp and we can get rid
>> of basic blocks. I have written a small peephole optimization for squenece
>> of instructions that identifies
>> branch termiantor, compare, load, gep etc and converts them to a call to
>> memcmp. This small pass gave me improvement of 67% on SPEC2000 bzip2 on X86.
>> 
>> Is there a better idea, other than small peephole pass on IR to optimize
>> this code?
> 
> There is LoopIdiomRecognize which does transformations like this, but
> only for loops, not unrolled code like your example.
> 
> It would be very cool if we could somehow make that pass also
> recognize unrolled patterns, both for memcmp, and other operations.
> 
> I don't have any specific ideas for how to do that, but the
> improvement you saw suggests it might be very worthwhile :-)
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev