<div dir="ltr">Loop reroll will not catch this pattern. Loop reroll only works on loops, and there is no loop around this code sequence, nor was it produced by unrolling. In this case, the code is written this way in the source. But I do see your point: unrolling would result in such a code sequence as well. Maybe we need to teach loop idiom to recognize this pattern and add an LLVM intrinsic for it.<div><br></div><div>To answer your question, loop idiom runs before loop reroll. </div><div><br></div><div>Sirish</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 19, 2017 at 3:17 PM, Daniel Neilson via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Some newb questions:<br>

 Is this unrolled loop a pattern that loop reroll gets? If not, then can/could/should it be enhanced to get it? Does loop idiom run after reroll?<br>

<br>

 Loop idiom currently doesn’t have a pattern match for memcmp — there is no llvm.memcmp intrinsic. But, this looks like a good motivating example to create one...<br>

<br>

-Daniel<br>

<div><div class="h5"><br>

> On May 19, 2017, at 3:06 PM, Hans Wennborg via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

><br>

> On Fri, May 19, 2017 at 12:46 PM, Sirish Pande via llvm-dev<br>

> <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

>> Hi,<br>

>><br>

>> Look at the following C code sequence:<br>

>><br>

>> unsigned char mainGtU ( unsigned int i1,<br>

>>               unsigned int i2,<br>

>>               unsigned char* block)<br>

>> {<br>

>>   unsigned char c1, c2;<br>

>>   c1 = block[i1]; c2 = block[i2];<br>

>>   if (c1 != c2) return (c1 > c2);<br>

>>   i1++; i2++;<br>

>><br>

>>   c1 = block[i1]; c2 = block[i2];<br>

>>   if (c1 != c2) return (c1 > c2);<br>

>>   i1++; i2++;<br>

>><br>

>> ..<br>

>> ..<br>

>> <repeat 12 times><br>

>><br>

>> In LLVM IR it will be following:<br>

>><br>

>> define i8 @mainGtU(i32 %i1, i32 %i2, i8* readonly %block, i16* nocapture<br>

>> readnone %quadrant, i32 %nblock, i32* nocapture readnone %budget)<br>

>> local_unnamed_addr #0 {<br>

>> entry:<br>

>>  %idxprom = zext i32 %i1 to i64<br>

>>  %arrayidx = getelementptr inbounds i8, i8* %block, i64 %idxprom<br>

>>  %0 = load i8, i8* %arrayidx, align 1<br>

>>  %idxprom1 = zext i32 %i2 to i64<br>

>>  %arrayidx2 = getelementptr inbounds i8, i8* %block, i64 %idxprom1<br>

>>  %1 = load i8, i8* %arrayidx2, align 1<br>

>>  %cmp = icmp eq i8 %0, %1<br>

>>  br i1 %cmp, label %if.end, label %if.then<br>

>><br>

>> if.then:                                          ; preds = %entry<br>

>>  %cmp7 = icmp ugt i8 %0, %1<br>

>>  br label %return<br>

>><br>

>> if.end:                                           ; preds = %entry<br>

>>  %inc = add i32 %i1, 1<br>

>>  %inc10 = add i32 %i2, 1<br>

>>  %idxprom11 = zext i32 %inc to i64<br>

>>  %arrayidx12 = getelementptr inbounds i8, i8* %block, i64 %idxprom11<br>

>>  %2 = load i8, i8* %arrayidx12, align 1<br>

>>  %idxprom13 = zext i32 %inc10 to i64<br>

>>  %arrayidx14 = getelementptr inbounds i8, i8* %block, i64 %idxprom13<br>

>>  %3 = load i8, i8* %arrayidx14, align 1<br>

>>  %cmp17 = icmp eq i8 %2, %3<br>

>>  br i1 %cmp17, label %if.end25, label %if.then19<br>

>><br>

>> if.then19:                                        ; preds = %if.end<br>

>>  %cmp22 = icmp ugt i8 %2, %3<br>

>>  br label %return<br>

>><br>

>> ..<br>

>> ..<br>

>> <repeats 12 times><br>

>><br>

>> This code sequence can be collapsed into a call to memcmp, which gets rid<br>

>> of the extra basic blocks. I have written a small peephole optimization that<br>

>> identifies this sequence of instructions<br>

>> (branch terminator, compare, load, GEP, etc.) and converts it to a call to<br>

>> memcmp. This small pass gave me an improvement of 67% on SPEC2000 bzip2 on X86.<br>

>><br>

>> Is there a better approach than a small peephole pass on the IR to optimize<br>

>> this code?<br>

><br>

> There is LoopIdiomRecognize which does transformations like this, but<br>

> only for loops, not unrolled code like your example.<br>

><br>

> It would be very cool if we could somehow make that pass also<br>

> recognize unrolled patterns, both for memcmp, and other operations.<br>

><br>

> I don't have any specific ideas for how to do that, but the<br>

> improvement you saw suggests it might be very worthwhile :-)<br>

</div></div>> ______________________________<wbr>_________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

<br>


</blockquote></div><br></div>
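<div>For readers following the thread, the collapse into memcmp that Sirish describes can be sketched in C as below. This is only an illustration under stated assumptions, not the peephole pass itself. The names prefix_cmp_bytewise, prefix_cmp_memcmp, and PREFIX_LEN are hypothetical, and the sketch covers only the 12 unrolled byte comparisons; whatever follows them in the original function is elided in the quoted message and is left out here. It also assumes that all 12 bytes starting at both indices are readable (memcmp may touch every byte, while the original stops at the first mismatch) and that i1 + 11 and i2 + 11 do not wrap around as 32-bit unsigned values.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: number of unrolled byte comparisons in the quoted code. */
enum { PREFIX_LEN = 12 };

/* Reference version: the unrolled compare-and-branch chain written as a loop.
 * Returns 1 and stores (c1 > c2) in *result at the first mismatch;
 * returns 0 if all PREFIX_LEN bytes are equal. */
int prefix_cmp_bytewise(unsigned int i1, unsigned int i2,
                        const unsigned char *block, unsigned char *result)
{
    assert(block != NULL && result != NULL);
    for (unsigned int k = 0; k < PREFIX_LEN; k++) {
        unsigned char c1 = block[i1 + k];
        unsigned char c2 = block[i2 + k];
        if (c1 != c2) {
            *result = (unsigned char)(c1 > c2);
            return 1;
        }
    }
    return 0;
}

/* memcmp version: memcmp compares bytes as unsigned char, so the sign of its
 * result at the first differing byte corresponds to (c1 > c2).
 * Assumes both 12-byte ranges are fully readable and do not wrap. */
int prefix_cmp_memcmp(unsigned int i1, unsigned int i2,
                      const unsigned char *block, unsigned char *result)
{
    assert(block != NULL && result != NULL);
    int r = memcmp(block + i1, block + i2, PREFIX_LEN);
    if (r != 0) {
        *result = (unsigned char)(r > 0);
        return 1;
    }
    return 0;
}
```

<div>A caller would use the helper's return value to decide whether to return *result immediately or fall through to the remaining logic, mirroring the branches to %return in the quoted IR. The readability and no-wraparound assumptions are the kind of conditions a real transformation would have to establish before replacing the early-exit loads with a single call.</div>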