<div dir="ltr">Hi,<div><br></div><div>Coremark on its own really isn't a good enough test. Have you run the LLVM test suite with this patch, and what were the performance differences?</div><div><br></div><div>I'm still a bit confused about exactly which code patterns this pass is supposed to trigger on. I understand the mechanics, but I can't quite see where it would be useful in practice. You've mentioned matrix multiply: how does this pass alter the IR? Which value does it avoid recomputing? How does it affect register pressure?</div><div><br></div><div>Also, your example just removes a mov and an add. The push/pop pair is just a consequence of register allocation (unless your pass in fact *reduces* register pressure?).</div><div><br></div><div>A bit more clarification would be great.</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, 1 Sep 2015 at 19:07 Steve King via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem<br>

<<a href="mailto:jvanadrighem@gmail.com" target="_blank">jvanadrighem@gmail.com</a>> wrote:<br>

> Do you have some specific performance measurements?<br>

<br>

Averaging 4 runs of 10000 iterations each of Coremark on my X86_64<br>

desktop showed:<br>

<br>

-O2 performance: +2.9% faster with the L.E.V. pass<br>

-Os size: 1.5% smaller with the L.E.V. pass<br>

<br>

In the case of Coremark, the benefit comes mainly from the matrix<br>

portion of the benchmark, which uses nested loops.  Similarly, I used a<br>

matrix multiplication for the regression test as shown below.  The<br>

L.E.V. pass eliminated 4 instructions.<br>

<br>

void matrix_mul(unsigned int Size, unsigned int *Dst,<br>

                unsigned int *Src, unsigned int Val) {<br>

  for (int Outer = 0; Outer < Size; ++Outer)<br>

    for (int Inner = 0; Inner < Size; ++Inner)<br>

       Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;<br>

}<br>

<br>

<br>

With LoopExitValues<br>

-------------------------------<br>

matrix_mul:<br>

    testl %edi, %edi<br>

    je .LBB0_5<br>

    xorl %r9d, %r9d<br>

    xorl %r8d, %r8d<br>

.LBB0_2:<br>

    xorl %r11d, %r11d<br>

.LBB0_3:<br>

    movl %r9d, %r10d<br>

    movl (%rdx,%r10,4), %eax<br>

    imull %ecx, %eax<br>

    movl %eax, (%rsi,%r10,4)<br>

    incl %r11d<br>

    incl %r9d<br>

    cmpl %r11d, %edi<br>

    jne .LBB0_3<br>

    incl %r8d<br>

    cmpl %edi, %r8d<br>

    jne .LBB0_2<br>

.LBB0_5:<br>

    retq<br>

<br>

<br>

<br>

Without LoopExitValues:<br>

-----------------------------------<br>

matrix_mul:<br>

    pushq %rbx           # Eliminated by L.E.V. pass<br>

.Ltmp0:<br>

.Ltmp1:<br>

    testl %edi, %edi<br>

    je .LBB0_5<br>

    xorl %r8d, %r8d<br>

    xorl %r9d, %r9d<br>

.LBB0_2:<br>

    xorl %r10d, %r10d<br>

    movl %r8d, %eax              # Eliminated by L.E.V. pass<br>

.LBB0_3:<br>

    movl %eax, %r11d<br>

    movl (%rdx,%r11,4), %ebx<br>

    imull %ecx, %ebx<br>

    movl %ebx, (%rsi,%r11,4)<br>

    incl %r10d<br>

    incl %eax<br>

    cmpl %r10d, %edi<br>

    jne .LBB0_3<br>

    incl %r9d<br>

    addl %edi, %r8d            # Eliminated by L.E.V. pass<br>

    cmpl %edi, %r9d<br>

    jne .LBB0_2<br>

.LBB0_5:<br>

    popq %rbx                    # Eliminated by L.E.V. pass<br>

    retq<br>

</blockquote></div>
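<div>For reference, here is a source-level sketch of what the two assembly listings in the quoted message appear to do. This is only a reading of the listings, not a description of the pass's IR transformation. The function names and the explicit Index variable are hypothetical and do not come from the patch. The first function recomputes the row offset Outer * Size at the top of every outer iteration, which seems to correspond to the extra mov and add. The second carries the inner loop's final index straight into the next outer iteration, since that exit value already equals (Outer + 1) * Size.</div>

```c
#include <stddef.h>

/* Baseline shape: the row base is derived again for every outer iteration. */
void scale_recompute(unsigned Size, unsigned *Dst, const unsigned *Src,
                     unsigned Val) {
  for (unsigned Outer = 0; Outer < Size; ++Outer) {
    unsigned Index = Outer * Size; /* recomputed per row */
    for (unsigned Inner = 0; Inner < Size; ++Inner, ++Index)
      Dst[Index] = Src[Index] * Val;
  }
}

/* Hypothetical hand-applied equivalent: Index is never reset, because its
   value when the inner loop exits is exactly the next row's base offset. */
void scale_reuse_exit_value(unsigned Size, unsigned *Dst, const unsigned *Src,
                            unsigned Val) {
  unsigned Index = 0;
  for (unsigned Outer = 0; Outer < Size; ++Outer)
    for (unsigned Inner = 0; Inner < Size; ++Inner, ++Index)
      Dst[Index] = Src[Index] * Val;
}
```

<div>Both functions write the same values to Dst. The second simply keeps one fewer value live across the inner loop, which may be why the listing with the pass no longer needs %rbx.</div>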