[llvm-commits] [llvm-testresults] lab-mini-01__O3-plain__clang_DEV__x86_64 test results

Jakob Stoklund Olesen stoklund at 2pi.dk
Fri May 25 17:10:56 PDT 2012


On May 25, 2012, at 3:07 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Fri, May 25, 2012 at 8:28 AM, Duncan Sands <baldrick at free.fr> wrote:
> # phi213.i is in %ebx
> 
> +       leaq    1(%rdx), %rsi
> +       xorl    %esi, %ebx
>         xorl    (%rax,%rdx,4), %ebx
> -       incq    %rdx
> -       xorl    %edx, %ebx
> -       cmpl    $500001, %edx           # imm = 0x7A121
> +       cmpl    $500001, %esi           # imm = 0x7A121
> +       movq    %rsi, %rdx
> 
> I'm not sure why this codegen difference arises.  Any suggestions?
> 
> One largely uninformed idea is that the dependency chains matter (in addition to the leaq stuff mentioned by Jakob):
> 
> In the old code, we execute the load+xor first, and while it is in flight we can increment rdx and execute the comparison with edx. Whenever the load+xor lands, we can execute the second xor.
> 
> In the new code, the leaq must execute before the first xor, and the first xor must execute before the second xor. That means we may not be able to overlap as much of the load delay.

I can probably explain why the new code is slow.

You get these µops with the old code:

xorl -> (lea, load, xor1)
incq -> (add)
xorl -> (xor2)
cmpl -> (cmp)
jne -> (branch)

There are two loop-carried dependencies:

add -> add
xor1 -> xor2 -> xor1

The xor chain means the loop requires at least 2 cycles per iteration. Nehalem can execute 3 ALU µops per cycle, and we have 6 per iteration (lea, xor1, add, xor2, cmp, branch), so the old loop just fits in those 2 cycles.

The extra movq brings us to 7 ALU µops per iteration, which can't fit in 2 cycles at 3 µops per cycle. (Ivy Bridge doesn't need a pipeline slot for a move, so it probably won't see the regression.)
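
The arithmetic above can be sketched as a small calculation. This is only an idealized lower bound, not a simulation: it assumes each xor in the chain has a latency of 1 cycle, and it ignores the front end, the load port, and everything else that is not an ALU µop. The names `min_cycles_per_iteration`, `ALU_UOPS_PER_CYCLE` and `XOR_CHAIN_CYCLES` are made up for illustration.

```python
# Idealized bound: an iteration can go no faster than the slower of
#   (a) the ALU throughput limit, and
#   (b) the loop-carried xor1 -> xor2 -> xor1 dependency chain.
ALU_UOPS_PER_CYCLE = 3   # Nehalem figure quoted in the text
XOR_CHAIN_CYCLES = 2.0   # assumption: two dependent xors, 1 cycle each


def min_cycles_per_iteration(alu_uops,
                             chain_cycles=XOR_CHAIN_CYCLES,
                             uops_per_cycle=ALU_UOPS_PER_CYCLE):
    """Return a lower bound on average cycles per loop iteration."""
    throughput_bound = alu_uops / uops_per_cycle
    return max(throughput_bound, chain_cycles)


old_loop = ["lea", "xor1", "add", "xor2", "cmp", "branch"]  # 6 ALU µops
new_loop = old_loop + ["mov"]                               # 7 ALU µops

print(f"old: {min_cycles_per_iteration(len(old_loop)):.2f}")  # old: 2.00
print(f"new: {min_cycles_per_iteration(len(new_loop)):.2f}")  # new: 2.33
```

Under these assumptions the old loop is limited by the xor chain, while the new loop becomes limited by ALU throughput instead.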

There is also a scheduling issue:

        cmpl    $500001, %esi           # imm = 0x7A121
        movq    %rsi, %rdx
        jne     .LBB1_18

If we place the cmpl and jne next to each other, they should fuse into one µop, and we are down to 6 ALU µops per iteration again.
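
A rough sketch of that counting, under stated assumptions: the per-instruction ALU µop counts follow the breakdown earlier in this mail (the load+xor contributes lea and xor as ALU µops, everything else one each), and fusion is modeled simply as "a cmp immediately followed by jne counts as one µop". The table `ALU_UOPS` and the function `count_alu_uops` are hypothetical names, and real fusion rules have more conditions than this. Moving the movq above the cmpl is safe because movq does not modify the flags.

```python
# ALU µops per instruction, following the breakdown used in the text.
# "xor_mem" is the xorl with a memory operand: lea + xor on the ALUs
# (the load itself is not counted as an ALU µop).
ALU_UOPS = {"lea": 1, "xor": 1, "xor_mem": 2, "cmp": 1, "mov": 1, "jne": 1}


def count_alu_uops(instrs):
    """Count ALU µops, treating an adjacent cmp+jne pair as one fused µop."""
    total = 0
    i = 0
    while i < len(instrs):
        fusible = (instrs[i] == "cmp"
                   and i + 1 < len(instrs)
                   and instrs[i + 1] == "jne")
        if fusible:
            total += 1          # cmp and jne issue as a single µop
            i += 2
        else:
            total += ALU_UOPS[instrs[i]]
            i += 1
    return total


# New code as emitted: the movq sits between cmpl and jne.
as_emitted = ["lea", "xor", "xor_mem", "cmp", "mov", "jne"]
# Same instructions with the movq hoisted above the cmpl.
rescheduled = ["lea", "xor", "xor_mem", "mov", "cmp", "jne"]

print(count_alu_uops(as_emitted))   # 7
print(count_alu_uops(rescheduled))  # 6
```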

/jakob
