[llvm-bugs] [Bug 25782] New: [ppc] bad code layout causes slower than gcc in 403.gcc

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Dec 8 16:48:46 PST 2015


https://llvm.org/bugs/show_bug.cgi?id=25782

            Bug ID: 25782
           Summary: [ppc] bad code layout causes slower than gcc in
                    403.gcc
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: PowerPC
          Assignee: unassignedbugs at nondot.org
          Reporter: carrot at google.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The LLVM-generated 403.gcc binary is 1.7% slower than the GCC-generated one on
POWER8. For the input g23.i alone, the LLVM build is 8.68% slower.

In the perf profile, htab_traverse accounts for 10.03% of the time in the
GCC-generated code, but for 16.48% of the time in the LLVM-generated code.

GCC generated the following code for the loop body of htab_traverse:

   99.87 :        1032c050:   addi    r31,r31,8                         //  HOT
    0.00 :        1032c054:   cmpld   cr7,r30,r31                       //  HOT
    0.00 :        1032c058:   ble     cr7,1032c08c <htab_traverse+0x7c> //  HOT
    0.01 :        1032c05c:   ld      r9,0(r31)                         //  HOT
    0.03 :        1032c060:   cmpldi  cr7,r9,1                          //  HOT
    0.00 :        1032c064:   ble     cr7,1032c050 <htab_traverse+0x40> //  HOT
    0.06 :        1032c068:   mtctr   r29
    0.00 :        1032c06c:   mr      r3,r31
    0.00 :        1032c070:   std     r2,24(r1)
    0.00 :        1032c074:   mr      r4,r28
    0.00 :        1032c078:   mr      r12,r29
    0.00 :        1032c07c:   bctrl
    0.01 :        1032c080:   ld      r2,24(r1)
    0.00 :        1032c084:   cmpdi   cr7,r3,0
    0.00 :        1032c088:   bne     cr7,1032c050 <htab_traverse+0x40>

LLVM generated the following corresponding code:

   66.56 :        10306b20:   ldu     r3,8(r26)                        //  HOT
    0.00 :        10306b24:   cmpldi  r3,2                             //  HOT
    0.00 :        10306b28:   blt     10306b50 <htab_traverse+0x80>    //  HOT
    0.03 :        10306b2c:   mtctr   r28
    0.00 :        10306b30:   mr      r3,r30
    0.00 :        10306b34:   mr      r4,r29
    0.00 :        10306b38:   mr      r12,r28
    0.00 :        10306b3c:   std     r2,24(r1)
    0.01 :        10306b40:   bctrl
    0.01 :        10306b44:   ld      r2,24(r1)
    0.00 :        10306b48:   cmplwi  r3,0
    0.00 :        10306b4c:   beq     10306b5c <htab_traverse+0x8c>
   33.38 :        10306b50:   addi    r30,r30,8                       //  HOT
    0.00 :        10306b54:   cmpld   r30,r27                         //  HOT
    0.00 :        10306b58:   blt     10306b20 <htab_traverse+0x50>   //  HOT


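Both compilers generate similar instructions but lay them out differently. In
GCC's code all hot BBs are contiguous, so a hot iteration takes one taken
branch (the ble at 1032c064 back to 1032c050). In LLVM's code the hot BBs are
separated by the cold call path, so a hot iteration takes two taken branches
(the blt at 10306b28 forward to 10306b50, then the blt at 10306b58 back to
10306b20). The extra taken branch makes the loop slower.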

So this is a code layout problem.
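
For reference, the loop can be sketched in C roughly as follows. This is a
simplified reconstruction from the disassembly above, not the actual 403.gcc
source: the names traverse_sketch, EMPTY_SLOT, DELETED_SLOT and the two
callbacks are hypothetical, and the assumption that slot values 0 and 1 mark
unused entries comes only from the "compare against 1" test in the listings.
The profile marks only the scan blocks as HOT, which suggests the callback
path is rarely taken for this input.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinel values: slots holding 0 or 1 are skipped, matching the
   unsigned "cmpldi ...,1 / ble" test in the GCC listing. */
#define EMPTY_SLOT   ((void *) 0)
#define DELETED_SLOT ((void *) 1)

typedef int (*traverse_fn)(void **slot, void *info);

/* Scan every slot; call the callback on live ones and stop early if it
   returns 0. */
static void traverse_sketch(void **slots, size_t size,
                            traverse_fn callback, void *info)
{
    assert(slots != NULL && callback != NULL && size > 0);

    void **slot = slots;
    void **limit = slots + size;

    do {
        void *x = *slot;                      /* load the slot value */

        /* Hot blocks: load, sentinel test, pointer increment, bound test.
           Cold block: the indirect call below (mtctr/bctrl). */
        if (x != EMPTY_SLOT && x != DELETED_SLOT) {
            if (!callback(slot, info))
                break;                        /* callback asked to stop */
        }
    } while (++slot < limit);
}

/* Hypothetical callback: count live slots and keep going. */
static int count_live(void **slot, void *info)
{
    (void) slot;
    ++*(size_t *) info;
    return 1;
}

/* Hypothetical callback: count the first live slot, then stop. */
static int stop_after_first(void **slot, void *info)
{
    (void) slot;
    ++*(size_t *) info;
    return 0;
}
```

In this sketch the hot path is the sequence "load, sentinel test, increment,
bound test". A layout that keeps those blocks adjacent and moves the callback
block out of line corresponds to the GCC listing. Placing the callback block
between the sentinel test and the increment corresponds to the LLVM listing.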
