[llvm-bugs] [Bug 25782] New: [ppc] bad code layout causes slower than gcc in 403.gcc
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Dec 8 16:48:46 PST 2015
https://llvm.org/bugs/show_bug.cgi?id=25782
Bug ID: 25782
Summary: [ppc] bad code layout causes slower than gcc in 403.gcc
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: PowerPC
Assignee: unassignedbugs at nondot.org
Reporter: carrot at google.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
LLVM-generated 403.gcc is 1.7% slower than GCC-generated code on POWER8; with
the input data g23.i, LLVM is 8.68% slower.
In the perf results, htab_traverse accounts for 10.03% of the run time in the
GCC-generated code but 16.48% in the LLVM-generated code.
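For reference, the loop being compared is the slot scan in htab_traverse. The sketch below is a simplified C version modeled on the libiberty hashtab.c routine of the same name; the struct, macro, and function names are assumptions for illustration, not the exact benchmark source. Treating EMPTY_ENTRY as 0 and DELETED_ENTRY as 1 is consistent with both listings, where each compiler skips a slot whose value is (unsigned) less than 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins modeled on libiberty's hashtab.c; the real
   names and struct layout in 403.gcc may differ. */
#define EMPTY_ENTRY   ((void *) 0)
#define DELETED_ENTRY ((void *) 1)

typedef int (*htab_trav)(void **slot, void *info);

struct htab_sketch {
    void **entries;   /* slot array scanned by the loop */
    size_t size;      /* number of slots */
};

/* Visit every live slot; stop early if the callback returns 0. */
static void htab_traverse_sketch(struct htab_sketch *htab, htab_trav callback,
                                 void *info)
{
    void **slot = htab->entries;
    void **limit = slot + htab->size;

    assert(htab->size >= 1);   /* do-while reads at least one slot */
    do {
        void *x = *slot;
        /* Hot path: the empty/deleted test, which the listings compile as
           an unsigned "x <= 1" (GCC) or "x < 2" (LLVM) comparison. */
        if (x != EMPTY_ENTRY && x != DELETED_ENTRY)
            if (!callback(slot, info))   /* cold path: indirect call (bctrl) */
                break;
    } while (++slot < limit);
}

/* Example callback: count live entries and keep going. */
static int count_live(void **slot, void *info)
{
    (void) slot;
    ++*(size_t *) info;
    return 1;
}
```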
GCC generates the following code for the loop body of htab_traverse:
99.87 : 1032c050: addi r31,r31,8 // HOT
0.00 : 1032c054: cmpld cr7,r30,r31 // HOT
0.00 : 1032c058: ble cr7,1032c08c <htab_traverse+0x7c> // HOT
0.01 : 1032c05c: ld r9,0(r31) // HOT
0.03 : 1032c060: cmpldi cr7,r9,1 // HOT
0.00 : 1032c064: ble cr7,1032c050 <htab_traverse+0x40> // HOT
0.06 : 1032c068: mtctr r29
0.00 : 1032c06c: mr r3,r31
0.00 : 1032c070: std r2,24(r1)
0.00 : 1032c074: mr r4,r28
0.00 : 1032c078: mr r12,r29
0.00 : 1032c07c: bctrl
0.01 : 1032c080: ld r2,24(r1)
0.00 : 1032c084: cmpdi cr7,r3,0
0.00 : 1032c088: bne cr7,1032c050 <htab_traverse+0x40>
LLVM generates the following corresponding code:
66.56 : 10306b20: ldu r3,8(r26) // HOT
0.00 : 10306b24: cmpldi r3,2 // HOT
0.00 : 10306b28: blt 10306b50 <htab_traverse+0x80> // HOT
0.03 : 10306b2c: mtctr r28
0.00 : 10306b30: mr r3,r30
0.00 : 10306b34: mr r4,r29
0.00 : 10306b38: mr r12,r28
0.00 : 10306b3c: std r2,24(r1)
0.01 : 10306b40: bctrl
0.01 : 10306b44: ld r2,24(r1)
0.00 : 10306b48: cmplwi r3,0
0.00 : 10306b4c: beq 10306b5c <htab_traverse+0x8c>
33.38 : 10306b50: addi r30,r30,8 // HOT
0.00 : 10306b54: cmpld r30,r27 // HOT
0.00 : 10306b58: blt 10306b20 <htab_traverse+0x50> // HOT
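Both compilers generate similar instructions, but with different code layouts.
In GCC's code all the hot basic blocks (BBs) are placed together, while in
LLVM's code the hot BBs are separated by the cold call block. As a result, each
iteration over an empty slot executes two taken branches in LLVM's code (blt at
10306b28 and blt at 10306b58) but only one in GCC's code (ble at 1032c064), and
the extra taken branch slows the loop down. So this is a code layout problem.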
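The taken-branch argument can be checked with a small model. The C sketch below is hypothetical: it replays the control flow of the two listings over an array of slot values and counts only taken conditional branches. It ignores the call/return pair, the loop entry, and hardware effects such as fetch alignment or branch prediction. For a run of n empty slots it gives n + 1 taken branches for the GCC layout and 2n - 1 for the LLVM layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not compiler output: count taken conditional
   branches in each layout. Calls, returns, and loop entry are ignored. */
typedef int (*visit_fn)(uintptr_t value);

/* GCC layout: top: addi; cmpld; ble exit | ld; cmpldi 1; ble top | call; bne top */
static unsigned gcc_layout_taken(const uintptr_t *slots, size_t n, visit_fn cb)
{
    unsigned taken = 0;
    assert(n >= 1);                      /* do-while: at least one slot */
    for (size_t i = 0; i < n; i++) {
        if (slots[i] <= 1)
            taken++;                     /* ble top: skip empty/deleted slot */
        else if (cb(slots[i]))
            taken++;                     /* bne top: callback said continue */
        else
            return taken;                /* bne falls through to the exit */
        /* ble exit at the loop top is not taken while slots remain */
    }
    return taken + 1;                    /* ble exit taken once slot reaches limit */
}

/* LLVM layout: head: ldu; cmpldi 2; blt latch | call; beq exit | latch: addi; cmpld; blt head */
static unsigned llvm_layout_taken(const uintptr_t *slots, size_t n, visit_fn cb)
{
    unsigned taken = 0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        if (slots[i] < 2)
            taken++;                     /* blt latch: jump over the call block */
        else if (!cb(slots[i]))
            return taken + 1;            /* beq exit: callback said stop */
        if (i + 1 < n)
            taken++;                     /* blt head: loop back-edge */
    }
    return taken;                        /* last blt head falls through */
}

static int keep_going(uintptr_t value) { (void) value; return 1; }
static int stop_now(uintptr_t value)   { (void) value; return 0; }
```

When most slots are empty, as the HOT markings in both listings suggest, the LLVM layout approaches two taken branches per slot against one for the GCC layout.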