<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [ppc] bad code layout causes slower than gcc in 403.gcc"
href="https://llvm.org/bugs/show_bug.cgi?id=25782">25782</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[ppc] bad code layout causes slower than gcc in 403.gcc
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: PowerPC
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>carrot@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>LLVM-generated 403.gcc is 1.7% slower than GCC-generated code on POWER8; for
the input data g23.i, LLVM is 8.68% slower.
In the perf results, htab_traverse consumes 10.03% of the time in the
GCC-generated code, but 16.48% of the time in the LLVM-generated code.
GCC generated the following code for the loop body of htab_traverse:
99.87 : 1032c050: addi r31,r31,8 // HOT
0.00 : 1032c054: cmpld cr7,r30,r31 // HOT
0.00 : 1032c058: ble cr7,1032c08c <htab_traverse+0x7c> // HOT
0.01 : 1032c05c: ld r9,0(r31) // HOT
0.03 : 1032c060: cmpldi cr7,r9,1 // HOT
0.00 : 1032c064: ble cr7,1032c050 <htab_traverse+0x40> // HOT
0.06 : 1032c068: mtctr r29
0.00 : 1032c06c: mr r3,r31
0.00 : 1032c070: std r2,24(r1)
0.00 : 1032c074: mr r4,r28
0.00 : 1032c078: mr r12,r29
0.00 : 1032c07c: bctrl
0.01 : 1032c080: ld r2,24(r1)
0.00 : 1032c084: cmpdi cr7,r3,0
0.00 : 1032c088: bne cr7,1032c050 <htab_traverse+0x40>
LLVM generated the following corresponding code:
66.56 : 10306b20: ldu r3,8(r26) // HOT
0.00 : 10306b24: cmpldi r3,2 // HOT
0.00 : 10306b28: blt 10306b50 <htab_traverse+0x80> // HOT
0.03 : 10306b2c: mtctr r28
0.00 : 10306b30: mr r3,r30
0.00 : 10306b34: mr r4,r29
0.00 : 10306b38: mr r12,r28
0.00 : 10306b3c: std r2,24(r1)
0.01 : 10306b40: bctrl
0.01 : 10306b44: ld r2,24(r1)
0.00 : 10306b48: cmplwi r3,0
0.00 : 10306b4c: beq 10306b5c <htab_traverse+0x8c>
33.38 : 10306b50: addi r30,r30,8 // HOT
0.00 : 10306b54: cmpld r30,r27 // HOT
0.00 : 10306b58: blt 10306b20 <htab_traverse+0x50> // HOT
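Both compilers generate similar instructions, but with a different code
layout. In GCC's code, all hot BBs are placed together, so the hot path has
one taken branch per iteration (the ble at 1032c064). In LLVM's code the hot
BBs are separated, so the hot path has two taken branches per iteration (the
blt at 10306b28 and the blt at 10306b58), and the extra taken branch slows the
loop down.
So this is a code layout problem.</pre>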
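<pre>For reference, the loop that both listings implement can be sketched in C
roughly as follows. This is a reconstruction from the disassembly, not the
403.gcc source: the names traverse_sketch, trav_fn, SLOT_EMPTY, SLOT_DELETED,
visit_counter and count_visit are hypothetical, and the sentinel values 0 and 1
are an assumption based on the unsigned compares against 1 and 2.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinels: the disassembly skips slots whose value is 0 or 1. */
#define SLOT_EMPTY   ((void *) 0)
#define SLOT_DELETED ((void *) 1)

/* Hypothetical callback type: return 0 to stop the traversal. */
typedef int (*trav_fn) (void **slot, void *info);

/* Visit every live slot in [slots, slots + size). */
static void
traverse_sketch (void **slots, size_t size, trav_fn callback, void *info)
{
  void **slot = slots;
  void **limit = slots + size;

  assert (size > 0);  /* the do/while form reads one slot unconditionally */
  do
    {
      void *x = *slot;

      /* The profile in the report marks this skip path as hot: the
         indirect call is rarely reached. */
      if (x != SLOT_EMPTY && x != SLOT_DELETED)
        if (!callback (slot, info))
          break;
    }
  while (++slot < limit);
}

/* Example callback state: counts visits, stops after stop_after (0 = never). */
struct visit_counter
{
  int seen;
  int stop_after;
};

static int
count_visit (void **slot, void *info)
{
  struct visit_counter *c = info;

  (void) slot;
  c->seen++;
  return c->stop_after == 0 || c->seen < c->stop_after;
}
```

The layout question is which successor of the "x is empty or deleted" test is
placed as the fall-through: the increment-and-compare block, or the call.</pre>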
</div>
</p>
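<div>
<pre>The effect of the two layouts on the number of taken conditional branches can
be illustrated with a small counting model. This is a sketch derived only from
the two listings above, not a performance measurement: it assumes the callback
always returns nonzero (so the traversal never stops early), it ignores the
bctrl call and its return (present in both layouts), and the names
taken_counts and count_taken are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Taken conditional branches under each layout (model output only). */
struct taken_counts
{
  size_t gcc_layout;
  size_t llvm_layout;
};

/* live[i] is nonzero when slot i holds a live entry; n must be nonzero. */
static struct taken_counts
count_taken (const unsigned char *live, size_t n)
{
  struct taken_counts c = { 0, 0 };

  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      int more = (i + 1 < n);

      /* GCC layout: either the skip test (ble at 1032c064) or the
         post-call test (bne at 1032c088) branches back to the increment. */
      c.gcc_layout += 1;
      /* The loop-exit test (ble at 1032c058) is taken once, at the end. */
      if (!more)
        c.gcc_layout += 1;

      /* LLVM layout: a skipped slot takes the blt at 10306b28 ... */
      if (!live[i])
        c.llvm_layout += 1;
      /* ... and every slot but the last takes the back edge
         (blt at 10306b58). */
      if (more)
        c.llvm_layout += 1;
    }
  return c;
}
```

Under these assumptions, a table of n slots that are all empty or deleted gives
n + 1 taken branches for the GCC layout and 2n - 1 for the LLVM layout.</pre>
</div>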
</body>
</html>