<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - LLVM built 445.gobmk is 17% slower than gcc on power8"
href="https://llvm.org/bugs/show_bug.cgi?id=24850">24850</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>LLVM built 445.gobmk is 17% slower than gcc on power8
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: PowerPC
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>carrot@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>The LLVM-built 445.gobmk binary is 17% slower than the gcc-built binary on power8.
gcc 438s
llvm 512s
For the input data trevord.tst, llvm is 18% slower.
The problem is in the function popgo. In the gcc-built binary it consumes 4.11%
of the time; in the llvm-built binary it consumes 13.98% of the time.
The related code snippet is in engine/board.c:
struct change_stack_entry {
int *address;
int value;
};
static struct change_stack_entry *change_stack_pointer;
#define POP_MOVE()\
while ((--change_stack_pointer)->address)\
*(change_stack_pointer->address) =\
change_stack_pointer->value
The LLVM-generated code sequence is:
68.05 : 1000a9f0: ld r3,-22832(r29) // A
0.66 : 1000a9f4: addi r4,r3,-16
0.17 : 1000a9f8: std r4,-22832(r29) // B
0.02 : 1000a9fc: ori r2,r2,0
14.30 : 1000aa00: ld r4,-16(r3)
0.00 : 1000aa04: cmpldi r4,0
0.00 : 1000aa08: beq 1000aa18 <popgo+0xa8>
0.53 : 1000aa0c: lwz r3,-8(r3)
0.11 : 1000aa10: stw r3,0(r4)
0.00 : 1000aa14: b 1000a9f0 <popgo+0x80>
Instruction A reads the variable change_stack_pointer and instruction B writes
it, so every loop iteration reloads and stores the variable in memory.
The GCC-generated code sequence is:
48.30 : 10010280: lwz r8,24(r9)
0.00 : 10010284: mr r7,r9
0.00 : 10010288: addi r9,r9,-16
0.63 : 1001028c: stw r8,0(r10)
0.00 : 10010290: ld r10,16(r9)
0.00 : 10010294: cmpdi cr7,r10,0
0.00 : 10010298: bne cr7,10010280 <popgo+0x90>
15.54 : 1001029c: nop
Note that GCC keeps the variable change_stack_pointer in register r9: it reads
the variable at the start of the function and writes it back after the loop.
change_stack_pointer is a static variable whose address is never assigned to
another variable, so no other pointer can alias it and this optimization is
safe.
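As a rough source-level sketch of that transformation (not taken from
board.c: the array change_stack, its size, and the function names pop_move and
pop_move_hoisted are made up for illustration), the loop can be written with
the pointer held in a local and stored back once. This is only equivalent
under the assumption stated above, that no stack entry's address field points
at change_stack_pointer itself:

```c
/* Sketch only. change_stack, its size, pop_move and pop_move_hoisted are
   hypothetical names used for illustration, not code from engine/board.c. */
struct change_stack_entry {
  int *address;
  int value;
};

static struct change_stack_entry change_stack[8];
static struct change_stack_entry *change_stack_pointer = change_stack;

/* Same shape as POP_MOVE(): the static pointer is read and written in the
   loop unless the compiler keeps it in a register. */
static void pop_move(void)
{
  while ((--change_stack_pointer)->address)
    *(change_stack_pointer->address) = change_stack_pointer->value;
}

/* Hand-hoisted variant: one load before the loop, one store after it.
   Assumes no entry's address aliases change_stack_pointer. */
static void pop_move_hoisted(void)
{
  struct change_stack_entry *sp = change_stack_pointer;

  while ((--sp)->address)
    *(sp->address) = sp->value;
  change_stack_pointer = sp;
}
```

Both functions stop at the first entry whose address field is null and leave
change_stack_pointer pointing at that entry.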
Even if I add -fstrict-aliasing explicitly to the llvm command line, llvm moves
the read of change_stack_pointer out of the loop but still keeps the write to
change_stack_pointer inside the loop.
Command line options are:
-DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I.. -I../include -I./include
-fno-strict-aliasing -O2 -m64 -mvsx -mcpu=power8 -DSPEC_CPU_LP64</pre>
</div>
</p>
</body>
</html>