<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Performance regression with r271410"
href="https://llvm.org/bugs/show_bug.cgi?id=27988">27988</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Performance regression with r271410
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>arnaud.degrandmaison@arm.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=16458" name="attach_16458" title="Reproducer">attachment 16458</a> <a href="attachment.cgi?id=16458&action=edit" title="Reproducer">[details]</a></span>
Reproducer
Commit r271410 significantly regresses some common industry benchmarks, at
least on ARM platforms. The reproducer below shows that the generated code
gets worse on x86 as well.
With the following testcase:
<span class="quote">> cat test.c</span >
#define LEN 10000
#define ALIGNMENT 16
__attribute__((aligned(ALIGNMENT))) float a[LEN];
__attribute__((aligned(ALIGNMENT))) float b[LEN];
extern int dummy(float *, float *);
int s173() {
  int k = LEN / 2;
  for (int i = 0; i < LEN / 2; i++) {
    a[i + k] = a[i] + b[i];
  }
  return dummy(a, b);
}
On the AArch64 target, we get:
(with r271410)
<span class="quote">> clang -target arm64-linux-gnu -O2 -S -o - test.c</span >
...
.LBB0_1:
fmov x11, d2
lsl x11, x11, #2
add x12, x9, x11
add v3.2d, v2.2d, v1.2d
ldr q2, [x12]
ldr q4, [x10, x11]
add v0.2d, v0.2d, v1.2d
sub x8, x8, #4
fadd v2.4s, v2.4s, v4.4s
str q2, [x12, #20000]
mov v2.16b, v3.16b
cbnz x8, .LBB0_1
...
(with r271410 reverted)
<span class="quote">> clang -target arm64-linux-gnu -O2 -S -o - test.c</span >
...
.LBB0_1:
add x11, x9, x8
add x12, x10, x8
ldr q0, [x11, #20000]
ldr q1, [x12, #20000]
add x8, x8, #16
fadd v0.4s, v0.4s, v1.4s
str q0, [x11, #40000]
cbnz x8, .LBB0_1
...
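One possible reading of the two AArch64 loops above, sketched in C. This is an
illustrative model, not compiler output: the names add_scalar_index,
add_vector_index and index_vec are made up for this sketch. In the r271410
loop the index appears to be kept in 64-bit vector lanes (v2.2d), with lane 0
moved to a general-purpose register (fmov) and scaled (lsl) on every iteration
before the loads. In the reverted loop a single scalar register (x8) drives
all addresses.

```c
/* Hypothetical C model of the two loop shapes; not compiler output. */
#define LEN 10000

/* Scalar index: a single scalar counter drives every address
   (simplified; the real loop counts a negative byte offset up to zero). */
void add_scalar_index(float *a, const float *b) {
    int k = LEN / 2;
    for (int i = 0; i < LEN / 2; i += 4)
        for (int lane = 0; lane < 4; lane++)   /* models one 4 x float op */
            a[i + lane + k] = a[i + lane] + b[i + lane];
}

/* Vector index: the induction variable lives in 64-bit lanes and lane 0 is
   copied out on every iteration to form the load and store addresses. */
typedef struct { long long lane[2]; } index_vec;

void add_vector_index(float *a, const float *b) {
    int k = LEN / 2;
    index_vec iv = { { 0, 1 } };
    for (long long remaining = LEN / 2; remaining != 0; remaining -= 4) {
        long long i = iv.lane[0];              /* fmov x11, d2 (then lsl) */
        for (int lane = 0; lane < 4; lane++)   /* models one 4 x float op */
            a[i + lane + k] = a[i + lane] + b[i + lane];
        iv.lane[0] += 4;                       /* add v3.2d, v2.2d, v1.2d */
        iv.lane[1] += 4;
    }
}
```

Both functions compute the same values; under this reading, the difference is
the extra per-iteration work needed to get the index out of the vector
register before each address can be formed.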
The generated code also appears to have regressed on x86:
(with r271410)
<span class="quote">> clang -O2 -S -o - test.c</span >
...
.LBB0_1:
movdqa %xmm0, %xmm4
paddq %xmm2, %xmm4
movd %xmm0, %rcx
movups a(,%rcx,4), %xmm5
movups b(,%rcx,4), %xmm6
addps %xmm5, %xmm6
movups %xmm6, a+20000(,%rcx,4)
paddq %xmm3, %xmm0
paddq %xmm3, %xmm1
movd %xmm4, %rcx
movups a(,%rcx,4), %xmm4
movups b(,%rcx,4), %xmm5
addps %xmm4, %xmm5
movups %xmm5, a+20000(,%rcx,4)
addq $-8, %rax
jne .LBB0_1
...
(with r271410 reverted)
<span class="quote">> clang -O2 -S -o - test.c</span >
...
.LBB0_1:
movaps a+20000(%rax), %xmm0
addps b+20000(%rax), %xmm0
movaps %xmm0, a+40000(%rax)
movaps a+20016(%rax), %xmm0
addps b+20016(%rax), %xmm0
movaps %xmm0, a+40016(%rax)
addq $32, %rax
jne .LBB0_1
...</pre>
</div>
</p>
</body>
</html>