<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [AArch64] Vector store of scalars produces sub-optimal code"
href="https://bugs.llvm.org/show_bug.cgi?id=43460">43460</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[AArch64] Vector store of scalars produces sub-optimal code
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: AArch64
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>florian_hahn@apple.com
</td>
</tr>
<tr>
<th>CC</th>
<td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, peter.smith@linaro.org, Ties.Stuij@arm.com
</td>
</tr></table>
<p>
<div>
<pre>It looks like we fail to generate optimal code when doing a vector store of a
vector of scalars. Consider the examples below (or on
<a href="https://godbolt.org/z/hpv4S9">https://godbolt.org/z/hpv4S9</a>) . I am not sure how common those cases actually
are, I just stumbled across this while looking at
<a href="http://lists.llvm.org/pipermail/llvm-dev/2019-September/135432.html">http://lists.llvm.org/pipermail/llvm-dev/2019-September/135432.html</a> .
define void @const_vec(<2 x i32>* %c) {
store <2 x i32> <i32 2, i32 3>, <2 x i32>* %c, align 16
ret void
}
define void @const_split(<4 x i32>* %c) {
entry:
%0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i64 0, i64 0
store i32 1, i32* %0, align 4
%1 = getelementptr <4 x i32>, <4 x i32>* %c, i64 0, i64 1
store i32 2, i32* %1, align 4
ret void
}
With llc -O3 -mtriple=aarch64, we generate the assembly below. For the vector
version, we miss that the constant can be materialized with mov/movk, as in
the split version, and instead load it from a constant pool in memory.
.LCPI0_0:
.word 2 // 0x2
.word 3 // 0x3
const_vec: // @const_vec
adrp x8, .LCPI0_0
ldr d0, [x8, :lo12:.LCPI0_0]
str d0, [x0]
ret
const_split: // @const_split
mov x8, #1
movk x8, #2, lsl #32
str x8, [x0]
ret
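As a rough illustration of why mov/movk would be enough for the constant
vector case, here is a small Python sketch. It is not LLVM code: the helper
names pack_lanes_le and movz_movk_chunks are made up for this example, and it
assumes a little-endian target where lane 0 is stored at the lowest address.
It packs the i32 lanes into one 64-bit immediate and lists the non-zero 16-bit
chunks that a mov/movk sequence would have to materialize.

```python
def pack_lanes_le(lanes, lane_bits=32):
    """Pack integer lanes into one scalar, lane 0 in the lowest bits.

    Assumption: little-endian layout, so lane 0 is stored at the lowest address.
    """
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value


def movz_movk_chunks(value):
    """Return (shift, imm16) pairs for the non-zero 16-bit chunks of a 64-bit value."""
    chunks = []
    for shift in range(0, 64, 16):
        imm16 = (value >> shift) & 0xFFFF
        if imm16:
            chunks.append((shift, imm16))
    return chunks


# The constant from @const_vec: lanes 2 and 3.
imm = pack_lanes_le([2, 3])

# Storing the packed scalar writes the same bytes as storing the two lanes.
lane_bytes = (2).to_bytes(4, "little") + (3).to_bytes(4, "little")
assert imm.to_bytes(8, "little") == lane_bytes

print(hex(imm), movz_movk_chunks(imm))
```

For the lanes 1 and 2 of @const_split, the same helpers give the chunks
[(0, 1), (32, 2)], which lines up with the mov x8, #1 / movk x8, #2, lsl #32
pair shown above. Under the same assumptions, @const_vec would also need only
two chunks.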
When we store two arbitrary i32 values, the vector version emits an extra fmov
and a lane-insert mov before the str, whereas the split version needs only a
single stp.
define void @var_vec_2(<2 x i32>* %c, i32 %a, i32 %b) {
%ins1 = insertelement <2 x i32> undef, i32 %a, i32 0
%ins2 = insertelement <2 x i32> %ins1, i32 %b, i32 1
store <2 x i32> %ins2, <2 x i32>* %c, align 16
ret void
}
define void @var_split(<4 x i32>* %c, i32 %a, i32 %b) {
entry:
%0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i64 0, i64 0
store i32 %a, i32* %0, align 4
%1 = getelementptr <4 x i32>, <4 x i32>* %c, i64 0, i64 1
store i32 %b, i32* %1, align 4
ret void
}
var_vec_2: // @var_vec_2
fmov s0, w1
mov v0.s[1], w2
str d0, [x0]
ret
var_split: // @var_split
stp w1, w2, [x0]
ret</pre>
</div>
</p>
</body>
</html>