<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [AArch64] missing lowering of memcpy"
href="https://bugs.llvm.org/show_bug.cgi?id=32384">32384</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[AArch64] missing lowering of memcpy
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: AArch64
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>davide@freebsd.org
</td>
</tr>
<tr>
<th>CC</th>
<td>efriedma@codeaurora.org, james.molloy@arm.com, llvm-bugs@lists.llvm.org, matze@braunis.de, t.p.northover@gmail.com
</td>
</tr></table>
<p>
<div>
<pre>Consider the following code, reduced from SingleSource/Misc/salsa20.c in the
llvm test-suite, built with -O3 -mgeneral-regs-only -fomit-frame-pointer.
#include <stdint.h>
void tinky(uint32_t out[16],uint32_t in[16]) {
uint32_t x[16];
for (int i = 0; i < 16; ++i) x[i] = in[i];
for (int i = 0; i < 16; ++i) out[i] = x[i] + in[i];
}
llvm generates a call to memcpy to put the input array on the stack
0000000000000000 <tinky>:
0: d10183ff sub sp, sp, #0x60
4: a9044ff4 stp x20, x19, [sp,#64]
8: aa0003f3 mov x19, x0
c: 910003e0 mov x0, sp
10: 321a03e2 orr w2, wzr, #0x40
14: a9057bfd stp x29, x30, [sp,#80]
18: 910143fd add x29, sp, #0x50
1c: aa0103f4 mov x20, x1
20: 94000000 bl 0 <memcpy>
24: b94003e8 ldr w8, [sp]
28: b9400289 ldr w9, [x20]
2c: 0b080128 add w8, w9, w8
[...]
while GCC just used `stp`:
0: d10103ff sub sp, sp, #0x40
4: 91004024 add x4, x1, #0x10
8: eb04001f cmp x0, x4
c: 91004003 add x3, x0, #0x10
10: a9402c2a ldp x10, x11, [x1]
14: aa000022 orr x2, x1, x0
18: a9412428 ldp x8, x9, [x1,#16]
1c: a9002fea stp x10, x11, [sp]
20: a9421c26 ldp x6, x7, [x1,#32]
24: a90127e8 stp x8, x9, [sp,#16]
28: a9431424 ldp x4, x5, [x1,#48]
2c: a9021fe6 stp x6, x7, [sp,#32]
30: a90317e4 stp x4, x5, [sp,#48]
34: 92400c42 and x2, x2, #0xf
38: fa433022 ccmp x1, x3, #0x2, cc
As the IR shows, both alignment and size are constant, and the size is
relatively small (well, 64), so I guess this call could be lowered (providing
`EmitTargetCodeForMemcpy()`) for the architecture. Thoughts?
define void @tinky(i32* nocapture, i32* nocapture readonly) local_unnamed_addr
#0 {
%3 = bitcast i32* %1 to i8*
%4 = alloca [16 x i32], align 4
%5 = bitcast [16 x i32]* %4 to i8*
call void @llvm.lifetime.start(i64 64, i8* nonnull %5) #2
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull %5, i8* %3, i64 64, i32 4,
i1 false)
%6 = getelementptr inbounds [16 x i32], [16 x i32]* %4, i64 0, i64 0
%7 = load i32, i32* %6, align 4, !tbaa !1
%8 = load i32, i32* %1, align 4, !tbaa !1
%9 = add i32 %8, %7</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>