[llvm-bugs] [Bug 43495] New: [AARCH64] Clang should use xtn/shrn for shuffles
via llvm-bugs
llvm-bugs at lists.llvm.org
Sat Sep 28 21:34:28 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=43495
Bug ID: 43495
Summary: [AARCH64] Clang should use xtn/shrn for shuffles
Product: libraries
Version: 9.0
Hardware: Other
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: AArch64
Assignee: unassignedbugs at nondot.org
Reporter: husseydevin at gmail.com
CC: arnaud.degrandmaison at arm.com,
llvm-bugs at lists.llvm.org, peter.smith at linaro.org,
Ties.Stuij at arm.com
On AArch64, shuffles suck. Not only do you need to split them into more
instructions and lean on ext more, they are also much slower than on ARMv7-A.
uint32x2_t get_even_lanes(uint32x4_t x)
{
    return __builtin_shufflevector(x, x, 0, 2);
}

uint32x2_t get_odd_lanes(uint32x4_t x)
{
    return __builtin_shufflevector(x, x, 1, 3);
}

uint32x2x2_t vzip_pairwise(uint32x4_t x)
{
    return vzip_u32(vget_low_u32(x), vget_high_u32(x));
}
Clang-9 emits this:
get_even_lanes:
        ext     v1.16b, v0.16b, v0.16b, #8
        zip1    v0.2s, v0.2s, v1.2s
        ret
get_odd_lanes:
        ext     v1.16b, v0.16b, v0.16b, #8
        zip2    v0.2s, v0.2s, v1.2s
        ret
vzip_pairwise:
        ext     v1.16b, v0.16b, v0.16b, #8
        zip1    v2.2s, v0.2s, v1.2s
        zip2    v1.2s, v0.2s, v1.2s
        mov     v0.16b, v2.16b
        ret
This is garbage. It is significantly better to do this instead:
get_even_lanes:
        xtn     v0.2s, v0.2d
        ret
get_odd_lanes:
        shrn    v0.2s, v0.2d, #32
        ret
vzip_pairwise:
        shrn    v1.2s, v0.2d, #32
        xtn     v0.2s, v0.2d
        ret
Side note: on 32-bit, if we can clobber the source, vshrn+vmovn and
vrevNq+vuzpq (using only one result) both save an in-place vzip, and are the
best 0213 shuffle aside from vld2.