[llvm-bugs] [Bug 31443] New: AVX-512 generates sub-optimal shuffles for byte vectors
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Dec 21 00:53:39 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=31443
Bug ID: 31443
Summary: AVX-512 generates sub-optimal shuffles for byte vectors
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: elena.demikhovsky at intel.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
The following sequence:

  %wide.vec = load <64 x i8>, <64 x i8>* %2, align 16, !tbaa !1
  %strided.vec = shufflevector <64 x i8> %wide.vec, <64 x i8> undef, <32 x i32>
      <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14,
       i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30,
       i32 32, i32 34, i32 36, i32 38, i32 40, i32 42, i32 44, i32 46,
       i32 48, i32 50, i32 52, i32 54, i32 56, i32 58, i32 60, i32 62>
  %3 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %index
  %4 = bitcast i8* %3 to <32 x i8>*
  store <32 x i8> %strided.vec, <32 x i8>* %4, align 16, !tbaa !1

could be lowered more efficiently as:

  vpmovzxbw   load 256 bits (one half of the source) and zero-extend to 512
  vpmovzxbw   load 256 bits (the other half) and zero-extend to 512
  vpermt2w    shuffle across the two <32 x i16> sources
  vpmovwb     truncating store from <32 x i16> to <32 x i8>

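In general, I recommend using the vpermt2w and vpermw instructions for
<32 x i8> shuffles on AVX-512 targets that lack the VBMI feature, since the
byte-granularity permutes (vpermb, vpermt2b) are only available with VBMI.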
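
To sanity-check the proposed lowering, here is a minimal scalar model in
Python. It is only a sketch: the helper functions are hypothetical stand-ins
named after the instructions, and they model lane semantics (zero-extension,
6-bit two-table word index, truncation) rather than anything about the actual
backend implementation. The index vector [0, 2, ..., 62] is my assumption of
what vpermt2w would be given; it is the same mask as in the shufflevector.

```python
# Scalar model of the proposed sequence. Function names mirror the x86
# mnemonics but are illustrative helpers, not a real library API.

def vpmovzxbw(bytes32):
    """Zero-extend 32 bytes to 32 16-bit words."""
    assert len(bytes32) == 32
    return [b & 0xFF for b in bytes32]

def vpermt2w(table_a, idx, table_b):
    """Two-source word permute: bit 5 of each index picks the table
    (0 -> table_a, 1 -> table_b), bits 4:0 pick the word within it."""
    assert len(table_a) == len(table_b) == len(idx) == 32
    out = []
    for i in idx:
        sel = i & 0x3F
        src = table_b if sel & 0x20 else table_a
        out.append(src[sel & 0x1F])
    return out

def vpmovwb(words32):
    """Truncate 32 16-bit words to 32 bytes (keep the low byte)."""
    assert len(words32) == 32
    return [w & 0xFF for w in words32]

def strided_even_bytes(wide):
    """Extract elements 0, 2, ..., 62 of a 64-byte vector."""
    assert len(wide) == 64
    lo = vpmovzxbw(wide[:32])         # first 256 bits -> <32 x i16>
    hi = vpmovzxbw(wide[32:])         # second 256 bits -> <32 x i16>
    idx = [2 * i for i in range(32)]  # assumed index vector: 0, 2, ..., 62
    return vpmovwb(vpermt2w(lo, idx, hi))
```

For example, strided_even_bytes(list(range(64))) returns the even values
0, 2, ..., 62. Indices 0..30 land in the first table and indices 32..62 in the
second, so the byte-level mask can be reused unchanged as the word-level index.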