[llvm-bugs] [Bug 31443] New: AVX-512 generates sub-optimal shuffles for byte vectors

Wed Dec 21 00:53:39 PST 2016

https://llvm.org/bugs/show_bug.cgi?id=31443

            Bug ID: 31443
           Summary: AVX-512 generates sub-optimal shuffles for byte
                    vectors
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: elena.demikhovsky at intel.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The following sequence:
  %wide.vec = load <64 x i8>, <64 x i8>* %2, align 16, !tbaa !1
  %strided.vec = shufflevector <64 x i8> %wide.vec, <64 x i8> undef, <32 x i32>
      <i32 0,  i32 2,  i32 4,  i32 6,  i32 8,  i32 10, i32 12, i32 14,
       i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30,
       i32 32, i32 34, i32 36, i32 38, i32 40, i32 42, i32 44, i32 46,
       i32 48, i32 50, i32 52, i32 54, i32 56, i32 58, i32 60, i32 62>
  %3 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %index
  %4 = bitcast i8* %3 to <32 x i8>*
  store <32 x i8> %strided.vec, <32 x i8>* %4, align 16, !tbaa !1
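The value names %wide.vec and %strided.vec suggest this IR comes from the loop vectorizer's interleaved-access handling of a stride-2 read. A minimal sketch of a loop of that shape follows, assuming a source array A of twice B's size (A and its size are hypothetical; only @B's 10240-byte size comes from the IR above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* N matches the IR's [10240 x i8] @B; A and its size are assumptions. */
#define N 10240
uint8_t A[2 * N];
uint8_t B[N];

/* Stride-2 read: each iteration loads A[2*i] and stores it to B[i].
   With 32 lanes, one vector iteration loads 64 bytes of A (%wide.vec)
   and keeps the even-indexed bytes (%strided.vec). */
void take_even_bytes(void) {
    for (size_t i = 0; i < N; ++i)
        B[i] = A[2 * i];
}
```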

may be lowered as:
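vpmovzxbw  load the low 256 bits (bytes 0..31) and zero-extend them to 512 bits (<32 x i16>)
vpmovzxbw  load the high 256 bits (bytes 32..63) and zero-extend them to 512 bits (<32 x i16>)
vpermt2w   pick the even words out of the two <32 x i16> sources
vpmovwb    truncating store from <32 x i16> to <32 x i8>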
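To check that the four steps above produce the same bytes as the shufflevector, here is a scalar C model of the sequence. The helper functions are hypothetical stand-ins named after the instructions; they model only the data flow of the 512-bit forms (no masking, no performance), with vpermt2w taking its index bits 4:0 as the word number and bit 5 as the table select.

```c
#include <assert.h>
#include <stdint.h>

/* vpmovzxbw zmm, m256: zero-extend 32 bytes into 32 words. */
static void vpmovzxbw(uint16_t dst[32], const uint8_t src[32]) {
    for (int i = 0; i < 32; ++i)
        dst[i] = src[i];
}

/* vpermt2w (512-bit): index bits 4:0 pick a word, bit 5 picks the
   table (0 = first source, 1 = second source). */
static void vpermt2w(uint16_t dst[32], const uint16_t idx[32],
                     const uint16_t a[32], const uint16_t b[32]) {
    for (int i = 0; i < 32; ++i) {
        uint16_t sel = idx[i] & 63;
        dst[i] = (sel & 32) ? b[sel & 31] : a[sel & 31];
    }
}

/* vpmovwb m256, zmm: keep the low byte of each word and store it. */
static void vpmovwb(uint8_t dst[32], const uint16_t src[32]) {
    for (int i = 0; i < 32; ++i)
        dst[i] = (uint8_t)src[i];
}

/* Even-byte extraction, i.e. the shufflevector mask 0, 2, ..., 62. */
void strided_even_bytes(uint8_t out[32], const uint8_t in[64]) {
    uint16_t lo[32], hi[32], idx[32], perm[32];
    vpmovzxbw(lo, in);          /* bytes 0..31  -> words */
    vpmovzxbw(hi, in + 32);     /* bytes 32..63 -> words */
    for (int i = 0; i < 32; ++i)
        idx[i] = (uint16_t)(2 * i);  /* 0..30 read lo, 32..62 read hi */
    vpermt2w(perm, idx, lo, hi);
    vpmovwb(out, perm);
}
```

Because every byte is widened before the permute, a word-granular permute (available in AVX-512BW) is enough to express a byte shuffle.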

In general, I recommend using vpermt2w and vpermw for <32 x i8> shuffles on
AVX-512 targets without the VBMI feature (AVX512_VBMI adds the byte-granular
permutes vpermb and vpermt2b).
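For a single-source <32 x i8> shuffle the same idea uses vpermw: widen, permute words, narrow. The scalar sketch below assumes an arbitrary mask with values in 0..31. The function names are hypothetical, and the widening and narrowing loops stand in for vpmovzxbw and vpmovwb.

```c
#include <assert.h>
#include <stdint.h>

/* vpermw (512-bit): each index uses bits 4:0 to pick one of 32 words. */
static void vpermw(uint16_t dst[32], const uint16_t idx[32],
                   const uint16_t table[32]) {
    for (int i = 0; i < 32; ++i)
        dst[i] = table[idx[i] & 31];
}

/* Shuffle 32 bytes by a mask with values in 0..31, at word granularity. */
void shuffle_v32i8_via_words(uint8_t out[32], const uint8_t in[32],
                             const uint8_t mask[32]) {
    uint16_t wide[32], idx[32], perm[32];
    for (int i = 0; i < 32; ++i) {
        wide[i] = in[i];   /* stands in for vpmovzxbw: byte -> word */
        idx[i] = mask[i];  /* byte mask reused as word indices */
    }
    vpermw(perm, idx, wide);
    for (int i = 0; i < 32; ++i)
        out[i] = (uint8_t)perm[i];  /* stands in for vpmovwb: word -> byte */
}
```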
