[llvm-bugs] [Bug 31151] New: failure to recognize byte unpack shuffles without ssse3 available

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Nov 23 21:13:26 PST 2016


https://llvm.org/bugs/show_bug.cgi?id=31151

            Bug ID: 31151
           Summary: failure to recognize byte unpack shuffles without
                    ssse3 available
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: sroland at vmware.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

llvm sometimes fails to match even fairly trivial shuffles when pshufb isn't
available (at least I think that's the reason: it works with ssse3 but not
without), even when the result wouldn't use pshufb anyway.

This code (note that both shuffles are trivially doable with just sse2)
compiles to good code with ssse3 and higher but not without. The result is
still correct, but the pessimization is obvious and quite serious.

define <4 x i32> @unpackbwpshufd(<16 x i8> %val1, <16 x i8> %val2) {
entry:
   %0 = shufflevector <16 x i8> %val1, <16 x i8> %val2, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
   %1 = bitcast <16 x i8> %0 to <4 x i32>
   %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
   ret <4 x i32> %2
}
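
For reference, here is a rough C sketch of what the function above computes,
using only SSE2 intrinsics from <emmintrin.h>. The function names are made up
for illustration, and the code assumes an x86 target with SSE2 enabled. The
scalar version simply models the two shufflevector masks; the intrinsic version
is the punpcklbw + pshufd pair one would hope to get.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics only; assumes an x86 target */

/* Scalar model of the two shufflevector masks in the IR (hypothetical name). */
static void unpackbwpshufd_ref(const uint8_t a[16], const uint8_t b[16],
                               uint8_t out[16]) {
    static const int dword_order[4] = {0, 2, 1, 3};
    uint8_t t[16];
    /* First shuffle: interleave the low 8 bytes of a and b. */
    for (int i = 0; i < 8; i++) {
        t[2 * i] = a[i];
        t[2 * i + 1] = b[i];
    }
    /* Second shuffle: reorder the four 32-bit lanes as <0, 2, 1, 3>. */
    for (int d = 0; d < 4; d++)
        memcpy(out + 4 * d, t + 4 * dword_order[d], 4);
}

/* The same operation with SSE2 intrinsics (hypothetical name). */
static void unpackbwpshufd_sse2(const uint8_t a[16], const uint8_t b[16],
                                uint8_t out[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i lo = _mm_unpacklo_epi8(va, vb);    /* punpcklbw */
    __m128i r = _mm_shuffle_epi32(lo, 0xD8);   /* pshufd $216 */
    _mm_storeu_si128((__m128i *)out, r);
}
```

Both functions should produce identical output for any input, which is the
sense in which the shuffles do not need anything beyond sse2.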

With -mattr=ssse3 the result is:
        punpcklbw       %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
        pshufd  $216, %xmm0, %xmm0      # xmm0 = xmm0[0,2,1,3]
        retq
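
As a side note on the pshufd immediate: $216 is 0xD8, i.e. binary 11 01 10 00,
and each 2-bit field (lowest first) selects the source dword for one
destination lane. A small sketch of that decoding, with a made-up helper name:

```c
#include <stdint.h>

/* Hypothetical helper: split a pshufd immediate into its four 2-bit
 * lane selectors, lowest destination lane first. */
static void decode_pshufd_imm(uint8_t imm, int sel[4]) {
    for (int i = 0; i < 4; i++)
        sel[i] = (imm >> (2 * i)) & 3;
}
```

Decoding 216 this way gives the selectors 0, 2, 1, 3, which matches the
xmm0[0,2,1,3] comment in the assembly and the second shufflevector mask.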

But without it, llvm emits this complicated mess, even though the code above
would obviously work just fine with sse2:
        punpcklbw       %xmm1, %xmm1    # xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshufd  $216, %xmm1, %xmm2      # xmm2 = xmm1[0,2,1,3]
        movdqa  .LCPI0_0(%rip), %xmm1   # xmm1 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
        punpcklbw       %xmm0, %xmm0    # xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshufd  $216, %xmm0, %xmm0      # xmm0 = xmm0[0,2,1,3]
        pand    %xmm1, %xmm0
        pandn   %xmm2, %xmm1
        por     %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        retq
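
To make it easier to see what this longer sequence is doing, here is an
intrinsic-level transcription of both forms in C. This is only a sketch: the
function names are invented, it assumes an x86 target with SSE2, and it is a
hand translation of the assembly rather than anything llvm produces. Each input
is unpacked with itself and dword-shuffled separately, then the two are blended
with a 255,0,255,0,... byte mask.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics only; assumes an x86 target */

/* The two-instruction form: punpcklbw + pshufd $216. */
static __m128i short_form(__m128i a, __m128i b) {
    return _mm_shuffle_epi32(_mm_unpacklo_epi8(a, b), 0xD8);
}

/* Hand transcription of the longer sequence emitted without ssse3. */
static __m128i long_form(__m128i a, __m128i b) {
    __m128i bb = _mm_shuffle_epi32(_mm_unpacklo_epi8(b, b), 0xD8);
    __m128i aa = _mm_shuffle_epi32(_mm_unpacklo_epi8(a, a), 0xD8);
    __m128i mask = _mm_set1_epi16(0x00FF);     /* bytes 255,0,255,0,... */
    __m128i even = _mm_and_si128(aa, mask);    /* pand: keep a in even bytes */
    __m128i odd = _mm_andnot_si128(mask, bb);  /* pandn: keep b in odd bytes */
    return _mm_or_si128(even, odd);            /* por */
}

/* Hypothetical helper: returns 1 if both forms agree on the given inputs. */
static int forms_agree(const uint8_t a[16], const uint8_t b[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(short_form(va, vb), long_form(va, vb));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

The two forms agree byte for byte, so the longer sequence is correct, just far
more work than needed.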

Note that without the second shuffle (so just returning %1 above) llvm emits
the punpcklbw just fine, without resorting to stitched-together shuffles and
masks...

It actually worked at some point (it works with llvm 3.3 and fails with 3.7 and
newer), though I suppose that was back when llvm didn't have much of a shuffle
optimizer.
