<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - failure to recognize byte unpack shuffles without ssse3 available"
   href="https://llvm.org/bugs/show_bug.cgi?id=31151">31151</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>failure to recognize byte unpack shuffles without ssse3 available
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>sroland@vmware.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Sometimes llvm fails to match even pretty trivial shuffles if pshufb isn't
available (at least I think that's the reason: it works with ssse3 but not
without), even though the resulting code wouldn't use pshufb anyway.

This code (note that both shuffles are trivially doable with just sse2) compiles
fine with ssse3 and higher but not without (the result is still correct, but
the pessimization is obvious and quite serious).

define <4 x i32> @unpackbwpshufd(<16 x i8> %val1, <16 x i8> %val2) {
entry:
   %0 = shufflevector <16 x i8> %val1, <16 x i8> %val2, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
   %1 = bitcast <16 x i8> %0 to <4 x i32>
   %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
   ret <4 x i32> %2
}

With -mattr=ssse3 the result is:
        punpcklbw       %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
        pshufd  $216, %xmm0, %xmm0      # xmm0 = xmm0[0,2,1,3]
        retq
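
As a sanity check that these two sse2 instructions really do cover both
shuffles, here is a small Python sketch that models them on plain byte lists.
It is not llvm code; all helper names are made up for illustration, and it
assumes the usual little-endian layout where a 32-bit lane is 4 consecutive
bytes.

```python
# Hypothetical byte-level model; helper names are illustrative, not llvm APIs.

def shufflevector(a, b, mask):
    """Pick elements from the concatenation a+b, like the IR instruction."""
    both = a + b
    return [both[i] for i in mask]

def punpcklbw(dst, src):
    """Interleave the low 8 bytes of dst and src (dst byte first)."""
    out = []
    for i in range(8):
        out += [dst[i], src[i]]
    return out

def pshufd(src, imm):
    """Permute four 32-bit lanes; each 2-bit field of imm picks a lane."""
    lanes = [src[4 * i:4 * i + 4] for i in range(4)]
    out = []
    for i in range(4):
        out += lanes[(imm // 4 ** i) % 4]
    return out

def ir_reference(val1, val2):
    """The two shufflevectors from @unpackbwpshufd, applied bytewise."""
    mask = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
    unpacked = shufflevector(val1, val2, mask)
    # The 4 x i32 shuffle 0,2,1,3 moves whole 4-byte groups.
    dword_mask = [4 * lane + k for lane in (0, 2, 1, 3) for k in range(4)]
    return shufflevector(unpacked, unpacked, dword_mask)

val1 = list(range(0, 16))      # stands in for %val1 / xmm0
val2 = list(range(100, 116))   # stands in for %val2 / xmm1
sse2 = pshufd(punpcklbw(val1, val2), 216)
print(sse2 == ir_reference(val1, val2))
```

The immediate 216 is 0b11011000, i.e. lanes 0,2,1,3 read from the low field
upwards, which is the same permutation as the second shufflevector.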

But without it, llvm emits this complicated mess, even though the above code
would obviously work just fine with sse2:
        punpcklbw       %xmm1, %xmm1    # xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshufd  $216, %xmm1, %xmm2      # xmm2 = xmm1[0,2,1,3]
        movdqa  .LCPI0_0(%rip), %xmm1   # xmm1 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
        punpcklbw       %xmm0, %xmm0    # xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshufd  $216, %xmm0, %xmm0      # xmm0 = xmm0[0,2,1,3]
        pand    %xmm1, %xmm0
        pandn   %xmm2, %xmm1
        por     %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        retq
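
To make it easier to see what this longer sequence is doing, here is a rough
Python model of it on byte lists (helper names mirror the mnemonics but are
purely illustrative, and operands follow the AT&amp;T order used above, with the
destination passed first to each helper). Each input is unpacked with itself
and permuted separately, then the two halves are blended with the 255,0 mask.
The sketch also checks that the result matches plain punpcklbw + pshufd.

```python
# Hypothetical byte-level model of the instructions above; illustrative only.

def punpcklbw(dst, src):
    """Interleave the low 8 bytes of dst and src (dst byte first)."""
    out = []
    for i in range(8):
        out += [dst[i], src[i]]
    return out

def pshufd(src, imm):
    """Permute four 32-bit lanes; each 2-bit field of imm picks a lane."""
    lanes = [src[4 * i:4 * i + 4] for i in range(4)]
    out = []
    for i in range(4):
        out += lanes[(imm // 4 ** i) % 4]
    return out

def pand(dst, src):
    return [d & s for d, s in zip(dst, src)]

def pandn(dst, src):
    # pandn inverts the destination, then ANDs it with the source.
    return [(~d & 0xFF) & s for d, s in zip(dst, src)]

def por(dst, src):
    return [d | s for d, s in zip(dst, src)]

def sse2_only_sequence(xmm0, xmm1):
    """Step-by-step model of the code emitted without ssse3."""
    xmm1 = punpcklbw(xmm1, xmm1)   # duplicate each low byte of val2
    xmm2 = pshufd(xmm1, 216)
    xmm1 = [255, 0] * 8            # the .LCPI0_0 constant
    xmm0 = punpcklbw(xmm0, xmm0)   # duplicate each low byte of val1
    xmm0 = pshufd(xmm0, 216)
    xmm0 = pand(xmm0, xmm1)        # keep even bytes (from val1)
    xmm1 = pandn(xmm1, xmm2)       # keep odd bytes (from val2)
    xmm1 = por(xmm1, xmm0)
    return xmm1                    # final movdqa copies this to xmm0

def two_instruction_sequence(xmm0, xmm1):
    return pshufd(punpcklbw(xmm0, xmm1), 216)

val1 = list(range(0, 16))
val2 = list(range(100, 116))
print(sse2_only_sequence(val1, val2) == two_instruction_sequence(val1, val2))
```

So the blend is just an expensive way of redoing the interleave that a single
punpcklbw of the two inputs already provides.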

Note that without the second shuffle (i.e. just returning %1 above) llvm emits
the punpcklbw just fine, without resorting to stitched-together shuffles and
masks...

This actually worked at some point (it works with llvm 3.3 and fails with 3.7
and newer), though I suppose that was back when llvm didn't have much of a
shuffle optimizer.</pre>
        </div>
      </p>
    </body>
</html>