<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - AVX-512 generates sub-optimal shuffles for byte vectors"
   href="https://llvm.org/bugs/show_bug.cgi?id=31443">31443</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>AVX-512 generates sub-optimal shuffles for byte vectors
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>elena.demikhovsky@intel.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The following sequence 
  %wide.vec = load <64 x i8>, <64 x i8>* %2, align 16, !tbaa !1
  %strided.vec = shufflevector <64 x i8> %wide.vec, <64 x i8> undef, <32 x i32>
      <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14,
       i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30,
       i32 32, i32 34, i32 36, i32 38, i32 40, i32 42, i32 44, i32 46,
       i32 48, i32 50, i32 52, i32 54, i32 56, i32 58, i32 60, i32 62>
  %3 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %index
  %4 = bitcast i8* %3 to <32 x i8>*
  store <32 x i8> %strided.vec, <32 x i8>* %4, align 16, !tbaa !1

may be lowered as:
  vpmovzxbw   load the low 256 bits and zero-extend them to 512 bits (<32 x i16>)
  vpmovzxbw   load the high 256 bits and zero-extend them to 512 bits (<32 x i16>)
  vpermt2w    shuffle the two <32 x i16> sources
  vpmovwb     truncating store from <32 x i16> to <32 x i8>

In general, I recommend using the vpermt2w and vpermw instructions for <32 x i8>
shuffles on AVX-512 targets without the VBMI feature.</pre>
        </div>
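        <div>
        <pre>The proposed lowering can be checked with a small scalar model. The sketch
below is only an illustration in Python, not the backend's code: the helper
names are hypothetical, and each helper is a simplified model of what the
corresponding instruction computes (zero-extension, a two-source word select,
and truncation). It assumes the two zero-extending loads cover the low and
high halves of the 64-byte vector.

```python
# Scalar model of the suggested instruction sequence.
# All function names are hypothetical and exist only for this illustration.

def zext_bytes_to_words(chunk):
    """Model vpmovzxbw: zero-extend 32 bytes to 32 16-bit words."""
    assert len(chunk) == 32
    return [b % 256 for b in chunk]

def permute_two_sources(indices, first, second):
    """Model vpermt2w: each index selects a word from the 64-word
    concatenation of the two 32-word sources."""
    table = first + second
    return [table[i % 64] for i in indices]

def truncate_words_to_bytes(words):
    """Model vpmovwb: keep the low 8 bits of every word."""
    return [w % 256 for w in words]

def lower_even_bytes(wide_vec):
    """Extract the even-indexed bytes of a 64-byte vector via 16-bit lanes."""
    assert len(wide_vec) == 64
    low = zext_bytes_to_words(wide_vec[:32])    # first vpmovzxbw
    high = zext_bytes_to_words(wide_vec[32:])   # second vpmovzxbw
    mask = list(range(0, 64, 2))                # 0, 2, 4, ..., 62
    shuffled = permute_two_sources(mask, low, high)
    return truncate_words_to_bytes(shuffled)

# Arbitrary test pattern standing in for %wide.vec.
wide_vec = [(7 * i + 3) % 256 for i in range(64)]
strided_vec = lower_even_bytes(wide_vec)
assert strided_vec == wide_vec[0::2]
```

Because every byte is zero-extended before the word shuffle, the byte shuffle
mask can be reused unchanged as the word shuffle mask, and the final
truncation loses no information.</pre>
        </div>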
      </p>
    </body>
</html>