<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86] Not taking advantage of the permute-in-lane instruction (vpermilps)"
   href="https://bugs.llvm.org/show_bug.cgi?id=34382">34382</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86] Not taking advantage of the permute-in-lane instruction (vpermilps)
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>ayman.musa@intel.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>define <16 x float> @test_16xfloat_perm_mask0(<16 x float> %vec) {
  %res = shufflevector <16 x float> %vec, <16 x float> undef,
                       <16 x i32> <i32 1, i32 1, i32 3, i32 0, i32 6, i32 4, i32 5, i32 7,
                                   i32 8, i32 8, i32 9, i32 9, i32 15, i32 14, i32 14, i32 12>
  ret <16 x float> %res
}

LLVM currently emits the following (IACA reports a throughput of 2.86):
    vmovaps .LCPI142_0(%rip), %zmm1 # zmm1 = [1,1,3,0,6,4,5,7,8,8,9,9,15,14,14,12]
    vpermps %zmm0, %zmm1, %zmm0

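Every index in the mask stays within its own 128-bit lane (elements 0-3, 4-7, 8-11 and 12-15), so the in-lane permute vpermilps could have been emitted instead (IACA reports a throughput of 1.00):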
    vpermilps .LCPI142_0(%rip), %zmm0, %zmm0

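* LCPI142_0 holds the control indices each permute instruction needs. vpermilps reads only bits 1:0 of each control element, so its lane-relative indices are [1,1,3,0,2,0,1,3,0,0,1,1,3,2,2,0].
** Throughput as reported by Intel's IACA tool; lower is better.</pre>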
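<pre>A minimal Python sketch of why this shuffle qualifies for vpermilps. It assumes only the documented semantics of the two instructions: the 512-bit vpermps selects across all 16 elements using the low 4 bits of each index, while the variable-control vpermilps selects within each 128-bit lane using the low 2 bits. The helper names (is_in_lane, vpermilps_control, emulate_vpermps, emulate_vpermilps) are hypothetical illustrations, not LLVM APIs. The sketch checks that the mask never crosses a lane and that both permutes then produce identical results.

```python
# Hypothetical helpers (not LLVM APIs) modelling the two x86 permutes on a
# 16 x float vector, to show the reported mask is expressible as vpermilps.

LANE = 4  # one 128-bit lane holds four 32-bit floats


def is_in_lane(mask):
    """True if every output element i reads from the same 128-bit lane as i."""
    return all(m // LANE == i // LANE for i, m in enumerate(mask))


def vpermilps_control(mask):
    """Lane-relative control vector for vpermilps (index within its lane)."""
    if not is_in_lane(mask):
        raise ValueError("mask crosses a 128-bit lane; vpermilps cannot express it")
    return [m % LANE for m in mask]


def emulate_vpermps(src, idx):
    # Cross-lane permute: dst[i] = src[idx[i] & 15].
    return [src[j & 15] for j in idx]


def emulate_vpermilps(src, ctrl):
    # In-lane permute: dst[i] = src[lane_base(i) + (ctrl[i] & 3)].
    return [src[(i // LANE) * LANE + (c & 3)] for i, c in enumerate(ctrl)]


# Shuffle mask from test_16xfloat_perm_mask0 above.
MASK = [1, 1, 3, 0, 6, 4, 5, 7, 8, 8, 9, 9, 15, 14, 14, 12]

if __name__ == "__main__":
    src = [float(x) for x in range(16)]
    ctrl = vpermilps_control(MASK)
    print(ctrl)  # -> [1, 1, 3, 0, 2, 0, 1, 3, 0, 0, 1, 1, 3, 2, 2, 0]
    assert emulate_vpermps(src, MASK) == emulate_vpermilps(src, ctrl)
```

Because emulate_vpermilps masks each control element with 3, passing the original mask unchanged gives the same result, mirroring how the hardware ignores the upper bits of each control element.</pre>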
        </div>
      </p>


    </body>
</html>