<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction."
   href="https://llvm.org/bugs/show_bug.cgi?id=31018">31018</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction.
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>craig.topper@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>One of the first things shuffle lowering does is widen elements when adjacent
elements in the shuffle mask move together. This is great for choosing the best
shuffle, given the limited availability of shuffles for smaller element sizes,
but it loses the original type information.
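
A minimal sketch of the widening step, in Python rather than LLVM's actual C++
code. The helper name widen_mask is hypothetical and undef lanes are not
modelled. Applied to the v16i32 mask from the test case in this report, it
yields the v8i64 mask that appears in the valignq CHECK line.

```python
def widen_mask(mask):
    """Try to halve the element count of a shuffle mask.

    Returns the widened mask, or None when some pair of lanes does not
    move together. Undef lanes are not modelled in this sketch.
    """
    widened = []
    for i in range(0, len(mask), 2):
        lo, hi = mask[i], mask[i + 1]
        # A pair can be widened only if it starts on an even source index
        # and the second lane is the next source element.
        if lo % 2 != 0 or hi != lo + 1:
            return None
        widened.append(lo // 2)
    return widened


# The v16i32 mask from the test case in this report.
v16i32_mask = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1]
print(widen_mask(v16i32_mask))  # [1, 2, 3, 4, 5, 6, 7, 0]
```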

Some shuffles can be implemented equally well with different element sizes, and
sometimes the element size, and consequently the number of elements, is important
for enabling AVX-512 masking operations to be folded into the final instruction.
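
To illustrate why the element count matters, here is a small Python sketch
(the function name is hypothetical, not part of LLVM). A 16-bit mask selects
individual dwords, while a qword instruction on the same 512-bit register has
only 8 mask bits, so an arbitrary 16-bit mask cannot in general be applied at
qword granularity.

```python
def dword_mask_to_qword_mask(mask16):
    """Convert a 16-bit per-dword write mask to an 8-bit per-qword mask.

    Returns None when the two dword bits of some qword disagree, since
    a qword-granular instruction cannot write only half of an element.
    """
    mask8 = 0
    for q in range(8):
        pair = (mask16 >> (2 * q)) & 0b11
        if pair == 0b11:
            mask8 |= 1 << q
        elif pair != 0b00:
            return None
    return mask8


print(dword_mask_to_qword_mask(0xFFFF))  # 255
print(dword_mask_to_qword_mask(0x000F))  # 3
print(dword_mask_to_qword_mask(0x0001))  # None
```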

For example, the following sequence could be better implemented with a VALIGND,
which would allow the masking to be folded.

define <16 x i32> @mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01(<16 x i32> %a, <16 x i32> %passthru, i16 %mask) {
; ALL-LABEL: mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01:
; ALL:       # BB#0:
; ALL-NEXT:    valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; ALL-NEXT:    kmovw %edi, %k1
; ALL-NEXT:    vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; ALL-NEXT:    retq
  %shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0, i32 1>
  %mask.cast = bitcast i16 %mask to <16 x i1>
  %res = select <16 x i1> %mask.cast, <16 x i32> %shuffle, <16 x i32> %passthru
  ret <16 x i32> %res
}


We should add a DAGCombine after shuffle lowering that detects a select fed by
a bitcast from a shuffle. When that shuffle can be implemented equally well at
the select's element size, the combine would rewrite it and remove the bitcast.
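
A rough Python sketch of the mask arithmetic such a combine would need. This
is not LLVM code and the function names are hypothetical. It scales a widened
mask back to the narrower element size and checks whether the result is still
a single-source rotation, which is the pattern an align-style instruction
implements.

```python
def narrow_mask(mask, scale=2):
    """Re-express a shuffle mask on elements that are `scale` times smaller."""
    return [m * scale + j for m in mask for j in range(scale)]


def rotation_amount(mask):
    """Return r if mask[i] == (i + r) % n for every lane i, else None."""
    n = len(mask)
    r = mask[0]
    if all(m == (i + r) % n for i, m in enumerate(mask)):
        return r
    return None


# The v8i64 mask chosen by shuffle lowering in the test case above.
v8i64_mask = [1, 2, 3, 4, 5, 6, 7, 0]
v16i32_mask = narrow_mask(v8i64_mask)

print(rotation_amount(v8i64_mask))   # 1
print(rotation_amount(v16i32_mask))  # 2
```

Here the qword rotation by 1 element corresponds to a dword rotation by 2
elements, so the same data movement is available at the element size the
16-bit mask needs.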

Example shuffles that this could apply to:
VALIGNQ -> VALIGND
128-bit PALIGNR -> VALIGND/VALIGNQ
VSHUFF64X2 -> VSHUFF32X4</pre>
        </div>
      </p>
    </body>
</html>