<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction."

   href="https://llvm.org/bugs/show_bug.cgi?id=31018">31018</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction.

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>craig.topper@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>One of the first things shuffle lowering does is widen elements if the shuffle

elements are adjacent. This is great for choosing the best shuffle given the

limited availability of shuffles for smaller element sizes. But it loses the

original type information.
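
A minimal sketch of the widening step described above, written in Python rather
than taken from the actual lowering code. The name widen_shuffle_mask is
hypothetical, and undef mask elements are ignored for simplicity. It shows that
once a mask has been widened, only the wider element count remains.

```python
def widen_shuffle_mask(mask):
    """Return the mask over double-width elements, or None if it cannot widen.

    Assumption for this sketch: a pair widens only when it starts on an even
    index and its two indices are adjacent (lo, lo + 1).
    """
    if len(mask) % 2 != 0:
        return None
    widened = []
    for i in range(0, len(mask), 2):
        lo, hi = mask[i], mask[i + 1]
        if lo % 2 != 0 or hi != lo + 1:
            return None
        widened.append(lo // 2)
    return widened


# The 16 x i32 rotate-by-2 mask becomes an 8 x i64 rotate-by-1 mask.
rotate_by_2_dwords = list(range(2, 16)) + [0, 1]
print(widen_shuffle_mask(rotate_by_2_dwords))  # [1, 2, 3, 4, 5, 6, 7, 0]
```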

Some shuffles can be implemented equally well with different element sizes. And
sometimes the element size, and consequently the number of elements, is
important for enabling AVX-512 masking operations to be folded into the final
instruction.

For example, the following sequence could be better implemented with a VALIGND,
which would allow the masking to be folded.

define <16 x i32> @mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01(<16 x i32> %a, <16 x i32> %passthru, i16 %mask) {
; ALL-LABEL: mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01:
; ALL:       # BB#0:
; ALL-NEXT:    valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; ALL-NEXT:    kmovw %edi, %k1
; ALL-NEXT:    vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; ALL-NEXT:    retq
  %shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0, i32 1>
  %mask.cast = bitcast i16 %mask to <16 x i1>
  %res = select <16 x i1> %mask.cast, <16 x i32> %shuffle, <16 x i32> %passthru
  ret <16 x i32> %res
}
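
To make the equivalence concrete, here is a small Python model of the two
instruction sequences. It is only a sketch: the helper names are hypothetical,
vectors are plain lists of 16 dwords, and the mask is an integer whose bit i
controls element i. It checks that the unmasked qword rotate followed by a
dword blend gives the same result as a single merge-masked dword rotate by 2.

```python
def shuffle(vec, mask):
    # Pick vec[mask[i]] for every result position i.
    return [vec[i] for i in mask]


def select(k, on_true, on_false):
    # Bit i of k chooses between on_true[i] and on_false[i].
    return [t if (k >> i) & 1 else f
            for i, (t, f) in enumerate(zip(on_true, on_false))]


def valignq_then_blend(a, passthru, k):
    # Model of the current output: rotate 8 qwords by 1, then blend dwords.
    qwords = [tuple(a[i:i + 2]) for i in range(0, 16, 2)]
    rotated = shuffle(qwords, [1, 2, 3, 4, 5, 6, 7, 0])
    dwords = [d for q in rotated for d in q]
    return select(k, dwords, passthru)


def masked_dword_rotate(a, passthru, k):
    # Model of one merge-masked rotate of 16 dwords by 2 elements.
    return [a[(i + 2) % 16] if (k >> i) & 1 else passthru[i]
            for i in range(16)]


a = list(range(100, 116))
passthru = list(range(200, 216))
same = all(valignq_then_blend(a, passthru, k) == masked_dword_rotate(a, passthru, k)
           for k in (0x0000, 0xFFFF, 0x5555, 0xA5C3))
print(same)
```

A mask such as 0x5555 selects only one dword of each qword, so it cannot be
expressed as an 8-bit qword mask. That is why the fold needs the dword form of
the shuffle.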

We should add a DAGCombine after shuffle lowering that detects a select fed by
a bitcast of a shuffle and, when the shuffle can be implemented equally well at
the select's element size, re-emits it that way to remove the bitcast.
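
The core check such a combine would need can be sketched as follows. This is a
Python illustration only, not LLVM code. The function names are hypothetical,
and it handles just the rotate (align) case: scale the widened mask back to the
select's element count and test whether the result is still a rotation.

```python
def narrow_shuffle_mask(mask, scale):
    # Expand each wide index into `scale` consecutive narrow indices.
    return [m * scale + j for m in mask for j in range(scale)]


def rotation_amount(mask):
    # Return the rotate amount if mask[i] == (shift + i) % n for all i.
    n = len(mask)
    shift = mask[0]
    if all(mask[i] == (shift + i) % n for i in range(n)):
        return shift
    return None


def rotate_for_select(wide_mask, select_elts):
    # Rotate amount in the select's element size, or None if not applicable.
    if select_elts % len(wide_mask) != 0:
        return None
    scale = select_elts // len(wide_mask)
    return rotation_amount(narrow_shuffle_mask(wide_mask, scale))


# The qword rotate by 1 from the example is a dword rotate by 2.
print(rotate_for_select([1, 2, 3, 4, 5, 6, 7, 0], 16))  # 2
```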

Example shuffles that this could apply to:

VALIGNQ->VALIGND

128-bit PALIGNR->VALIGND/VALIGNQ

SHUFF64x2->SHUFF32x4</pre>

        </div>

      </p>


    </body>

</html>