<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86][AVX512] Use of different granularity broadcast prevents combining the mask inside the instruction."
   href="https://bugs.llvm.org/show_bug.cgi?id=34357">34357</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86][AVX512] Use of different granularity broadcast prevents combining the mask inside the instruction.
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>ayman.musa@intel.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Vector broadcasts of type <2 x float> or <2 x i32> do not select the
vbroadcastf32x2 and vbroadcasti32x2 instructions. Instead, they select
vbroadcastsd and vpbroadcastq, respectively.

This prevents the mask, if there is one, from being folded into the broadcast
instruction, because the mask operates on 32-bit elements while the selected
instruction operates on 64-bit elements. The result is an extra blend or mov
instruction.

Reproducer:

define <8 x float> @test_masked_z_2xfloat_to_8xfloat_mask1(<8 x float> %vec) {
  %shuf = shufflevector <8 x float> %vec, <8 x float> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
  %res = select <8 x i1> <i1 0, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 0>, <8 x float> %shuf, <8 x float> zeroinitializer
  ret <8 x float> %res
}

<span class="quote">>> llc -mcpu=skx <file-name> -o out.s</span >

LLVM emits:
  vbroadcastsd %xmm0, %ymm0
  movb $126, %al
  kmovd %eax, %k1
  vmovaps %ymm0, %ymm0 {%k1} {z}
  retq

This could instead be emitted as:
  movb $126, %al
  kmovd %eax, %k1
  vbroadcastf32x2 %xmm0, %ymm0 {%k1} {z}</pre>
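        <pre>The granularity argument can be checked with a small model. The sketch below
is illustrative only: the function names are hypothetical and are not LLVM
APIs. It assumes a 256-bit destination with eight 32-bit lanes, and treats a
mask as expressible at 64-bit granularity only when both lanes of every
adjacent pair share the same mask bit.

```python
# Illustrative model only; these helpers are hypothetical, not LLVM APIs.

def broadcast_2x32(src, lanes=8):
    """Repeat the low two 32-bit elements across `lanes` 32-bit lanes."""
    return [src[i % 2] for i in range(lanes)]

def zero_mask(vec, k):
    """Zero-masking: keep lane i when bit i of k is set, otherwise write 0."""
    return [x if (k >> i) % 2 == 1 else 0.0 for i, x in enumerate(vec)]

def mask_from_select(bits):
    """Pack a select condition (lane 0 first) into a mask immediate."""
    return sum(b * 2**i for i, b in enumerate(bits))

def foldable_at_64bit(k, lanes=8):
    """True when each adjacent pair of 32-bit lanes shares one mask bit,
    so the same mask could be expressed per 64-bit element."""
    return all((k >> i) % 2 == (k >> (i + 1)) % 2 for i in range(0, lanes, 2))

# The select condition from the reproducer, lane 0 first.
k = mask_from_select([0, 1, 1, 1, 1, 1, 1, 0])
result = zero_mask(broadcast_2x32([1.0, 2.0]), k)
```

Under these assumptions k is 126, matching the movb immediate in the emitted
code, and foldable_at_64bit(k) is False because lanes 0 and 1 (and lanes 6
and 7) disagree. A broadcast with 64-bit elements therefore cannot absorb this
mask, while one with 32-bit mask granularity can.</pre>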
        </div>
      </p>


    </body>
</html>