<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Suboptimal codegen for llvm.vector.reduce of <N x i1>"
   href="https://bugs.llvm.org/show_bug.cgi?id=51122">51122</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Suboptimal codegen for llvm.vector.reduce of <N x i1>
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>12.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: AArch64
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>caleb.zulawski@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The binary reduction intrinsics on AArch64 (and ARM) generate suboptimal
code for vectors of i1.  This issue is similar to
<a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED FIXED - sub-optimal codegen for llvm.experimental.vector.reduce of <N x i1>"
   href="show_bug.cgi?id=38840">https://bugs.llvm.org/show_bug.cgi?id=38840</a>.

declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1>)

define i1 @mask_reduce_or(<8 x i8> %mask) {
    %mask1 = trunc <8 x i8> %mask to <8 x i1>
    %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)
    ret i1 %reduced
}

produces

mask_reduce_or:                         // @mask_reduce_or
        umov    w14, v0.b[1]
        umov    w15, v0.b[0]
        umov    w13, v0.b[2]
        orr     w14, w15, w14
        umov    w12, v0.b[3]
        orr     w13, w14, w13
        umov    w11, v0.b[4]
        orr     w12, w13, w12
        umov    w10, v0.b[5]
        orr     w11, w12, w11
        umov    w9, v0.b[6]
        orr     w10, w11, w10
        umov    w8, v0.b[7]
        orr     w9, w10, w9
        orr     w8, w9, w8
        and     w0, w8, #0x1
        ret

when it could instead use vmaxvq (or vpmax on ARM).

The same goes for vector.reduce.and with vminvq (or vpmin on ARM).</pre>
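        <pre>As a rough illustration of why a horizontal max/min can stand in for the
scalar chain above, here is a portable C model (not NEON code, and not
taken from LLVM).  The function names are made up for this sketch, and it
assumes each lane is first masked to bit 0, mirroring the trunc to i1.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics: truncate each lane to i1 (keep bit 0), then OR. */
static int reduce_or_i1_ref(const uint8_t *lanes, size_t n) {
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= lanes[i] & 1u;
    return (int)acc;
}

/* Max-based form: mask to bit 0, then take the unsigned maximum across
 * lanes.  This models a vector AND followed by one horizontal max. */
static int reduce_or_i1_max(const uint8_t *lanes, size_t n) {
    uint8_t m = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = lanes[i] & 1u;
        if (b > m)
            m = b;
    }
    return m;
}

/* Reference semantics for the AND reduction over i1 lanes. */
static int reduce_and_i1_ref(const uint8_t *lanes, size_t n) {
    unsigned acc = 1;
    for (size_t i = 0; i < n; i++)
        acc &= lanes[i] & 1u;
    return (int)acc;
}

/* Min-based form: mask to bit 0, then take the unsigned minimum. */
static int reduce_and_i1_min(const uint8_t *lanes, size_t n) {
    uint8_t m = 1;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = lanes[i] & 1u;
        if (b < m)
            m = b;
    }
    return m;
}
```

Note that the masking step matters for arbitrary input bytes: for lanes
{2, 4, 1, 0, ...} the unmasked maximum can be even although the OR of the
low bits is 1.  If the mask lanes are known to be all-zeros or all-ones
(e.g. a compare result), the mask could presumably be skipped.</pre>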
        </div>
      </p>


    </body>
</html>