<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Suboptimal codegen for llvm.vector.reduce of <N x i1>"

   href="https://bugs.llvm.org/show_bug.cgi?id=51122">51122</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Suboptimal codegen for llvm.vector.reduce of &lt;N x i1&gt;

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>12.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: AArch64

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>caleb.zulawski@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>The bitwise reduction intrinsics (llvm.vector.reduce.or and llvm.vector.reduce.and)

produce suboptimal code on AArch64 (and ARM) for vectors of i1.  This issue is similar to

<a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - sub-optimal codegen for llvm.experimental.vector.reduce of <N x i1>"

   href="show_bug.cgi?id=38840">https://bugs.llvm.org/show_bug.cgi?id=38840</a>.

declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %a)

define i1 @mask_reduce_or(<8 x i8> %mask) {

    %mask1 = trunc <8 x i8> %mask to <8 x i1>

    %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)

    ret i1 %reduced

}

produces

mask_reduce_or:                         // @mask_reduce_or

        umov    w14, v0.b[1]

        umov    w15, v0.b[0]

        umov    w13, v0.b[2]

        orr     w14, w15, w14

        umov    w12, v0.b[3]

        orr     w13, w14, w13

        umov    w11, v0.b[4]

        orr     w12, w13, w12

        umov    w10, v0.b[5]

        orr     w11, w12, w11

        umov    w9, v0.b[6]

        orr     w10, w11, w10

        umov    w8, v0.b[7]

        orr     w9, w10, w9

        orr     w8, w9, w8

        and     w0, w8, #0x1

        ret

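when it could instead use a single across-lanes maximum (umaxv, exposed as the

vmaxv/vmaxvq intrinsics), or vpmax on ARM.

The same goes for llvm.vector.reduce.and with an across-lanes minimum (uminv,

exposed as vminv/vminvq), or vpmin on ARM.</pre>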
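        <pre>As a rough illustration of why a max/min reduction can stand in for the
or/and reduction, here is a scalar Python model. It is not LLVM code, and all
function names are made up for this sketch. It assumes each lane is first
reduced to its low bit, as trunc to i1 does.

```python
# Scalar model of the suggested lowering (illustrative only, hypothetical names).
# Each lane is an i8 value; `trunc ... to i1` keeps only bit 0 of the lane.

def low_bits(lanes):
    """Model `trunc to i1` per lane: keep bit 0 of every lane."""
    return [lane % 2 for lane in lanes]

def reduce_or_reference(lanes):
    """Reference semantics of llvm.vector.reduce.or on the i1 lanes."""
    return int(any(low_bits(lanes)))

def reduce_and_reference(lanes):
    """Reference semantics of llvm.vector.reduce.and on the i1 lanes."""
    return int(all(low_bits(lanes)))

def reduce_or_via_max(lanes):
    """OR of 0/1 lanes equals their unsigned maximum."""
    return max(low_bits(lanes))

def reduce_and_via_min(lanes):
    """AND of 0/1 lanes equals their unsigned minimum."""
    return min(low_bits(lanes))

def reduce_or_unmasked_max(lanes):
    """Pitfall: maximum of the raw bytes, truncated afterwards."""
    return max(lanes) % 2

lanes = [2, 1, 0, 0, 0, 0, 0, 0]
print(reduce_or_reference(lanes),
      reduce_or_via_max(lanes),
      reduce_or_unmasked_max(lanes))  # prints "1 1 0"
```

The last function shows the pitfall: with lanes 2 and 1 present, the raw byte
maximum is 2, whose low bit is 0. A lowering along these lines would therefore
need to normalize the lanes before the across-lanes max or min, unless they are
already known to hold only 0 or 1.</pre>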

        </div>

      </p>


    </body>

</html>