<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [NPM] Slower arm_mult_q15 code from failing to simplify min/max pattern"
href="https://bugs.llvm.org/show_bug.cgi?id=48734">48734</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[NPM] Slower arm_mult_q15 code from failing to simplify min/max pattern
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Interprocedural Optimizations
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>david.green@arm.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>This code does a saturating multiply of 16-bit fixed-point values.
<a href="https://godbolt.org/z/9Eo1rz">https://godbolt.org/z/9Eo1rz</a>
It is roughly 55% slower under the new pass manager (the regression is larger
with q7 data types). The code produced under the old pass manager contains a
min/max pattern and is nicely vectorized:
%11 = load i16, i16* %pSrcA.addr.010, align 2, !tbaa !3
%conv = sext i16 %11 to i32
%12 = load i16, i16* %pSrcB.addr.08, align 2, !tbaa !3
%conv2 = sext i16 %12 to i32
%mul = mul nsw i32 %conv2, %conv
%shr = ashr i32 %mul, 15
%13 = icmp slt i32 %shr, 32767
%spec.select.i = select i1 %13, i32 %shr, i32 32767
%conv3 = trunc i32 %spec.select.i to i16
store i16 %conv3, i16* %pDst.addr.09, align 2, !tbaa !3
The new pass manager instead produces a more expensive compare/select/trunc
combination:
%11 = load i16, i16* %pSrcA.addr.010, align 2, !tbaa !3
%conv = sext i16 %11 to i32
%12 = load i16, i16* %pSrcB.addr.08, align 2, !tbaa !3
%conv2 = sext i16 %12 to i32
%mul = mul nsw i32 %conv2, %conv
%13 = lshr i32 %mul, 15
%cmp4.i = icmp sgt i32 %mul, 1073741823
%14 = trunc i32 %13 to i16
%conv3 = select i1 %cmp4.i, i16 32767, i16 %14
store i16 %conv3, i16* %pDst.addr.09, align 2, !tbaa !3
It appears that the function is optimized differently before it gets inlined?
It might also be possible to fix this with a canonicalization fold:
<a href="https://alive2.llvm.org/ce/z/CwJcsD">https://alive2.llvm.org/ce/z/CwJcsD</a>
We are seeing a number of regressions in other suites that may be more
difficult to reproduce upstream, due to the nature of the benchmarks. We
will see what we can do.</pre>
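For reference, a minimal C sketch of the kernel in question (an assumed shape, modeled on CMSIS-DSP's arm_mult_q15; the function name is mine): multiply two Q15 values, rescale by an arithmetic shift of 15, and saturate to the signed 16-bit range. Only the positive direction needs clamping, since the sole overflowing product is -32768 * -32768 = +32768; the ternary below is the min idiom the old pass manager recognizes.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a Q15 saturating multiply (assumed shape of arm_mult_q15's
 * scalar body; name is hypothetical). */
static int16_t mult_q15_scalar(int16_t a, int16_t b)
{
    int32_t mul = (int32_t)a * (int32_t)b; /* fits in i32 */
    int32_t shr = mul >> 15;               /* arithmetic shift on common ABIs */
    /* Clamp the one overflowing case (+32768) back to Q15's max. */
    return (int16_t)(shr < 32767 ? shr : 32767);
}
```

For example, 0.5 * 0.5 in Q15 is mult_q15_scalar(16384, 16384) == 8192, while mult_q15_scalar(-32768, -32768) saturates to 32767.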
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>