[llvm-bugs] [Bug 48734] New: [NPM] Slower arm_mult_q15 code from failing to simplify min/max pattern

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jan 13 02:12:54 PST 2021


https://bugs.llvm.org/show_bug.cgi?id=48734

            Bug ID: 48734
           Summary: [NPM] Slower arm_mult_q15 code from failing to
                    simplify min/max pattern
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Interprocedural Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: david.green at arm.com
                CC: llvm-bugs at lists.llvm.org

This code does a saturating multiply of 16-bit fixed-point values:
https://godbolt.org/z/9Eo1rz
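
For readers without the Godbolt link to hand, the operation can be sketched in C roughly as follows. This is an illustrative reconstruction and not the source behind the link: the names mult_q15_sketch and ssat16 are hypothetical, and it assumes that right-shifting a negative int is an arithmetic shift, which holds on mainstream compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value into the int16_t range. */
static int32_t ssat16(int32_t x) {
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return x;
}

/* Hypothetical sketch of a saturating q15 multiply: widen both operands,
   multiply, shift right by 15, then saturate back to 16 bits. */
void mult_q15_sketch(const int16_t *a, const int16_t *b, int16_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int32_t prod = (int32_t)a[i] * (int32_t)b[i];
        /* Assumes >> on a negative int is an arithmetic shift. */
        dst[i] = (int16_t)ssat16(prod >> 15);
    }
}
```

Only the upper clamp can ever trigger here: the smallest shifted product is -32767, while -32768 * -32768 shifted right by 15 gives 32768. That would be consistent with the IR below comparing against 32767 only.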

It is roughly 55% slower under the new pass manager (the gap is larger with q7
data types). It appears that the code under the old pass manager contains a
min/max pattern, and is nicely vectorized:

  %11 = load i16, i16* %pSrcA.addr.010, align 2, !tbaa !3
  %conv = sext i16 %11 to i32
  %12 = load i16, i16* %pSrcB.addr.08, align 2, !tbaa !3
  %conv2 = sext i16 %12 to i32
  %mul = mul nsw i32 %conv2, %conv
  %shr = ashr i32 %mul, 15
  %13 = icmp slt i32 %shr, 32767
  %spec.select.i = select i1 %13, i32 %shr, i32 32767
  %conv3 = trunc i32 %spec.select.i to i16
  store i16 %conv3, i16* %pDst.addr.09, align 2, !tbaa !3

Under the new pass manager it instead has a more expensive compare/select/trunc
combination:

  %11 = load i16, i16* %pSrcA.addr.010, align 2, !tbaa !3
  %conv = sext i16 %11 to i32
  %12 = load i16, i16* %pSrcB.addr.08, align 2, !tbaa !3
  %conv2 = sext i16 %12 to i32
  %mul = mul nsw i32 %conv2, %conv
  %13 = lshr i32 %mul, 15
  %cmp4.i = icmp sgt i32 %mul, 1073741823
  %14 = trunc i32 %13 to i16
  %conv3 = select i1 %cmp4.i, i16 32767, i16 %14
  store i16 %conv3, i16* %pDst.addr.09, align 2, !tbaa !3

It appears that the function is optimized differently before it gets inlined.
It might also be possible to fix this with a canonicalization fold:
https://alive2.llvm.org/ce/z/CwJcsD
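
As a rough sanity check that the two IR sequences above compute the same value, they can be modelled in C as below. This is only a sketch and not the contents of the Alive2 link: the function names are hypothetical, and it assumes an arithmetic right shift for negative ints and wrapping conversion from uint16_t to int16_t, as on mainstream compilers.

```c
#include <stdint.h>

/* Old pass manager shape: ashr, signed min against 32767, then trunc. */
int16_t sat_min_form(int32_t mul) {
    int32_t shr = mul >> 15; /* assumes an arithmetic shift for negative values */
    int32_t clamped = shr < 32767 ? shr : 32767;
    return (int16_t)(uint16_t)clamped; /* keep the low 16 bits, like trunc */
}

/* New pass manager shape: lshr, trunc, then a select keyed on the product. */
int16_t sat_select_form(int32_t mul) {
    uint32_t shr = (uint32_t)mul >> 15; /* logical shift */
    int16_t low = (int16_t)(uint16_t)shr;
    return mul > 1073741823 ? (int16_t)32767 : low;
}

/* Hypothetical helper: counts disagreements over a strided grid of int16_t
   operand pairs. step must be positive. */
long count_mismatches(int step) {
    long bad = 0;
    for (int a = -32768; a <= 32767; a += step)
        for (int b = -32768; b <= 32767; b += step)
            if (sat_min_form(a * b) != sat_select_form(a * b))
                bad++;
    return bad;
}
```

Both forms keep bits 15..30 of the product when no saturation happens, and the boundary case where the shifted value is exactly 32767 yields 32767 either way, so the difference is in shape rather than in result.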

We seem to have a lot of regressions in other suites which may be more
difficult to reproduce for upstream, due to the nature of the benchmarks. We
will see what we can do.
