[PATCH] D139411: [AArch64] Optimize muls with operands having enough zero bits.

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 21 01:47:48 PST 2022


mstorsjo added a comment.

The misoptimization occurs in this preprocessed file: https://martin.st/temp/lagarith-preproc.c

When built with `clang -target aarch64-linux-gnu -S lagarith-preproc.c -o out.s -O3`, the generated assembly differs before/after this change like this:

  --- good.s      2022-12-21 11:42:08.816749585 +0200
  +++ bad.s       2022-12-21 11:43:20.672946992 +0200
  @@ -1647,7 +1647,7 @@
          lsr     w14, w20, #1
          mov     x12, xzr
          mov     w9, wzr
  -       msub    x11, x13, x20, x11
  +       umsubl  x11, w13, w20, x11
          lsl     x11, x11, x10
          lsl     x10, x13, x10
          add     x11, x11, x14
  @@ -1661,9 +1661,9 @@
   .LBB2_125:                              // %for.body66.i
                                           // =>This Inner Loop Header: Depth=1
          ldr     w15, [x13, x12]
  -       mul     x16, x10, x15
  -       mul     x15, x11, x15
  -       add     x15, x15, x16, lsr #32
  +       umull   x16, w10, w15
  +       lsr     x17, x16, #32
  +       umaddl  x15, w11, w15, x17
          lsr     x17, x15, #21
          orr     w17, w17, #0x1
          clz     w17, w17
  @@ -1687,9 +1687,9 @@
   .LBB2_128:                              // %for.body89.i
                                           // =>This Inner Loop Header: Depth=1
          ldr     w15, [x13, x12]
  -       mul     x16, x10, x15
  -       mul     x15, x11, x15
  -       add     x15, x15, x16, lsr #32
  +       umull   x16, w10, w15
  +       lsr     x17, x16, #32
  +       umaddl  x15, w11, w15, x17
          lsr     x17, x15, #21
          orr     w17, w17, #0x1
          clz     w17, w17
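
To illustrate why the rewritten instructions are not equivalent in general, here is a minimal C sketch that models the two multiply forms from the diff. It is not taken from the patch or from lagarith.c; the function names `mul64`, `umull32` and `upper_bits_clear` are made up for this example. `mul` reads both full 64-bit registers, while `umull` (and likewise `umaddl`/`umsubl`) reads only the low 32 bits of each source register. The rewrite is therefore only safe when the upper 32 bits of both operands are known to be zero. Judging from the listing alone, `x10` comes out of a 64-bit `lsl`, so that guarantee may not hold here, though that reading is an assumption rather than a confirmed root cause.

```c
#include <assert.h>
#include <stdint.h>

/* Models `mul xd, xn, xm`: full 64-bit operands, low 64 bits of the product. */
uint64_t mul64(uint64_t a, uint64_t b) {
    return a * b;
}

/* Models `umull xd, wn, wm`: only the low 32 bits of each operand are used,
 * zero-extended, producing a 64-bit product. */
uint64_t umull32(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Hypothetical precondition check: the two forms agree whenever this holds
 * for both operands. */
int upper_bits_clear(uint64_t v) {
    return (v >> 32) == 0;
}

/* Small self-check showing one agreeing and one disagreeing input. */
void check_examples(void) {
    uint64_t small = 0xFFFFFFFFULL;   /* fits in 32 bits */
    uint64_t wide  = 0x100000000ULL;  /* bit 32 set, low 32 bits all zero */

    assert(upper_bits_clear(small));
    assert(mul64(small, small) == umull32(small, small));

    assert(!upper_bits_clear(wide));
    assert(mul64(wide, 3) != umull32(wide, 3));
}
```

With a 32-bit value loaded by `ldr w15, ...` as one operand, the result still depends on the other operand: any bits above bit 31 in it are silently dropped by the `umull` form.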

The full runtime failure can be reproduced on aarch64 Linux like this:

  $ git clone git://source.ffmpeg.org/ffmpeg
  $ mkdir ffmpeg-build
  $ cd ffmpeg-build
  $ ../ffmpeg/configure --samples=$(pwd)/../samples --cc=clang
  $ make fate-rsync # download test inputs
  $ make -j$(nproc) fate-lagarith

The object file with the misoptimization is `libavcodec/lagarith.o` here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139411/new/

https://reviews.llvm.org/D139411


