[PATCH] D139411: [AArch64] Optimize muls with operands having enough zero bits.
Martin Storsjö via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 21 01:47:48 PST 2022
mstorsjo added a comment.
The misoptimization occurs in this preprocessed file: https://martin.st/temp/lagarith-preproc.c
When built with `clang -target aarch64-linux-gnu -S lagarith-preproc.c -o out.s -O3`, the difference before/after this change looks like this:
--- good.s 2022-12-21 11:42:08.816749585 +0200
+++ bad.s 2022-12-21 11:43:20.672946992 +0200
@@ -1647,7 +1647,7 @@
lsr w14, w20, #1
mov x12, xzr
mov w9, wzr
- msub x11, x13, x20, x11
+ umsubl x11, w13, w20, x11
lsl x11, x11, x10
lsl x10, x13, x10
add x11, x11, x14
@@ -1661,9 +1661,9 @@
.LBB2_125: // %for.body66.i
// =>This Inner Loop Header: Depth=1
ldr w15, [x13, x12]
- mul x16, x10, x15
- mul x15, x11, x15
- add x15, x15, x16, lsr #32
+ umull x16, w10, w15
+ lsr x17, x16, #32
+ umaddl x15, w11, w15, x17
lsr x17, x15, #21
orr w17, w17, #0x1
clz w17, w17
@@ -1687,9 +1687,9 @@
.LBB2_128: // %for.body89.i
// =>This Inner Loop Header: Depth=1
ldr w15, [x13, x12]
- mul x16, x10, x15
- mul x15, x11, x15
- add x15, x15, x16, lsr #32
+ umull x16, w10, w15
+ lsr x17, x16, #32
+ umaddl x15, w11, w15, x17
lsr x17, x15, #21
orr w17, w17, #0x1
clz w17, w17
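To make the difference in the hunks above concrete, here is a rough C model of the two loop bodies. It is only a sketch: the helper names (`mul64`, `umull`, `umaddl`, `loop_before`, `loop_after`, `check_equivalence`) are made up for illustration, and it assumes that `x10` and `x11` may carry nonzero bits above bit 31. The diff does not prove that by itself, but both registers are produced by 64-bit `lsl` instructions in the first hunk, so it seems plausible. `umull` and `umaddl` read only the low 32 bits of their `w` source registers, so the rewritten sequence matches the original only when those upper bits are zero.

```c
#include <assert.h>
#include <stdint.h>

/* mul xd, xn, xm: low 64 bits of the 64x64 product. */
static uint64_t mul64(uint64_t n, uint64_t m) { return n * m; }

/* umull xd, wn, wm: only the low 32 bits of each source are read. */
static uint64_t umull(uint64_t n, uint64_t m) {
    return (uint64_t)(uint32_t)n * (uint64_t)(uint32_t)m;
}

/* umaddl xd, wn, wm, xa: xa + (wn * wm), sources truncated to 32 bits. */
static uint64_t umaddl(uint64_t n, uint64_t m, uint64_t a) {
    return a + umull(n, m);
}

/* Loop body from good.s: mul, mul, add with lsr #32. */
static uint64_t loop_before(uint64_t x10, uint64_t x11, uint32_t w15) {
    uint64_t x16 = mul64(x10, w15);
    uint64_t x15 = mul64(x11, w15);
    return x15 + (x16 >> 32);
}

/* Loop body from bad.s: umull, lsr, umaddl. */
static uint64_t loop_after(uint64_t x10, uint64_t x11, uint32_t w15) {
    uint64_t x16 = umull(x10, w15);
    uint64_t x17 = x16 >> 32;
    return umaddl(x11, w15, x17);
}

/* Hypothetical self-check with made-up input values. */
static void check_equivalence(void) {
    /* Operands fit in 32 bits: both sequences agree. */
    assert(loop_before(0xFFFFFFFFu, 0x1234u, 7) ==
           loop_after(0xFFFFFFFFu, 0x1234u, 7));
    /* x10 has bit 32 set: the umull version drops it. */
    assert(loop_before(1ull << 32, 0, 1) != loop_after(1ull << 32, 0, 1));
}
```

Calling `check_equivalence()` from a small `main` exercises both cases. The same reasoning applies to the `msub` to `umsubl` change in the first hunk, which likewise only reads the low halves of `x13` and `x20`.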
The full runtime case can be reproduced on Linux on aarch64 like this:
$ git clone git://source.ffmpeg.org/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --samples=$(pwd)/../samples --cc=clang
$ make fate-rsync # download test inputs
$ make -j$(nproc) fate-lagarith
The object file containing the misoptimized code is `libavcodec/lagarith.o` in the build directory.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D139411/new/
https://reviews.llvm.org/D139411