[PATCH] D132322: [AArch64][SelectionDAG] Optimize multiplication by constant
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 22 18:23:13 PDT 2022
efriedma added a comment.
It's not obvious that the replacement sequences are consistently faster. At least on some cores, "add x8, x8, w0, sxtw #1" and "smull x0, w0, w8" have exactly the same throughput, so transforming from the smull to a two-instruction sequence involving the add isn't really profitable.
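For a concrete (hypothetical) example, sext(x) * 3 can be lowered either way in two instructions, so the rewrite doesn't obviously buy anything:

    // smull form:
    mov   w8, #3               // materialize the constant
    smull x0, w0, w8           // x0 = sext(w0) * 3
    // shift/add form:
    sxtw  x8, w0               // x8 = sext(w0)
    add   x0, x8, w0, sxtw #1  // x0 = sext(w0) + 2*sext(w0) = 3*sext(w0)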
On a related note, many cores have fast paths for arithmetic with a shifted-register operand (lsl #n), so we should prefer that form over the extended-register forms (uxtw/sxtw) where we can.
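For example (illustrative only):

    add x0, x1, x2, lsl #3     // shifted-register operand: cheap on many cores
    add x0, x1, w2, uxtw #3    // extended-register operand: slower on some cores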
================
Comment at: llvm/test/CodeGen/AArch64/mul_pow2.ll:183
+; TODO: mov w8, w0 + lsl x8, x8, #2 should combine into lsl x8, x0, #2
define i64 @test6_umaddl(i32 %x, i64 %y) {
----------------
I think the suggested combine misses a zero-extension: "mov w8, w0" implicitly zero-extends into x8, which a bare "lsl x8, x0, #2" would not preserve. Should be able to do it with a single ubfiz, though.
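A sketch of what I mean, assuming the goal is zext(w0) << 2 in x8:

    // "mov w8, w0" zero-extends into x8 before the shift; a bare
    // "lsl x8, x0, #2" would also shift whatever is in the upper 32 bits of x0.
    ubfiz x8, x0, #2, #32      // x8 = zext(w0) << 2, in one instruction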
================
Comment at: llvm/test/CodeGen/AArch64/mul_pow2.ll:296
+; CHECK-NEXT: add x8, x8, w0, sxtw #1
+; CHECK-NEXT: neg x0, x8
; CHECK-NEXT: ret
----------------
I think you can save an instruction here: instead of computing "-(x*4 + x*2)", compute "x*2 - x*8"; both evaluate to -6*x, but the latter folds the negation into the subtract.
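Something like the following, assuming the input is a sign-extended i32 as the sxtw above suggests (untested sketch):

    sbfiz x8, x0, #1, #32      // x8 = 2 * sext(w0)
    sub   x0, x8, w0, sxtw #3  // x0 = 2*sext(x) - 8*sext(x) = -6 * sext(x)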
================
Comment at: llvm/test/CodeGen/AArch64/mul_pow2.ll:518
+; CHECK-NEXT: add w8, w0, w0, lsl #1
+; CHECK-NEXT: neg w0, w8
; CHECK-NEXT: ret
----------------
This seems to be overriding our existing logic here to produce a worse result.
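For reference, the single-instruction lowering I'd expect for a multiply by -3 is something like:

    sub w0, w0, w0, lsl #2     // w0 = x - 4*x = -3*x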
================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-counting-elems-i32.ll:169
+; CHECK-NEXT: dech x8, vl16, mul #8
+; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret
----------------
Probably need some logic to allow folding inch/dech, assuming there isn't some reason to avoid them.
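As a general illustration (not necessarily the exact fix for this test), a count followed by a subtract on a GPR can be folded into a single dech on that register:

    // two-instruction form:
    cnth x8, vl16, mul #8
    sub  x0, x0, x8
    // folded form:
    dech x0, vl16, mul #8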
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132322/new/
https://reviews.llvm.org/D132322