[PATCH] D123512: [MachineCombiner]: Avoid including transient instructions in latency calculation
George Steed via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 11 09:29:25 PDT 2022
georges added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/neon-mla-mls.ll:141
; CHECK: // %bb.0:
-; CHECK-NEXT: mul v0.8b, v0.8b, v1.8b
-; CHECK-NEXT: sub v0.8b, v0.8b, v2.8b
+; CHECK-NEXT: neg v2.8b, v2.8b
+; CHECK-NEXT: mla v2.8b, v0.8b, v1.8b
----------------
fhahn wrote:
> Is this actually profitable compared to the original code? Shouldn't this use `mls`?
I don't think `mls` can be used here: the function computes `A*B - C`, so it is the accumulator that is negated rather than the product, whereas `mls` computes `acc - A*B`.
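To make the operand roles concrete, here is a small Python model of the lane-wise semantics. It is only a sketch: the helper names `mul`, `sub`, `neg`, `mla` and `mls` and the input values are hypothetical, chosen for illustration. Each lane is treated as an unsigned 8-bit value that wraps modulo 2^8. The model checks that `neg` + `mla` reproduces `mul` + `sub`, while `mls` does not, because `mls` subtracts the product from the accumulator.

```python
# Hypothetical model of an AArch64 NEON 8B vector: eight unsigned 8-bit lanes,
# with all arithmetic wrapping modulo 2**8.
MASK = 0xFF

def mul(n, m):
    """mul Vd.8B, Vn.8B, Vm.8B: Vd[i] = Vn[i] * Vm[i]."""
    return [(a * b) & MASK for a, b in zip(n, m)]

def sub(n, m):
    """sub Vd.8B, Vn.8B, Vm.8B: Vd[i] = Vn[i] - Vm[i]."""
    return [(a - b) & MASK for a, b in zip(n, m)]

def neg(n):
    """neg Vd.8B, Vn.8B: Vd[i] = -Vn[i]."""
    return [(-a) & MASK for a in n]

def mla(acc, n, m):
    """mla Vd.8B, Vn.8B, Vm.8B: Vd is also the accumulator, Vd[i] += Vn[i] * Vm[i]."""
    return [(d + a * b) & MASK for d, a, b in zip(acc, n, m)]

def mls(acc, n, m):
    """mls Vd.8B, Vn.8B, Vm.8B: Vd[i] -= Vn[i] * Vm[i]."""
    return [(d - a * b) & MASK for d, a, b in zip(acc, n, m)]

# Arbitrary example inputs: A in v0, B in v1, C in v2.
A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [9, 10, 11, 12, 13, 14, 15, 16]
C = [3, 5, 7, 9, 11, 13, 15, 17]

original = sub(mul(A, B), C)   # mul v0, v0, v1 ; sub v0, v0, v2
combined = mla(neg(C), A, B)   # neg v2, v2     ; mla v2, v0, v1
with_mls = mls(C, A, B)        # C - A*B: negates the product, not the accumulator

print(original == combined)  # True
print(original == with_mls)  # False
```

In lane 0, for example, `A*B - C` is 6, while `mls` yields `3 - 9`, which wraps to 250.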
Apart from that, I agree with @fhahn that this doesn't look obviously profitable: `fmov`/`mov` are not zero-latency on many micro-architectures.
The equivalent case without the `fmov` (e.g. accumulating into `A` rather than `C`, so that the result ends up in `v0` naturally) could plausibly be faster thanks to the late accumulator-operand forwarding present on many micro-architectures, and at worst shouldn't be slower than the existing code. I'm not sure whether that transformation already happened before this change, though.
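One rough way to reason about this is to compare the critical-path depth of the two sequences, which is broadly the kind of comparison MachineCombiner makes. The sketch below is a dependency-only model under stated assumptions: every latency is a hypothetical parameter, not a number from any real scheduling model; throughput and port pressure are ignored; and `mla_acc` is a hypothetical stand-in for late accumulator forwarding (the delay from the accumulator input to the result).

```python
from dataclasses import dataclass

@dataclass
class Latencies:
    """Hypothetical per-instruction latencies in cycles (NOT from any real core)."""
    mul: int      # mul
    alu: int      # sub / neg
    mla: int      # multiplicand inputs -> mla result
    mla_acc: int  # accumulator input -> mla result (smaller with late forwarding)
    mov: int      # register move; 0 only if the core eliminates moves at rename

def original_depth(lat):
    # mul v0, v0, v1 ; sub v0, v0, v2  (all inputs assumed ready at cycle 0)
    return lat.mul + lat.alu

def combined_depth(lat, needs_mov):
    neg_done = lat.alu                       # neg v2, v2
    mla_done = max(lat.mla,                  # multiplicands A, B ready at cycle 0
                   neg_done + lat.mla_acc)   # accumulator arrives from neg
    return mla_done + (lat.mov if needs_mov else 0)  # optional copy into v0

fwd = Latencies(mul=4, alu=2, mla=4, mla_acc=1, mov=2)
no_fwd = Latencies(mul=4, alu=2, mla=4, mla_acc=4, mov=2)

print(original_depth(fwd))              # 6
print(combined_depth(fwd, True))        # 6: the move eats the forwarding gain
print(combined_depth(fwd, False))       # 4: no move, forwarding helps
print(combined_depth(no_fwd, True))     # 8: worse than the original
print(combined_depth(no_fwd, False))    # 6: no worse than the original
```

With these made-up numbers the model mirrors the argument above: a non-free move can cancel or reverse the benefit, while the move-free variant is at least as good. A real verdict needs the target's actual scheduling model.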
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123512/new/
https://reviews.llvm.org/D123512