[PATCH] D92296: [AARCH64] Improve accumulator forwarding for Cortex-A57 model
Eugene Leviant via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 21 04:35:59 PST 2020
evgeny777 added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA57.td:379
// ASIMD multiply, D-form
-def : InstRW<[A57Write_5cyc_1W], (instregex "^(P?MUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
+def : InstRW<[A57Write_5cyc_1W_Mul_Forward], (instregex "^(P?MUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
// ASIMD multiply, Q-form
----------------
dmgreen wrote:
> Do PMUL and sqdmulh have this forwarding? Same for other instructions like SQDMLAL below.
@mnadeem It looks like they don't (at least PMUL). I ran some experiments with llvm-exegesis, with the following results:
1. The latency of PMUL is 4 cycles, not 5.
2. A PMUL result forwarded to an MLA/MLS accumulator always has a latency of 4 cycles, i.e. the same as the plain PMUL latency, so there is no accumulator forwarding.
I used a Jetson Nano board for testing.
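One possible way to reflect this in the model, as a sketch only, is to split the regex so that only MUL gets the forwarding write from the patch, while PMUL, SQDMULH and SQRDMULH stay on the original A57Write_5cyc_1W. It uses only the two write names that appear in the diff above. Grouping SQDMULH/SQRDMULH with PMUL is an assumption, since only PMUL was measured, and the sketch leaves the PMUL latency at 5 cycles; modelling the measured 4 cycles would need a different write, which is not shown here.

```
// ASIMD multiply, D-form
// MUL only: keeps the accumulator forwarding write introduced by the patch.
def : InstRW<[A57Write_5cyc_1W_Mul_Forward], (instregex "^MUL(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
// PMUL, SQDMULH, SQRDMULH: no forwarding, unchanged from the pre-patch line.
// Assumption: SQDMULH/SQRDMULH behave like PMUL (not measured above).
def : InstRW<[A57Write_5cyc_1W], (instregex "^(PMUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
```

Together the two patterns match exactly the same instruction names as the single "^(P?MUL|SQR?DMULH)..." regex, so no opcode loses its scheduling information.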
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D92296/new/
https://reviews.llvm.org/D92296