[PATCH] D92296: [AARCH64] Improve accumulator forwarding for Cortex-A57 model

Mon Dec 21 04:35:59 PST 2020

evgeny777 added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA57.td:379
 // ASIMD multiply, D-form
-def : InstRW<[A57Write_5cyc_1W], (instregex "^(P?MUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
+def : InstRW<[A57Write_5cyc_1W_Mul_Forward], (instregex "^(P?MUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
 // ASIMD multiply, Q-form
----------------
dmgreen wrote:
> Do PMUL and sqdmulh have this forwarding? Same for other instructions like SQDMLAL below.
@mnadeem It looks like they don't (at least PMUL doesn't). I ran some experiments with llvm-exegesis, with the following results:

1. The latency of PMUL is 4 cycles, not 5.
2. A PMUL result fed into an MLA/MLS accumulator always sees the full 4-cycle latency, i.e. there is no forwarding.

I used a Jetson Nano board (Cortex-A57 cores) for testing.
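
To make the two numbers above concrete, here is a rough sketch of how a scheduling model combines a producer's write latency with a read advance on the consuming operand. It is only an illustration: the function name is made up, and the advance of 4 used for the "with forwarding" case is a hypothetical value, not one taken from the patch.

```python
def effective_latency(write_latency: int, read_advance: int = 0) -> int:
    """Cycles a consumer operand waits for a producer's result.

    The read advance (forwarding) is subtracted from the producer's
    write latency, and the result is clamped at zero.
    """
    return max(0, write_latency - read_advance)


# Measured on hardware: PMUL -> MLA/MLS accumulator, no forwarding.
measured = effective_latency(write_latency=4, read_advance=0)

# What a model would predict with a 5-cycle PMUL and a hypothetical
# accumulator read advance of 4 (assumed value, for illustration only).
modeled = effective_latency(write_latency=5, read_advance=4)

print(f"measured: {measured} cycles, modeled with forwarding: {modeled} cycle(s)")
```

If the measurements hold, PMUL should use a plain 4-cycle write with no forwarding, so the scheduler does not assume the dependent MLA/MLS can issue early.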

https://reviews.llvm.org/D92296