[PATCH] D117003: [SchedModels][CortexA55] Add ASIMD integer instructions

Mon Feb 7 22:36:42 PST 2022

kpdev42 added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA55.td:494
+// COPY
+def : InstRW<[CortexA55WriteCOPY], (instrs COPY)>;
 }
----------------
dmgreen wrote:
> Does this add a lot? It's not really how COPYs work.
According to our experiments, an FPU copy (fmov) has a latency of 1 cycle and a throughput of 2 (1 for the Q-form). According to the model, an integer ALU copy has a latency of 3 cycles. What would be the correct model for COPY, in your opinion?

================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s:2506
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     2.00    -      -     ld4r	{ v0.2s, v1.2s, v2.2s, v3.2s }, [sp], x30
-# CHECK-NEXT:  -      -      -      -     0.50   0.50    -      -      -      -      -      -     mla	v0.8b, v0.8b, v0.8b
-# CHECK-NEXT:  -      -      -      -     0.50   0.50    -      -      -      -      -      -     mls	v0.4h, v0.4h, v0.4h
+# CHECK-NEXT:  -      -      -      -      -      -      -     0.50   0.50    -      -      -     mla	v0.8b, v0.8b, v0.8b
+# CHECK-NEXT:  -      -      -      -      -      -      -     0.50   0.50    -      -      -     mls	v0.4h, v0.4h, v0.4h
----------------
dmgreen wrote:
> What is the reasoning for the integer multiplies going down the FPMAC pipeline?
I guess mla/mls (ASIMD multiply-accumulate) use the NEON pipeline. For some reason, the 2 NEON pipelines of the Cortex-A55 are modelled as 5 pipelines (2 x FPALU, 2 x FPMAC, 1 x FPDIV). What do you think would be the correct resource assignment for mla/mls?
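
For reference, here is a rough sketch of where the `0.50   0.50` cells in the CHECK lines above could come from. It is not llvm-mca code. It assumes that mla occupies one FPMAC unit for 1 cycle and that consecutive iterations are spread evenly over the units of a resource. The helper names `average_pressure` and `format_cells` are made up for illustration; only the pipeline counts are taken from the comment above.

```python
def average_pressure(cycles, num_units):
    """Average cycles per iteration charged to each unit of a resource.

    Assumption: iterations are distributed evenly over the units, so each
    unit sees cycles / num_units on average.
    """
    if num_units < 1:
        raise ValueError("a resource needs at least one unit")
    return [cycles / num_units] * num_units


def format_cells(values):
    """Render per-unit values with two decimals, one cell per unit."""
    return "   ".join(f"{v:.2f}" for v in values)


# Pipeline counts as quoted in the comment: 2 x FPALU, 2 x FPMAC, 1 x FPDIV.
UNITS = {"FPALU": 2, "FPMAC": 2, "FPDIV": 1}

# Assumed for illustration: mla holds one FPMAC unit for a single cycle.
mla_cells = format_cells(average_pressure(1, UNITS["FPMAC"]))
print(mla_cells)  # prints "0.50   0.50"
```

Under the same assumption, an instruction that holds a single-unit resource for 2 cycles would show `2.00` in one column, which is the shape of the ld4r line above.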

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117003/new/

https://reviews.llvm.org/D117003