[PATCH] D132185: [TTI][AArch64] Update vector extract cost for Neoverse-N1.

Fri Aug 19 02:25:35 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.cpp:191
     MaxBytesForLoopAlignment = 16;
+    VectorInsertExtractBaseCost = 1;
     break;
----------------
vporpo wrote:
> mingmingl wrote:
> > nittest nit: this changes the cost for both extract and insert, while the summary mostly mentions the EXT instruction cost. Might be good to call out that INS has a latency of 2 and throughput of 2 (unless it's a common assumption that extract and insert instructions have the same cost).
> > 
> > Also, from the studies of D128302, I think the cost of extract/insert is better modeled by taking the user instruction into account (e.g., if the user instruction can access the lane directly, the extract could be folded into the user in the emitted code and have no cost). Nevertheless, my gut feeling is that 3 is a high number (for instructions of latency 2 and throughput 2); not sure if 1 is too small.
> Yes, I need to update the description and add a test for the insertelement instructions too.
> 
> Yeah, considering the user instruction is definitely more precise.
> 
> I think that a cost of 1 may be all right as long as only 1 instruction is needed for the extraction. I think this is the logic in the cost calculation of the extracts in x86: it returns 1 if only 1 instruction is needed, or a higher cost if more instructions are needed.
I'd expect the extract cost to be similar on most recent-ish AArch64 cores. Should this be changed for more cores than just a single one?
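
For illustration only, the cost rule discussed in this thread could be sketched roughly as below. This is a standalone toy, not LLVM code: `Core`, `LaneAccess`, `baseCost` and `insertExtractCost` are hypothetical names, and the default base cost of 3 for other cores is an assumption taken from the "3 is a high number" remark above. It combines a per-core base cost, a scale by the number of instructions needed, and zero cost when the user can read the lane directly.

```cpp
// Hypothetical stand-ins for illustration; these are not LLVM's types or APIs.
enum class Core { Generic, NeoverseN1 };

struct LaneAccess {
  unsigned NumInstructions;   // instructions needed to move the lane
  bool UserReadsLaneDirectly; // user can index the lane itself, so no separate extract
};

// Per-core base cost: 1 for Neoverse-N1 as proposed in this patch.
// The value 3 for other cores is an assumption based on the thread.
unsigned baseCost(Core C) {
  return C == Core::NeoverseN1 ? 1u : 3u;
}

// Cost of one insert/extract under the simplified rules discussed above.
unsigned insertExtractCost(Core C, LaneAccess A) {
  if (A.UserReadsLaneDirectly)
    return 0; // folded into the user instruction, nothing extra is emitted
  return baseCost(C) * A.NumInstructions;
}
```

Under these assumptions, a single-instruction extract costs 1 on Neoverse-N1 and 3 elsewhere, and a lane access that folds into its user costs 0 on any core.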

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132185/new/

https://reviews.llvm.org/D132185