[PATCH] D146708: [AArch64][CodeGen] Reduce cost of indexed ld1 instructions for Neoverse V1/V2 cores

Fri Mar 24 01:31:22 PDT 2023

dmgreen added a reviewer: SjoerdMeijer.
dmgreen added a comment.

Hello - I'm not a fan of this patch, but I am interested in why you are making it. You mention "Tested the patch for SPEC2017 on neoverse-V1 and no regressions were observed." Do you have other cases where it does show an improvement?

I am fairly strongly against the cost model having a lot of very CPU-specific micro-adjustments, at least without a good reason for them. They create a maintenance burden that I have not yet seen justified.

Note that on the V1/V2 _all_ vector operations have double the throughput of the N1/N2, not just these particular lane insert operations. But we probably don't get the relative costs of scalar vs load/store vs vector operations precisely correct at the moment. The cost model in AArch64 is fairly generic, and I've not seen any strong reason in the past to change that. Some parts (like the costs of inserts/extracts) are the way they are less because of the exact relative costs of various operations on particular CPUs, and more to try to control the type of code produced by the SLP vectorizer. But in the case of the ld1 lane inserts, I believe the cost is set high because they require both the L and V pipelines.
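
To illustrate that last point, here is a toy sketch of why an operation that occupies both pipelines might be costed higher than one that occupies only one. It is not the AArch64 cost model: `PipelineUse`, `toyCost`, and the slot counts are all made up for illustration.

```cpp
// Toy model only: none of these names or numbers come from LLVM.
struct PipelineUse {
  unsigned LoadSlots;   // issue slots taken on the load (L) pipelines
  unsigned VectorSlots; // issue slots taken on the vector (V) pipelines
};

// Charge an operation one unit for every pipeline slot it occupies.
constexpr unsigned toyCost(PipelineUse Op) {
  return Op.LoadSlots + Op.VectorSlots;
}

// Assumed slot usage, chosen only to show the shape of the argument.
constexpr PipelineUse PlainLoad{1, 0};     // a load that uses L only
constexpr PipelineUse VectorOp{0, 1};      // a vector op that uses V only
constexpr PipelineUse Ld1LaneInsert{1, 1}; // ld1 to a lane: both L and V

static_assert(toyCost(Ld1LaneInsert) > toyCost(PlainLoad),
              "the lane load is charged more than a plain load");
static_assert(toyCost(Ld1LaneInsert) > toyCost(VectorOp),
              "the lane load is charged more than a plain vector op");
```

Under this simplification the lane load costs more than either single-pipeline operation regardless of how fast the V pipelines are, which is why a CPU-specific discount needs a concrete motivating case.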

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146708/new/

https://reviews.llvm.org/D146708