[PATCH] D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation

Mon Mar 13 05:32:22 PDT 2023

dmgreen added a comment.

The CostKind can be TCK_RecipThroughput (the default, and the one we usually care most about), TCK_Latency, TCK_CodeSize or TCK_SizeAndLatency. Since we have the code anyway, I think we might as well get TCK_CodeSize correct and return 0 for the shuffle in that case, so that the load+dup has a combined cost of 1. I'm less sure about TCK_Latency and TCK_SizeAndLatency; perhaps leave them with the same costs as TCK_RecipThroughput?

So it might be a little better to change the code to this, with a comment explaining that the costs for the other cost kinds are expected to be higher even with ld1r:

  // Check for broadcast loads.
  if (CostKind == TTI::TCK_CodeSize && Kind == TTI::SK_Broadcast) {
    bool IsLoad = !Args.empty() && isa<LoadInst>(Args[0]);
    if (IsLoad && LT.second.isVector() &&
        isLegalBroadcastLoad(Tp->getElementType(),
                             LT.second.getVectorElementCount()))
      return 0; // broadcast is handled by ld1r
  }
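
To make the intended behaviour concrete, here is a minimal standalone sketch of the decision, not the actual LLVM code. The enums, the shuffleCost helper and the BaseShuffleCost value of 1 are hypothetical stand-ins for TTI::TargetCostKind, TTI::ShuffleKind and whatever the existing cost logic returns; the two bool parameters stand in for the isa<LoadInst> and isLegalBroadcastLoad checks.

```cpp
#include <cassert>

// Hypothetical stand-ins for TTI::TargetCostKind and TTI::ShuffleKind.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };
enum class ShuffleKind { Broadcast, Other };

// Assumed placeholder for the cost the existing logic would return.
constexpr int BaseShuffleCost = 1;

// Cost of the shuffle alone; the load feeding it is costed separately.
constexpr int shuffleCost(CostKind CK, ShuffleKind SK, bool OperandIsLoad,
                          bool LegalBroadcastLoad) {
  // Only for code size is the broadcast treated as free: load + dup
  // fold into one ld1r instruction.
  if (CK == CostKind::CodeSize && SK == ShuffleKind::Broadcast &&
      OperandIsLoad && LegalBroadcastLoad)
    return 0;
  // Every other cost kind falls through to the normal cost.
  return BaseShuffleCost;
}

// Compile-time checks of the two cases discussed above.
static_assert(shuffleCost(CostKind::CodeSize, ShuffleKind::Broadcast,
                          true, true) == 0,
              "code size: broadcast of a load is free");
static_assert(shuffleCost(CostKind::RecipThroughput, ShuffleKind::Broadcast,
                          true, true) == BaseShuffleCost,
              "throughput: broadcast keeps its normal cost");
```

With this shape, TCK_Latency and TCK_SizeAndLatency simply share the TCK_RecipThroughput path, which matches the "leave them the same" option above.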

https://reviews.llvm.org/D145578