[llvm] [Analysis][SVE] Improve cost model for some extending masked loads (PR #65957)

Wed Sep 27 08:12:24 PDT 2023

================
@@ -2461,6 +2461,25 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
             FP16Tbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT()))
       return AdjustCost(Entry->Cost);
 
+  if ((ISD == ISD::ZERO_EXTEND || ISD == ISD::SIGN_EXTEND) &&
+      CCH == TTI::CastContextHint::Masked && ST->hasSVEorSME() &&
+      TLI->getTypeAction(Src->getContext(), SrcTy) ==
+          TargetLowering::TypePromoteInteger &&
+      TLI->getTypeAction(Dst->getContext(), DstTy) ==
+          TargetLowering::TypeSplitVector) {
+    // The standard behaviour in the backend for these cases is to split the
+    // extend up into two parts:
+    //  1. Perform an extending load or masked load up to the legal type.
+    //  2. Extend the loaded data to the final type.
+    std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
----------------
david-arm wrote:

No, I don't think so, because that is what the second cost calculation already does: there I pass in Dst, which is the final wide, illegal type. The costs it generates do seem to reflect the codegen, i.e. nxv8i8 -> nxv8i64 has a cost of 6 vs. nxv8i8 -> nxv8i32 with a cost of 3. In each case the first cost is the same, i.e. the extending load to nxv8i16.
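
To make the arithmetic in that reply concrete, here is a minimal, standalone sketch of the two-part split. It is not LLVM code: `SplitExtendCost`, `decompose` and `extendCostGap` are hypothetical names, and the part-1 cost values passed in are placeholders, since the thread only states that part 1 is identical in both cases, not what it costs. Only the totals 3 and 6 come from the comment.

```cpp
#include <cassert>

// Hypothetical model (not LLVM API): a cost split into the two parts named in
// the patch comment.
struct SplitExtendCost {
  int LoadToLegal;   // part 1: extending (masked) load up to the legal type
  int ExtendToFinal; // part 2: extend of the loaded data to the final type
  constexpr int total() const { return LoadToLegal + ExtendToFinal; }
};

// Recover the part-2 cost from a reported total and an ASSUMED part-1 cost.
constexpr SplitExtendCost decompose(int Total, int AssumedLoadCost) {
  return SplitExtendCost{AssumedLoadCost, Total - AssumedLoadCost};
}

// Totals quoted in the review thread.
constexpr int TotalToI32 = 3; // nxv8i8 -> nxv8i32
constexpr int TotalToI64 = 6; // nxv8i8 -> nxv8i64

// Difference in part-2 cost between the two destinations, for a given
// assumed (shared) part-1 cost.
constexpr int extendCostGap(int AssumedLoadCost) {
  return decompose(TotalToI64, AssumedLoadCost).ExtendToFinal -
         decompose(TotalToI32, AssumedLoadCost).ExtendToFinal;
}

// Whatever the shared part-1 cost is, the gap between the two totals is
// carried entirely by part 2 (the extend to the final, illegal type).
static_assert(extendCostGap(1) == 3, "gap is independent of part 1");
static_assert(extendCostGap(2) == 3, "gap is independent of part 1");
```

The point of the sketch is only that, with the extending load to nxv8i16 costed identically in both cases, the wider destination type is already accounted for by the second calculation.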

https://github.com/llvm/llvm-project/pull/65957