[llvm] afc2b7d - [AArch64][CostModel] Make sext/zext free if folded into a masked load

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 20 01:49:13 PDT 2023


Author: David Sherwood
Date: 2023-04-20T08:48:57Z
New Revision: afc2b7db026a9d6c71d1f0d1ec9dd1ca56d1601f

URL: https://github.com/llvm/llvm-project/commit/afc2b7db026a9d6c71d1f0d1ec9dd1ca56d1601f
DIFF: https://github.com/llvm/llvm-project/commit/afc2b7db026a9d6c71d1f0d1ec9dd1ca56d1601f.diff

LOG: [AArch64][CostModel] Make sext/zext free if folded into a masked load

The BasicTTIImpl implementation of getCastInstrCost ensures
that the cost of zext/sext is 0 when following a load if we
know the combined extending load is legal. For SVE we can do
the same for masked loads too, since they use exactly the
same underlying instruction.

Differential Revision: https://reviews.llvm.org/D148123

Added: 
    

Modified: 
    llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
    llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 85101d44813ce..55a0ff15213d3 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2145,6 +2145,13 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
             FP16Tbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT()))
       return AdjustCost(Entry->Cost);
 
+  // The BasicTTIImpl version only deals with CCH==TTI::CastContextHint::Normal,
+  // but we also want to include the TTI::CastContextHint::Masked case too.
+  if ((ISD == ISD::ZERO_EXTEND || ISD == ISD::SIGN_EXTEND) &&
+      CCH == TTI::CastContextHint::Masked && ST->hasSVEorSME() &&
+      TLI->isTypeLegal(DstTy))
+    CCH = TTI::CastContextHint::Normal;
+
   return AdjustCost(
       BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
 }

diff  --git a/llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll b/llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
index 9832d18c0ace0..35cd95d1debe4 100644
--- a/llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
@@ -116,23 +116,23 @@ define void @scalable_ext_loads() {
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction:   %zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction:   %zext.nxv8i16to64 = zext <vscale x 8 x i16> %load.nxv8i16.2 to <vscale x 8 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %zext.nxv4i32to64 = zext <vscale x 4 x i32> %load.nxv4i32 to <vscale x 4 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>
 
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv16i8to16 = sext <vscale x 16 x i8> %load2.nxv16i8 to <vscale x 16 x i16>
@@ -141,23 +141,23 @@ define void @scalable_ext_loads() {
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction:   %sext.nxv16i8to64 = sext <vscale x 16 x i8> %load2.nxv16i8.3 to <vscale x 16 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv8i16to32 = sext <vscale x 8 x i16> %load2.nxv8i16 to <vscale x 8 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction:   %sext.nxv8i16to64 = sext <vscale x 8 x i16> %load2.nxv8i16.2 to <vscale x 8 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv4i32to64 = sext <vscale x 4 x i32> %load2.nxv4i32 to <vscale x 4 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>
 
   %load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
   %zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>


        


More information about the llvm-commits mailing list