[PATCH] D148123: [AArch64][CostModel] Make sext/zext free if folded into a masked load

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 19 05:57:07 PDT 2023


david-arm updated this revision to Diff 514924.
david-arm added a comment.

- Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148123/new/

https://reviews.llvm.org/D148123

Files:
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
  llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll


Index: llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
===================================================================
--- llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
+++ llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
@@ -116,23 +116,23 @@
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction:   %zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction:   %zext.nxv8i16to64 = zext <vscale x 8 x i16> %load.nxv8i16.2 to <vscale x 8 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %zext.nxv4i32to64 = zext <vscale x 4 x i32> %load.nxv4i32 to <vscale x 4 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>
 
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv16i8to16 = sext <vscale x 16 x i8> %load2.nxv16i8 to <vscale x 16 x i16>
@@ -141,23 +141,23 @@
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction:   %sext.nxv16i8to64 = sext <vscale x 16 x i8> %load2.nxv16i8.3 to <vscale x 16 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv8i16to32 = sext <vscale x 8 x i16> %load2.nxv8i16 to <vscale x 8 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction:   %sext.nxv8i16to64 = sext <vscale x 8 x i16> %load2.nxv8i16.2 to <vscale x 8 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction:   %sext.nxv4i32to64 = sext <vscale x 4 x i32> %load2.nxv4i32 to <vscale x 4 x i64>
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %load2.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction:   %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>
 
   %load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
   %zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>
Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2145,6 +2145,13 @@
             FP16Tbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT()))
       return AdjustCost(Entry->Cost);
 
+  // The BasicTTIImpl version only deals with CCH==TTI::CastContextHint::Normal,
+  // but we also want to include the TTI::CastContextHint::Masked case too.
+  if ((ISD == ISD::ZERO_EXTEND || ISD == ISD::SIGN_EXTEND) &&
+      (CCH == TTI::CastContextHint::Masked) && ST->hasSVEorSME() &&
+      TLI->isTypeLegal(DstTy))
+    CCH = TTI::CastContextHint::Normal;
+
   return AdjustCost(
       BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
 }


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D148123.514924.patch
Type: text/x-patch
Size: 10935 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230419/ed59df01/attachment.bin>


More information about the llvm-commits mailing list