[PATCH] D112877: [BasicTTI] getInterleavedMemoryOpCost(): discount unused members of mask if mask for gap will be used

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Oct 30 16:08:39 PDT 2021


lebedev.ri created this revision.
lebedev.ri added reviewers: RKSimon, pengfei, dorit, Ayal, hsaito, fhahn.
lebedev.ri added a project: LLVM.
lebedev.ri requested review of this revision.

As can be seen in `InnerLoopVectorizer::vectorizeInterleaveGroup()`,
in some cases (signalled by `UseMaskForGaps`) the gaps in an interleaved load/store group
are masked away by a separate constant mask, so there is no need to
account for the cost of replicating the mask into those gap lanes.
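
To illustrate the discount, here is a minimal standalone sketch in plain C++; it is not LLVM's API. The helper names `demandedLanes` and `maskInsertCost`, and the cost of one unit per inserted mask element, are assumptions made for illustration only. Only `UseMaskForGaps`, the interleave factor, and the member indices correspond to names used in the patch.

```cpp
#include <vector>

// Hypothetical helper: marks the lanes of the wide (Factor * NumSubElts)
// vector that belong to a group member listed in Indices. Gap lanes stay false.
std::vector<bool> demandedLanes(unsigned Factor, unsigned NumSubElts,
                                const std::vector<unsigned> &Indices) {
  std::vector<bool> Demanded(Factor * NumSubElts, false);
  for (unsigned Index : Indices)
    for (unsigned Elt = 0; Elt < NumSubElts; ++Elt)
      Demanded[Index + Elt * Factor] = true;
  return Demanded;
}

// Hypothetical toy cost: one unit per element inserted into the replicated
// mask. When a separate gap mask is used, lanes belonging to gaps are
// masked off anyway, so only the demanded lanes are charged.
unsigned maskInsertCost(unsigned Factor, unsigned NumSubElts,
                        const std::vector<unsigned> &Indices,
                        bool UseMaskForGaps) {
  if (!UseMaskForGaps)
    return Factor * NumSubElts; // every lane of the wide mask is charged

  unsigned Cost = 0;
  for (bool Lane : demandedLanes(Factor, NumSubElts, Indices))
    Cost += Lane ? 1 : 0;
  return Cost;
}
```

With factor 3, 8 elements per member, and only members 0 and 2 present, this toy model charges 24 insertions when no gap mask is used and 16 when it is.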


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D112877

Files:
  llvm/include/llvm/CodeGen/BasicTTIImpl.h


Index: llvm/include/llvm/CodeGen/BasicTTIImpl.h
===================================================================
--- llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1255,8 +1255,9 @@
       //      %v0 = shuffle %vec, undef, <0, 2, 4, 6>         ; Index 0
       // The cost is estimated as extract elements at 0, 2, 4, 6 from the
       // <8 x i32> vector and insert them into a <4 x i32> vector.
-      InstructionCost InsSubCost =
-          getScalarizationOverhead(SubVT, /*Insert*/ true, /*Extract*/ false);
+      InstructionCost InsSubCost = thisT()->getScalarizationOverhead(
+          SubVT, APInt::getAllOnes(NumSubElts),
+          /*Insert*/ true, /*Extract*/ false);
       Cost += Indices.size() * InsSubCost;
       Cost +=
           thisT()->getScalarizationOverhead(VT, DemandedLoadStoreElts,
@@ -1275,8 +1276,9 @@
       // The cost is estimated as extract all elements (of actual members,
       // excluding gaps) from both <4 x i32> vectors and insert into the <12 x
       // i32> vector.
-      InstructionCost ExtSubCost =
-          getScalarizationOverhead(SubVT, /*Insert*/ false, /*Extract*/ true);
+      InstructionCost ExtSubCost = thisT()->getScalarizationOverhead(
+          SubVT, APInt::getAllOnes(NumSubElts),
+          /*Insert*/ false, /*Extract*/ true);
       Cost += ExtSubCost * Indices.size();
       Cost += thisT()->getScalarizationOverhead(VT, DemandedLoadStoreElts,
                                                 /*Insert*/ true,
@@ -1300,9 +1302,13 @@
     // The cost is estimated as extract all mask elements from the <8xi1> mask
     // vector and insert them factor times into the <24xi1> shuffled mask
     // vector.
-    Cost += getScalarizationOverhead(SubVT, /*Insert*/ false, /*Extract*/ true);
     Cost +=
-        getScalarizationOverhead(MaskVT, /*Insert*/ true, /*Extract*/ false);
+        thisT()->getScalarizationOverhead(SubVT, APInt::getAllOnes(NumSubElts),
+                                          /*Insert*/ false, /*Extract*/ true);
+    Cost += thisT()->getScalarizationOverhead(
+        MaskVT,
+        UseMaskForGaps ? DemandedLoadStoreElts : APInt::getAllOnes(NumElts),
+        /*Insert*/ true, /*Extract*/ false);
 
     // The Gaps mask is invariant and created outside the loop, therefore the
     // cost of creating it is not accounted for here. However if we have both


