[PATCH] D112877: [BasicTTI] getInterleavedMemoryOpCost(): discount unused members of mask if mask for gap will be used
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 3 06:34:55 PDT 2021
lebedev.ri updated this revision to Diff 384420.
lebedev.ri marked an inline comment as done.
lebedev.ri added a comment.
@RKSimon thank you for the review!
Applied nit suggestion.
Yeah, it would indeed be really great to have test coverage for this.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112877/new/
https://reviews.llvm.org/D112877
Files:
llvm/include/llvm/CodeGen/BasicTTIImpl.h
Index: llvm/include/llvm/CodeGen/BasicTTIImpl.h
===================================================================
--- llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1239,6 +1239,9 @@
assert(Indices.size() <= Factor &&
"Interleaved memory op has too many members");
+ const APInt DemandedAllSubElts = APInt::getAllOnes(NumSubElts);
+ const APInt DemandedAllResultElts = APInt::getAllOnes(NumElts);
+
APInt DemandedLoadStoreElts = APInt::getZero(NumElts);
for (unsigned Index : Indices) {
assert(Index < Factor && "Invalid index for interleaved memory op");
@@ -1256,7 +1259,8 @@
// The cost is estimated as extract elements at 0, 2, 4, 6 from the
// <8 x i32> vector and insert them into a <4 x i32> vector.
InstructionCost InsSubCost =
- getScalarizationOverhead(SubVT, /*Insert*/ true, /*Extract*/ false);
+ thisT()->getScalarizationOverhead(SubVT, DemandedAllSubElts,
+ /*Insert*/ true, /*Extract*/ false);
Cost += Indices.size() * InsSubCost;
Cost +=
thisT()->getScalarizationOverhead(VT, DemandedLoadStoreElts,
@@ -1276,7 +1280,8 @@
// excluding gaps) from both <4 x i32> vectors and insert into the <12 x
// i32> vector.
InstructionCost ExtSubCost =
- getScalarizationOverhead(SubVT, /*Insert*/ false, /*Extract*/ true);
+ thisT()->getScalarizationOverhead(SubVT, DemandedAllSubElts,
+ /*Insert*/ false, /*Extract*/ true);
Cost += ExtSubCost * Indices.size();
Cost += thisT()->getScalarizationOverhead(VT, DemandedLoadStoreElts,
/*Insert*/ true,
@@ -1300,9 +1305,12 @@
// The cost is estimated as extract all mask elements from the <8xi1> mask
// vector and insert them factor times into the <24xi1> shuffled mask
// vector.
- Cost += getScalarizationOverhead(SubVT, /*Insert*/ false, /*Extract*/ true);
Cost +=
- getScalarizationOverhead(MaskVT, /*Insert*/ true, /*Extract*/ false);
+ thisT()->getScalarizationOverhead(SubVT, DemandedAllSubElts,
+ /*Insert*/ false, /*Extract*/ true);
+ Cost += thisT()->getScalarizationOverhead(
+ MaskVT, UseMaskForGaps ? DemandedLoadStoreElts : DemandedAllResultElts,
+ /*Insert*/ true, /*Extract*/ false);
// The Gaps mask is invariant and created outside the loop, therefore the
// cost of creating it is not accounted for here. However if we have both
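The sketch below is a minimal, non-authoritative model of the change to the mask cost. It assumes a unit-cost model in which every insertelement/extractelement costs 1, so a scalarization overhead is simply the number of demanded lanes. It uses std::vector<bool> as a stand-in for APInt. The helpers demandedLoadStoreElts, countDemanded and maskCost are hypothetical and are not LLVM APIs. The demanded-lane construction mirrors the DemandedLoadStoreElts loop shown in the diff context: lane Index + Elm * Factor is demanded for each accessed member Index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the APInt DemandedLoadStoreElts:
// lane (Index + Elm * Factor) is demanded for every member Index
// that the interleave group actually accesses.
std::vector<bool> demandedLoadStoreElts(unsigned NumElts, unsigned Factor,
                                        const std::vector<unsigned> &Indices) {
  assert(Factor != 0 && NumElts % Factor == 0 &&
         "NumElts must be a multiple of Factor");
  unsigned NumSubElts = NumElts / Factor;
  std::vector<bool> Demanded(NumElts, false);
  for (unsigned Index : Indices) {
    assert(Index < Factor && "Invalid index for interleaved memory op");
    for (unsigned Elm = 0; Elm < NumSubElts; ++Elm)
      Demanded[Index + Elm * Factor] = true;
  }
  return Demanded;
}

// Assumed unit-cost model: overhead == number of demanded lanes.
unsigned countDemanded(const std::vector<bool> &Demanded) {
  unsigned N = 0;
  for (bool B : Demanded)
    N += B ? 1 : 0;
  return N;
}

// Mask cost for a masked interleaved access: extract all NumSubElts mask
// bits, then insert into the NumElts-wide shuffled mask either only the
// lanes of accessed members (UseMaskForGaps) or every lane (otherwise).
unsigned maskCost(unsigned NumElts, unsigned Factor,
                  const std::vector<unsigned> &Indices, bool UseMaskForGaps) {
  unsigned NumSubElts = NumElts / Factor;
  unsigned ExtractCost = NumSubElts;
  unsigned InsertCost =
      UseMaskForGaps
          ? countDemanded(demandedLoadStoreElts(NumElts, Factor, Indices))
          : NumElts;
  return ExtractCost + InsertCost;
}
```

For example, take Factor 3 and VF 8 (a <24 x i1> shuffled mask) where member 2 is a gap. Then maskCost(24, 3, {0, 1}, false) is 8 + 24 = 32, whereas maskCost(24, 3, {0, 1}, true) is 8 + 16 = 24. The 8 lanes belonging to the gap no longer need to be inserted, because the invariant gaps mask already covers them.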