[llvm] 4bbc329 - [SLP] Fix for the min/max intrinsic cost.

Thu Feb 24 18:29:19 PST 2022

Author: Vasileios Porpodas
Date: 2022-02-24T18:08:40-08:00
New Revision: 4bbc3290a25c0dc26007912a96e0f77b2092ee56

URL: https://github.com/llvm/llvm-project/commit/4bbc3290a25c0dc26007912a96e0f77b2092ee56
DIFF: https://github.com/llvm/llvm-project/commit/4bbc3290a25c0dc26007912a96e0f77b2092ee56.diff

LOG: [SLP] Fix for the min/max intrinsic cost.

The min/max intrinsic cost is currently too low because in the cost calculation
we subtract the cost of the vector compare as we will not emit it.
For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE
which returns 3, the worst case cost.
I think we should be passing VecPred instead, since we know the predicates of
the compare instr.

I think this is related to commit b3b993a7ad817 which introduced the predicate
argument to getCmpSelInstrCost().
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83

Differential Revision: https://reviews.llvm.org/D120439

Added: 
    

Modified: 
    llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 9bba73bcc839..acf0851ae45f 100644

--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -5358,9 +5358,8 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
         // If the selects are the only uses of the compares, they will be dead
         // and we can adjust the cost by removing their cost.
         if (IntrinsicAndUse.second)
-          IntrinsicCost -=
-              TTI->getCmpSelInstrCost(Instruction::ICmp, VecTy, MaskTy,
-                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);
+          IntrinsicCost -= TTI->getCmpSelInstrCost(Instruction::ICmp, VecTy,
+                                                   MaskTy, VecPred, CostKind);
         VecCost = std::min(VecCost, IntrinsicCost);
       }
       LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));

diff  --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll
index ead647de9e8d..a2b892028d43 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll
@@ -6,13 +6,11 @@
 ; This maps to a single PMAX instruction in x86.
 define void @smax_intrinsic_cost(i64 %arg0, i64 %arg1) {
 ; CHECK-LABEL: @smax_intrinsic_cost(
-; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[ARG0:%.*]], i32 0
-; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[ARG1:%.*]], i32 1
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp sgt <2 x i64> [[TMP2]], <i64 123, i64 456>
-; CHECK-NEXT:    [[TMP4:%.*]] = select <2 x i1> [[TMP3]], <2 x i64> [[TMP2]], <2 x i64> <i64 123, i64 456>
-; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0
-; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
-; CHECK-NEXT:    [[ROOT:%.*]] = icmp sle i64 [[TMP5]], [[TMP6]]
+; CHECK-NEXT:    [[ICMP0:%.*]] = icmp sgt i64 [[ARG0:%.*]], 123
+; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 [[ARG1:%.*]], 456
+; CHECK-NEXT:    [[SELECT0:%.*]] = select i1 [[ICMP0]], i64 [[ARG0]], i64 123
+; CHECK-NEXT:    [[SELECT1:%.*]] = select i1 [[ICMP1]], i64 [[ARG1]], i64 456
+; CHECK-NEXT:    [[ROOT:%.*]] = icmp sle i64 [[SELECT0]], [[SELECT1]]
 ; CHECK-NEXT:    ret void
 ;
   %icmp0 = icmp sgt i64 %arg0, 123