[llvm] r312791 - [SLP] Support for horizontal min/max reduction.

Fri Sep 15 15:25:32 PDT 2017

I think this caused a miscompile: http://llvm.org/PR34635

I've reverted it for now in r313409.

On Fri, Sep 8, 2017 at 3:52 PM Alexey Bataev via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Galina, thanks. Will fix it ASAP
>
> Best regards,
> Alexey Bataev
>
> On Sep 8, 2017, at 17:56, Galina Kistanova <gkistanova at gmail.com>
> wrote:
>
> Hello Alexey,
>
> It looks like this commit added warnings to one of our builders:
> http://lab.llvm.org:8011/builders/ubuntu-gcc7.1-werror/builds/1263
>
> ...
> FAILED: /usr/local/gcc-7.1/bin/g++-7.1   -DGTEST_HAS_RTTI=0 -D_DEBUG
> -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -D__STDC_LIMIT_MACROS -Ilib/Transforms/Vectorize
> -I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize
> -Iinclude -I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/include
> -Wno-noexcept-type -fPIC -fvisibility-inlines-hidden -Werror
> -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings
> -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long
> -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
> -ffunction-sections -fdata-sections -O3  -fPIC   -UNDEBUG  -fno-exceptions
> -fno-rtti -MD -MT
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
> -MF
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o.d
> -o
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
> -c
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘unsigned int
> {anonymous}::HorizontalReduction::OperationData::getRequiredNumberOfUses()
> const’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4733:5:
> error: control reaches end of non-void function [-Werror=return-type]
>      }
>      ^
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘unsigned int
> {anonymous}::HorizontalReduction::OperationData::getNumberOfOperands()
> const’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4716:5:
> error: control reaches end of non-void function [-Werror=return-type]
>      }
>      ^
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘int
> {anonymous}::HorizontalReduction::getReductionCost(llvm::TargetTransformInfo*,
> llvm::Value*, unsigned int)’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5183:18:
> error: this statement may fall through [-Werror=implicit-fallthrough=]
>        IsUnsigned = false;
>        ~~~~~~~~~~~^~~~~~~
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5184:5:
> note: here
>      case RK_UMin:
>      ^~~~
> cc1plus: all warnings being treated as errors
>
>
> Could you please have a look?
>
> Thanks
>
> Galina
>
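For readers following the log above, here is a minimal standalone sketch of the two kinds of diagnostic GCC reported. It is not the code from SLPVectorizer.cpp: the enumerator names are taken from the quoted diff, while the function names, the returned values, and the use of std::abort() (a stand-in for an unreachable marker) are assumptions made for illustration. GCC emits -Wreturn-type even when every enumerator has a case, because an enum object may hold a value that matches no enumerator, so some statement is still needed after the switch. For -Wimplicit-fallthrough, a statement followed directly by the next case label needs either an explicit break or a fallthrough annotation, depending on what is intended; adjacent labels with nothing between them are not flagged.

```cpp
#include <cassert>
#include <cstdlib>

// Enumerator names follow the quoted diff; everything else is illustrative.
enum ReductionKind { RK_None, RK_Arithmetic, RK_Min, RK_UMin, RK_Max, RK_UMax };

// Hypothetical helper. All enumerators are handled, but GCC still assumes
// control can leave the switch, so the function must not simply end there.
unsigned numberOfOperandsSketch(ReductionKind Kind) {
  switch (Kind) {
  case RK_Arithmetic:
    return 2;
  case RK_Min:
  case RK_UMin:
  case RK_Max:
  case RK_UMax:
    return 3;
  case RK_None:
    break;
  }
  std::abort(); // reached only for RK_None or an out-of-range value
}

// Hypothetical helper. Labels are grouped with no statement between them,
// and each group returns, so nothing falls through implicitly.
bool isUnsignedKindSketch(ReductionKind Kind) {
  switch (Kind) {
  case RK_UMin:
  case RK_UMax:
    return true;
  case RK_Min:
  case RK_Max:
  case RK_Arithmetic:
  case RK_None:
    return false;
  }
  std::abort(); // same reasoning as above
}
```

Built with -Wall -W -Werror, a file of this shape should compile cleanly; removing either std::abort() call would be expected to bring the return-type error back.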
> On Fri, Sep 8, 2017 at 6:49 AM, Alexey Bataev via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: abataev
>> Date: Fri Sep  8 06:49:36 2017
>> New Revision: 312791
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=312791&view=rev
>> Log:
>> [SLP] Support for horizontal min/max reduction.
>>
>> SLP vectorizer supports horizontal reductions for Add/FAdd binary
>> operations. Patch adds support for horizontal min/max reductions.
>> Function getReductionCost() is split to getArithmeticReductionCost() for
>> binary operation reductions and getMinMaxReductionCost() for min/max
>> reductions.
>> Patch fixes PR26956.
>>
>> Differential revision: https://reviews.llvm.org/D27846
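As a rough illustration of what "horizontal min reduction" means in the log above, the sketch below compares a scalar compare-and-select chain with a vector-splitting form that halves the active width at each level. It is plain C++ on a fixed 8-element array, not LLVM IR and not the vectorizer's implementation; the function names and the element count are assumptions chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar form: a chain of compare + select, one element at a time.
int scalarMinSketch(const std::array<int, 8> &V) {
  int M = V[0];
  for (std::size_t I = 1; I < V.size(); ++I)
    M = V[I] < M ? V[I] : M;
  return M;
}

// Splitting form: at each level the upper half is compared lane by lane
// against the lower half, so 8 elements need log2(8) = 3 levels. The
// result ends up in lane 0, which is the lane a reduction extracts.
int splittingMinSketch(std::array<int, 8> V) {
  for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t Lane = 0; Lane < Width; ++Lane)
      V[Lane] = V[Lane] < V[Lane + Width] ? V[Lane] : V[Lane + Width];
  return V[0];
}
```

Both functions return the same value for any input; the second one mirrors the shuffle, compare, select pattern that the cost model changes below try to price.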
>>
>> Modified:
>>     llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>>     llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>>     llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
>>     llvm/trunk/lib/Analysis/CostModel.cpp
>>     llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>>     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>>     llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
>>
>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)
>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Fri Sep  8
>> 06:49:36 2017
>> @@ -732,6 +732,8 @@ public:
>>    ///  ((v0+v2), (v1+v3), undef, undef)
>>    int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>>                                   bool IsPairwiseForm) const;
>> +  int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
>> +                             bool IsUnsigned) const;
>>
>>    /// \returns The cost of Intrinsic instructions. Analyses the real
>> arguments.
>>    /// Three cases are handled: 1. scalar instruction 2. vector
>> instruction
>> @@ -998,6 +1000,8 @@ public:
>>                                           unsigned AddressSpace) = 0;
>>    virtual int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>>                                           bool IsPairwiseForm) = 0;
>> +  virtual int getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> +                                     bool IsPairwiseForm, bool
>> IsUnsigned) = 0;
>>    virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
>>                        ArrayRef<Type *> Tys, FastMathFlags FMF,
>>                        unsigned ScalarizationCostPassed) = 0;
>> @@ -1309,6 +1313,10 @@ public:
>>                                   bool IsPairwiseForm) override {
>>      return Impl.getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
>>    }
>> +  int getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> +                             bool IsPairwiseForm, bool IsUnsigned)
>> override {
>> +    return Impl.getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
>> IsUnsigned);
>> +   }
>>    int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type
>> *> Tys,
>>                 FastMathFlags FMF, unsigned ScalarizationCostPassed)
>> override {
>>      return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
>>
>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h (original)
>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h Fri Sep  8
>> 06:49:36 2017
>> @@ -451,6 +451,8 @@ public:
>>
>>    unsigned getArithmeticReductionCost(unsigned, Type *, bool) { return
>> 1; }
>>
>> +  unsigned getMinMaxReductionCost(Type *, Type *, bool, bool) { return
>> 1; }
>> +
>>    unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) { return
>> 0; }
>>
>>    bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {
>>
>> Modified: llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h (original)
>> +++ llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h Fri Sep  8 06:49:36
>> 2017
>> @@ -1166,6 +1166,66 @@ public:
>>      return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false,
>> true);
>>    }
>>
>> +  /// Try to calculate op costs for min/max reduction operations.
>> +  /// \param CondTy Conditional type for the Select instruction.
>> +  unsigned getMinMaxReductionCost(Type *Ty, Type *CondTy, bool
>> IsPairwise,
>> +                                  bool) {
>> +    assert(Ty->isVectorTy() && "Expect a vector type");
>> +    Type *ScalarTy = Ty->getVectorElementType();
>> +    Type *ScalarCondTy = CondTy->getVectorElementType();
>> +    unsigned NumVecElts = Ty->getVectorNumElements();
>> +    unsigned NumReduxLevels = Log2_32(NumVecElts);
>> +    unsigned CmpOpcode;
>> +    if (Ty->isFPOrFPVectorTy()) {
>> +      CmpOpcode = Instruction::FCmp;
>> +    } else {
>> +      assert(Ty->isIntOrIntVectorTy() &&
>> +             "expecting floating point or integer type for min/max
>> reduction");
>> +      CmpOpcode = Instruction::ICmp;
>> +    }
>> +    unsigned MinMaxCost = 0;
>> +    unsigned ShuffleCost = 0;
>> +    auto *ConcreteTTI = static_cast<T *>(this);
>> +    std::pair<unsigned, MVT> LT =
>> +        ConcreteTTI->getTLI()->getTypeLegalizationCost(DL, Ty);
>> +    unsigned LongVectorCount = 0;
>> +    unsigned MVTLen =
>> +        LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
>> +    while (NumVecElts > MVTLen) {
>> +      NumVecElts /= 2;
>> +      // Assume the pairwise shuffles add a cost.
>> +      ShuffleCost += (IsPairwise + 1) *
>> +                     ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
>> +                                                 NumVecElts, Ty);
>> +      MinMaxCost +=
>> +          ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy,
>> nullptr) +
>> +          ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty,
>> CondTy,
>> +                                          nullptr);
>> +      Ty = VectorType::get(ScalarTy, NumVecElts);
>> +      CondTy = VectorType::get(ScalarCondTy, NumVecElts);
>> +      ++LongVectorCount;
>> +    }
>> +    // The minimal length of the vector is limited by the real length of vector
>> +    // operations performed on the current platform. That's why several final
>> +    // reduction operations are performed on the vectors with the same
>> +    // architecture-dependent length.
>> +    ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1)
>> *
>> +                   ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector,
>> Ty,
>> +                                               NumVecElts, Ty);
>> +    MinMaxCost +=
>> +        (NumReduxLevels - LongVectorCount) *
>> +        (ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr)
>> +
>> +         ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
>> +                                         nullptr));
>> +    // Need 3 extractelement instructions for scalarization + an
>> additional
>> +    // scalar select instruction.
>> +    return ShuffleCost + MinMaxCost +
>> +           3 * getScalarizationOverhead(Ty, /*Insert=*/false,
>> +                                        /*Extract=*/true) +
>> +           ConcreteTTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
>> +                                           ScalarCondTy, nullptr);
>> +  }
>> +
>>    unsigned getVectorSplitCost() { return 1; }
>>
>>    /// @}
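To make the arithmetic in the generic getMinMaxReductionCost() hunk above easier to follow, here is a simplified standalone model of it. The big assumption is that every per-instruction cost is a flat number that does not depend on the vector type, whereas the real code queries the target for each type; the struct, the function names, and the default unit costs are all hypothetical.

```cpp
#include <cassert>

// Hypothetical flat costs; the real code asks the target for each of these.
struct UnitCosts {
  unsigned Shuffle = 1;         // one extract-subvector shuffle
  unsigned Cmp = 1;             // one vector compare
  unsigned Select = 1;          // one select (vector or scalar)
  unsigned ExtractOverhead = 1; // stand-in for the scalarization overhead
};

static unsigned log2FloorSketch(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Mirrors the structure of the hunk: levels wider than the legal vector are
// walked in a loop, the remaining levels are charged in one step, and the
// final extract + scalar select is added at the end.
unsigned minMaxReductionCostSketch(unsigned NumVecElts, unsigned LegalVecElts,
                                   bool IsPairwise, const UnitCosts &C) {
  assert(NumVecElts >= 2 && (NumVecElts & (NumVecElts - 1)) == 0);
  assert(LegalVecElts >= 1);
  unsigned NumReduxLevels = log2FloorSketch(NumVecElts);
  unsigned ShuffleCost = 0, MinMaxCost = 0, LongVectorCount = 0;
  while (NumVecElts > LegalVecElts) {
    NumVecElts /= 2;
    ShuffleCost += (IsPairwise + 1) * C.Shuffle; // pairwise needs two shuffles
    MinMaxCost += C.Cmp + C.Select;
    ++LongVectorCount;
  }
  unsigned Remaining = NumReduxLevels - LongVectorCount;
  ShuffleCost += Remaining * (IsPairwise + 1) * C.Shuffle;
  MinMaxCost += Remaining * (C.Cmp + C.Select);
  return ShuffleCost + MinMaxCost + 3 * C.ExtractOverhead + C.Select;
}
```

With all unit costs at 1 and a legal width of 4, an 8-element non-pairwise reduction comes to 3 (shuffles) + 6 (compare and select) + 3 (extracts) + 1 (scalar select) = 13; the pairwise form doubles the shuffle term and comes to 16.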
>>
>> Modified: llvm/trunk/lib/Analysis/CostModel.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/CostModel.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Analysis/CostModel.cpp (original)
>> +++ llvm/trunk/lib/Analysis/CostModel.cpp Fri Sep  8 06:49:36 2017
>> @@ -186,26 +186,56 @@ static bool matchPairwiseShuffleMask(Shu
>>  }
>>
>>  namespace {
>> +/// Kind of the reduction data.
>> +enum ReductionKind {
>> +  RK_None,           /// Not a reduction.
>> +  RK_Arithmetic,     /// Binary reduction data.
>> +  RK_MinMax,         /// Min/max reduction data.
>> +  RK_UnsignedMinMax, /// Unsigned min/max reduction data.
>> +};
>>  /// Contains opcode + LHS/RHS parts of the reduction operations.
>>  struct ReductionData {
>> -  explicit ReductionData() = default;
>> -  ReductionData(unsigned Opcode, Value *LHS, Value *RHS)
>> -      : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
>> +  ReductionData() = delete;
>> +  ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value
>> *RHS)
>> +      : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
>> +    assert(Kind != RK_None && "expected binary or min/max reduction
>> only.");
>> +  }
>>    unsigned Opcode = 0;
>>    Value *LHS = nullptr;
>>    Value *RHS = nullptr;
>> +  ReductionKind Kind = RK_None;
>> +  bool hasSameData(ReductionData &RD) const {
>> +    return Kind == RD.Kind && Opcode == RD.Opcode;
>> +  }
>>  };
>>  } // namespace
>>
>>  static Optional<ReductionData> getReductionData(Instruction *I) {
>>    Value *L, *R;
>>    if (m_BinOp(m_Value(L), m_Value(R)).match(I))
>> -    return ReductionData(I->getOpcode(), L, R);
>> +    return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
>> +  if (auto *SI = dyn_cast<SelectInst>(I)) {
>> +    if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_SMax(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
>> +      auto *CI = cast<CmpInst>(SI->getCondition());
>> +      return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
>> +    }
>> +    if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
>> +        m_UMax(m_Value(L), m_Value(R)).match(SI)) {
>> +      auto *CI = cast<CmpInst>(SI->getCondition());
>> +      return ReductionData(RK_UnsignedMinMax, CI->getOpcode(), L, R);
>> +    }
>> +  }
>>    return llvm::None;
>>  }
>>
>> -static bool matchPairwiseReductionAtLevel(Instruction *I, unsigned Level,
>> -                                          unsigned NumLevels) {
>> +static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
>> +                                                   unsigned Level,
>> +                                                   unsigned NumLevels) {
>>    // Match one level of pairwise operations.
>>    // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
>>    //       <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
>> @@ -213,24 +243,24 @@ static bool matchPairwiseReductionAtLeve
>>    //       <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
>>    // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
>>    if (!I)
>> -    return false;
>> +    return RK_None;
>>
>>    assert(I->getType()->isVectorTy() && "Expecting a vector type");
>>
>>    Optional<ReductionData> RD = getReductionData(I);
>>    if (!RD)
>> -    return false;
>> +    return RK_None;
>>
>>    ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
>>    if (!LS && Level)
>> -    return false;
>> +    return RK_None;
>>    ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
>>    if (!RS && Level)
>> -    return false;
>> +    return RK_None;
>>
>>    // On level 0 we can omit one shufflevector instruction.
>>    if (!Level && !RS && !LS)
>> -    return false;
>> +    return RK_None;
>>
>>    // Shuffle inputs must match.
>>    Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
>> @@ -239,7 +269,7 @@ static bool matchPairwiseReductionAtLeve
>>    if (NextLevelOpR && NextLevelOpL) {
>>      // If we have two shuffles their operands must match.
>>      if (NextLevelOpL != NextLevelOpR)
>> -      return false;
>> +      return RK_None;
>>
>>      NextLevelOp = NextLevelOpL;
>>    } else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
>> @@ -250,45 +280,47 @@ static bool matchPairwiseReductionAtLeve
>>      //  %NextLevelOpL = shufflevector %R, <1, undef ...>
>>      //  %BinOp        = fadd          %NextLevelOpL, %R
>>      if (NextLevelOpL && NextLevelOpL != RD->RHS)
>> -      return false;
>> +      return RK_None;
>>      else if (NextLevelOpR && NextLevelOpR != RD->LHS)
>> -      return false;
>> +      return RK_None;
>>
>>      NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
>> -  } else
>> -    return false;
>> +  } else {
>> +    return RK_None;
>> +  }
>>
>>    // Check that the next levels binary operation exists and matches with
>> the
>>    // current one.
>>    if (Level + 1 != NumLevels) {
>>      Optional<ReductionData> NextLevelRD =
>>          getReductionData(cast<Instruction>(NextLevelOp));
>> -    if (!NextLevelRD || RD->Opcode != NextLevelRD->Opcode)
>> -      return false;
>> +    if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
>> +      return RK_None;
>>    }
>>
>>    // Shuffle mask for pairwise operation must match.
>>    if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
>>      if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
>> -      return false;
>> +      return RK_None;
>>    } else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
>>      if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
>> -      return false;
>> -  } else
>> -    return false;
>> +      return RK_None;
>> +  } else {
>> +    return RK_None;
>> +  }
>>
>>    if (++Level == NumLevels)
>> -    return true;
>> +    return RD->Kind;
>>
>>    // Match next level.
>>    return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp),
>> Level,
>>                                         NumLevels);
>>  }
>>
>> -static bool matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
>> -                                   unsigned &Opcode, Type *&Ty) {
>> +static ReductionKind matchPairwiseReduction(const ExtractElementInst
>> *ReduxRoot,
>> +                                            unsigned &Opcode, Type *&Ty)
>> {
>>    if (!EnableReduxCost)
>> -    return false;
>> +    return RK_None;
>>
>>    // Need to extract the first element.
>>    ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
>> @@ -296,19 +328,19 @@ static bool matchPairwiseReduction(const
>>    if (CI)
>>      Idx = CI->getZExtValue();
>>    if (Idx != 0)
>> -    return false;
>> +    return RK_None;
>>
>>    auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
>>    if (!RdxStart)
>> -    return false;
>> +    return RK_None;
>>    Optional<ReductionData> RD = getReductionData(RdxStart);
>>    if (!RD)
>> -    return false;
>> +    return RK_None;
>>
>>    Type *VecTy = RdxStart->getType();
>>    unsigned NumVecElems = VecTy->getVectorNumElements();
>>    if (!isPowerOf2_32(NumVecElems))
>> -    return false;
>> +    return RK_None;
>>
>>    // We look for a sequence of shuffle,shuffle,add triples like the
>> following
>>    // that builds a pairwise reduction tree.
>> @@ -328,13 +360,14 @@ static bool matchPairwiseReduction(const
>>    //       <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
>>    // %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
>>    // %r = extractelement <4 x float> %bin.rdx8, i32 0
>> -  if (!matchPairwiseReductionAtLevel(RdxStart, 0,  Log2_32(NumVecElems)))
>> -    return false;
>> +  if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
>> +      RK_None)
>> +    return RK_None;
>>
>>    Opcode = RD->Opcode;
>>    Ty = VecTy;
>>
>> -  return true;
>> +  return RD->Kind;
>>  }
>>
>>  static std::pair<Value *, ShuffleVectorInst *>
>> @@ -348,10 +381,11 @@ getShuffleAndOtherOprd(Value *L, Value *
>>    return std::make_pair(L, S);
>>  }
>>
>> -static bool matchVectorSplittingReduction(const ExtractElementInst
>> *ReduxRoot,
>> -                                          unsigned &Opcode, Type *&Ty) {
>> +static ReductionKind
>> +matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
>> +                              unsigned &Opcode, Type *&Ty) {
>>    if (!EnableReduxCost)
>> -    return false;
>> +    return RK_None;
>>
>>    // Need to extract the first element.
>>    ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
>> @@ -359,19 +393,19 @@ static bool matchVectorSplittingReductio
>>    if (CI)
>>      Idx = CI->getZExtValue();
>>    if (Idx != 0)
>> -    return false;
>> +    return RK_None;
>>
>>    auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
>>    if (!RdxStart)
>> -    return false;
>> +    return RK_None;
>>    Optional<ReductionData> RD = getReductionData(RdxStart);
>>    if (!RD)
>> -    return false;
>> +    return RK_None;
>>
>>    Type *VecTy = ReduxRoot->getOperand(0)->getType();
>>    unsigned NumVecElems = VecTy->getVectorNumElements();
>>    if (!isPowerOf2_32(NumVecElems))
>> -    return false;
>> +    return RK_None;
>>
>>    // We look for a sequence of shuffles and adds like the following
>> matching one
>>    // fadd, shuffle vector pair at a time.
>> @@ -391,10 +425,10 @@ static bool matchVectorSplittingReductio
>>    while (NumVecElemsRemain - 1) {
>>      // Check for the right reduction operation.
>>      if (!RdxOp)
>> -      return false;
>> +      return RK_None;
>>      Optional<ReductionData> RDLevel = getReductionData(RdxOp);
>> -    if (!RDLevel || RDLevel->Opcode != RD->Opcode)
>> -      return false;
>> +    if (!RDLevel || !RDLevel->hasSameData(*RD))
>> +      return RK_None;
>>
>>      Value *NextRdxOp;
>>      ShuffleVectorInst *Shuffle;
>> @@ -403,9 +437,9 @@ static bool matchVectorSplittingReductio
>>
>>      // Check the current reduction operation and the shuffle use the
>> same value.
>>      if (Shuffle == nullptr)
>> -      return false;
>> +      return RK_None;
>>      if (Shuffle->getOperand(0) != NextRdxOp)
>> -      return false;
>> +      return RK_None;
>>
>>      // Check that shuffle masks matches.
>>      for (unsigned j = 0; j != MaskStart; ++j)
>> @@ -415,7 +449,7 @@ static bool matchVectorSplittingReductio
>>
>>      SmallVector<int, 16> Mask = Shuffle->getShuffleMask();
>>      if (ShuffleMask != Mask)
>> -      return false;
>> +      return RK_None;
>>
>>      RdxOp = dyn_cast<Instruction>(NextRdxOp);
>>      NumVecElemsRemain /= 2;
>> @@ -424,7 +458,7 @@ static bool matchVectorSplittingReductio
>>
>>    Opcode = RD->Opcode;
>>    Ty = VecTy;
>> -  return true;
>> +  return RD->Kind;
>>  }
>>
>>  unsigned CostModelAnalysis::getInstructionCost(const Instruction *I)
>> const {
>> @@ -519,13 +553,36 @@ unsigned CostModelAnalysis::getInstructi
>>      unsigned ReduxOpCode;
>>      Type *ReduxType;
>>
>> -    if (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
>> +    switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
>> +    case RK_Arithmetic:
>>        return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
>>                                               /*IsPairwiseForm=*/false);
>> +    case RK_MinMax:
>> +      return TTI->getMinMaxReductionCost(
>> +          ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> +          /*IsPairwiseForm=*/false, /*IsUnsigned=*/false);
>> +    case RK_UnsignedMinMax:
>> +      return TTI->getMinMaxReductionCost(
>> +          ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> +          /*IsPairwiseForm=*/false, /*IsUnsigned=*/true);
>> +    case RK_None:
>> +      break;
>>      }
>> -    if (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
>> +
>> +    switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
>> +    case RK_Arithmetic:
>>        return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
>>                                               /*IsPairwiseForm=*/true);
>> +    case RK_MinMax:
>> +      return TTI->getMinMaxReductionCost(
>> +          ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> +          /*IsPairwiseForm=*/true, /*IsUnsigned=*/false);
>> +    case RK_UnsignedMinMax:
>> +      return TTI->getMinMaxReductionCost(
>> +          ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> +          /*IsPairwiseForm=*/true, /*IsUnsigned=*/true);
>> +    case RK_None:
>> +      break;
>>      }
>>
>>      return TTI->getVectorInstrCost(I->getOpcode(),
>>
>> Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)
>> +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Fri Sep  8 06:49:36
>> 2017
>> @@ -484,6 +484,15 @@ int TargetTransformInfo::getArithmeticRe
>>    return Cost;
>>  }
>>
>> +int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> +                                                bool IsPairwiseForm,
>> +                                                bool IsUnsigned) const {
>> +  int Cost =
>> +      TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
>> IsUnsigned);
>> +  assert(Cost >= 0 && "TTI should not produce negative costs!");
>> +  return Cost;
>> +}
>> +
>>  unsigned
>>  TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys)
>> const {
>>    return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
>>
>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Fri Sep  8
>> 06:49:36 2017
>> @@ -1999,6 +1999,152 @@ int X86TTIImpl::getArithmeticReductionCo
>>    return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);
>>  }
>>
>> +int X86TTIImpl::getMinMaxReductionCost(Type *ValTy, Type *CondTy,
>> +                                       bool IsPairwise, bool IsUnsigned)
>> {
>> +  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
>> +
>> +  MVT MTy = LT.second;
>> +
>> +  int ISD;
>> +  if (ValTy->isIntOrIntVectorTy()) {
>> +    ISD = IsUnsigned ? ISD::UMIN : ISD::SMIN;
>> +  } else {
>> +    assert(ValTy->isFPOrFPVectorTy() &&
>> +           "Expected float point or integer vector type.");
>> +    ISD = ISD::FMINNUM;
>> +  }
>> +
>> +  // We use the Intel Architecture Code Analyzer (IACA) to measure the
>> +  // throughput and use it as the cost.
>> +
>> +  static const CostTblEntry SSE42CostTblPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v2f64, 3},
>> +      {ISD::FMINNUM, MVT::v4f32, 2},
>> +      {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
>> "6.8"
>> +      {ISD::UMIN, MVT::v2i64, 8}, // The data reported by the IACA is
>> "8.6"
>> +      {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
>> "1.5"
>> +      {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
>> "1.8"
>> +      {ISD::SMIN, MVT::v8i16, 2},
>> +      {ISD::UMIN, MVT::v8i16, 2},
>> +  };
>> +
>> +  static const CostTblEntry AVX1CostTblPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v4f32, 1},
>> +      {ISD::FMINNUM, MVT::v4f64, 1},
>> +      {ISD::FMINNUM, MVT::v8f32, 2},
>> +      {ISD::SMIN, MVT::v2i64, 3},
>> +      {ISD::UMIN, MVT::v2i64, 3},
>> +      {ISD::SMIN, MVT::v4i32, 1},
>> +      {ISD::UMIN, MVT::v4i32, 1},
>> +      {ISD::SMIN, MVT::v8i16, 1},
>> +      {ISD::UMIN, MVT::v8i16, 1},
>> +      {ISD::SMIN, MVT::v8i32, 3},
>> +      {ISD::UMIN, MVT::v8i32, 3},
>> +  };
>> +
>> +  static const CostTblEntry AVX2CostTblPairWise[] = {
>> +      {ISD::SMIN, MVT::v4i64, 2},
>> +      {ISD::UMIN, MVT::v4i64, 2},
>> +      {ISD::SMIN, MVT::v8i32, 1},
>> +      {ISD::UMIN, MVT::v8i32, 1},
>> +      {ISD::SMIN, MVT::v16i16, 1},
>> +      {ISD::UMIN, MVT::v16i16, 1},
>> +      {ISD::SMIN, MVT::v32i8, 2},
>> +      {ISD::UMIN, MVT::v32i8, 2},
>> +  };
>> +
>> +  static const CostTblEntry AVX512CostTblPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v8f64, 1},
>> +      {ISD::FMINNUM, MVT::v16f32, 2},
>> +      {ISD::SMIN, MVT::v8i64, 2},
>> +      {ISD::UMIN, MVT::v8i64, 2},
>> +      {ISD::SMIN, MVT::v16i32, 1},
>> +      {ISD::UMIN, MVT::v16i32, 1},
>> +  };
>> +
>> +  static const CostTblEntry SSE42CostTblNoPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v2f64, 3},
>> +      {ISD::FMINNUM, MVT::v4f32, 3},
>> +      {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
>> "6.8"
>> +      {ISD::UMIN, MVT::v2i64, 9}, // The data reported by the IACA is
>> "8.6"
>> +      {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
>> "1.5"
>> +      {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
>> "1.8"
>> +      {ISD::SMIN, MVT::v8i16, 1}, // The data reported by the IACA is
>> "1.5"
>> +      {ISD::UMIN, MVT::v8i16, 2}, // The data reported by the IACA is
>> "1.8"
>> +  };
>> +
>> +  static const CostTblEntry AVX1CostTblNoPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v4f32, 1},
>> +      {ISD::FMINNUM, MVT::v4f64, 1},
>> +      {ISD::FMINNUM, MVT::v8f32, 1},
>> +      {ISD::SMIN, MVT::v2i64, 3},
>> +      {ISD::UMIN, MVT::v2i64, 3},
>> +      {ISD::SMIN, MVT::v4i32, 1},
>> +      {ISD::UMIN, MVT::v4i32, 1},
>> +      {ISD::SMIN, MVT::v8i16, 1},
>> +      {ISD::UMIN, MVT::v8i16, 1},
>> +      {ISD::SMIN, MVT::v8i32, 2},
>> +      {ISD::UMIN, MVT::v8i32, 2},
>> +  };
>> +
>> +  static const CostTblEntry AVX2CostTblNoPairWise[] = {
>> +      {ISD::SMIN, MVT::v4i64, 1},
>> +      {ISD::UMIN, MVT::v4i64, 1},
>> +      {ISD::SMIN, MVT::v8i32, 1},
>> +      {ISD::UMIN, MVT::v8i32, 1},
>> +      {ISD::SMIN, MVT::v16i16, 1},
>> +      {ISD::UMIN, MVT::v16i16, 1},
>> +      {ISD::SMIN, MVT::v32i8, 1},
>> +      {ISD::UMIN, MVT::v32i8, 1},
>> +  };
>> +
>> +  static const CostTblEntry AVX512CostTblNoPairWise[] = {
>> +      {ISD::FMINNUM, MVT::v8f64, 1},
>> +      {ISD::FMINNUM, MVT::v16f32, 2},
>> +      {ISD::SMIN, MVT::v8i64, 1},
>> +      {ISD::UMIN, MVT::v8i64, 1},
>> +      {ISD::SMIN, MVT::v16i32, 1},
>> +      {ISD::UMIN, MVT::v16i32, 1},
>> +  };
>> +
>> +  if (IsPairwise) {
>> +    if (ST->hasAVX512())
>> +      if (const auto *Entry = CostTableLookup(AVX512CostTblPairWise,
>> ISD, MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasAVX2())
>> +      if (const auto *Entry = CostTableLookup(AVX2CostTblPairWise, ISD,
>> MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasAVX())
>> +      if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD,
>> MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasSSE42())
>> +      if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD,
>> MTy))
>> +        return LT.first * Entry->Cost;
>> +  } else {
>> +    if (ST->hasAVX512())
>> +      if (const auto *Entry =
>> +              CostTableLookup(AVX512CostTblNoPairWise, ISD, MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasAVX2())
>> +      if (const auto *Entry = CostTableLookup(AVX2CostTblNoPairWise,
>> ISD, MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasAVX())
>> +      if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise,
>> ISD, MTy))
>> +        return LT.first * Entry->Cost;
>> +
>> +    if (ST->hasSSE42())
>> +      if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise,
>> ISD, MTy))
>> +        return LT.first * Entry->Cost;
>> +  }
>> +
>> +  return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsPairwise,
>> IsUnsigned);
>> +}
>> +
>>  /// \brief Calculate the cost of materializing a 64-bit value. This
>> helper
>>  /// method might only calculate a fraction of a larger immediate.
>> Therefore it
>>  /// is valid to return a cost of ZERO.
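The X86 hunk above consults per-ISA tables from the newest feature level down and falls back to the generic cost when no table has a matching entry. A reduced standalone sketch of that lookup order follows. The types, names, and container choice here are hypothetical and much simpler than CostTblEntry and CostTableLookup; only the three cost values used in the checks are copied from the non-pairwise tables in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the ISD opcode and MVT used as table keys.
enum OpSketch { SMinOp, UMinOp, FMinOp };
enum VecTySketch { V4I32, V8I32, V2I64 };

struct EntrySketch {
  OpSketch Op;
  VecTySketch Ty;
  int Cost;
};

// One feature level: whether the subtarget has it, plus its cost table.
struct TierSketch {
  bool Available;
  std::vector<EntrySketch> Table;
};

const EntrySketch *lookupSketch(const std::vector<EntrySketch> &Table,
                                OpSketch Op, VecTySketch Ty) {
  for (const EntrySketch &E : Table)
    if (E.Op == Op && E.Ty == Ty)
      return &E;
  return nullptr;
}

// Tiers are ordered newest first. The first available tier with a matching
// entry wins; the result is scaled by the legalization factor. If nothing
// matches, the caller-supplied generic cost is returned.
int tieredCostSketch(const std::vector<TierSketch> &Tiers, OpSketch Op,
                     VecTySketch Ty, int LegalizationFactor, int GenericCost) {
  for (const TierSketch &T : Tiers) {
    if (!T.Available)
      continue;
    if (const EntrySketch *E = lookupSketch(T.Table, Op, Ty))
      return LegalizationFactor * E->Cost;
  }
  return GenericCost;
}
```

One consequence of this order is that a type missing from a newer table still picks up the entry from an older one, which is why, in the diff, the AVX2 tables can omit types already covered by the AVX1 tables.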
>>
>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h Fri Sep  8 06:49:36 2017
>> @@ -96,6 +96,9 @@ public:
>>    int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>>                                   bool IsPairwiseForm);
>>
>> +  int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
>> +                             bool IsUnsigned);
>> +
>>    int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
>>                                   unsigned Factor, ArrayRef<unsigned> Indices,
>>                                   unsigned Alignment, unsigned AddressSpace);
>>
>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Sep  8 06:49:36 2017
>> @@ -4627,11 +4627,17 @@ class HorizontalReduction {
>>    // Use map vector to make stable output.
>>    MapVector<Instruction *, Value *> ExtraArgs;
>>
>> +  /// Kind of the reduction data.
>> +  enum ReductionKind {
>> +    RK_None,       /// Not a reduction.
>> +    RK_Arithmetic, /// Binary reduction data.
>> +    RK_Min,        /// Minimum reduction data.
>> +    RK_UMin,       /// Unsigned minimum reduction data.
>> +    RK_Max,        /// Maximum reduction data.
>> +    RK_UMax,       /// Unsigned maximum reduction data.
>> +  };
>>    /// Contains info about operation, like its opcode, left and right operands.
>> -  struct OperationData {
>> -    /// true if the operation is a reduced value, false if reduction operation.
>> -    bool IsReducedValue = false;
>> -
>> +  class OperationData {
>>      /// Opcode of the instruction.
>>      unsigned Opcode = 0;
>>
>> @@ -4640,12 +4646,21 @@ class HorizontalReduction {
>>
>>      /// Right operand of the reduction operation.
>>      Value *RHS = nullptr;
>> +    /// Kind of the reduction operation.
>> +    ReductionKind Kind = RK_None;
>> +    /// True if float point min/max reduction has no NaNs.
>> +    bool NoNaN = false;
>>
>>      /// Checks if the reduction operation can be vectorized.
>>      bool isVectorizable() const {
>>        return LHS && RHS &&
>> -             // We currently only support adds.
>> -             (Opcode == Instruction::Add || Opcode == Instruction::FAdd);
>> +             // We currently only support adds && min/max reductions.
>> +             ((Kind == RK_Arithmetic &&
>> +               (Opcode == Instruction::Add || Opcode == Instruction::FAdd)) ||
>> +              ((Opcode == Instruction::ICmp || Opcode == Instruction::FCmp) &&
>> +               (Kind == RK_Min || Kind == RK_Max)) ||
>> +              (Opcode == Instruction::ICmp &&
>> +               (Kind == RK_UMin || Kind == RK_UMax)));
>>      }
>>
>>    public:
>> @@ -4653,43 +4668,90 @@ class HorizontalReduction {
>>
>>      /// Construction for reduced values. They are identified by opcode only and
>>      /// don't have associated LHS/RHS values.
>> -    explicit OperationData(Value *V) : IsReducedValue(true) {
>> +    explicit OperationData(Value *V) : Kind(RK_None) {
>>        if (auto *I = dyn_cast<Instruction>(V))
>>          Opcode = I->getOpcode();
>>      }
>>
>> -    /// Constructor for binary reduction operations with opcode and its left and
>> +    /// Constructor for reduction operations with opcode and its left and
>>      /// right operands.
>> -    OperationData(unsigned Opcode, Value *LHS, Value *RHS)
>> -        : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
>> -
>> +    OperationData(unsigned Opcode, Value *LHS, Value *RHS, ReductionKind Kind,
>> +                  bool NoNaN = false)
>> +        : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind), NoNaN(NoNaN) {
>> +      assert(Kind != RK_None && "One of the reduction operations is expected.");
>> +    }
>>      explicit operator bool() const { return Opcode; }
>>
>>      /// Get the index of the first operand.
>>      unsigned getFirstOperandIndex() const {
>>        assert(!!*this && "The opcode is not set.");
>> +      switch (Kind) {
>> +      case RK_Min:
>> +      case RK_UMin:
>> +      case RK_Max:
>> +      case RK_UMax:
>> +        return 1;
>> +      case RK_Arithmetic:
>> +      case RK_None:
>> +        break;
>> +      }
>>        return 0;
>>      }
>>
>>      /// Total number of operands in the reduction operation.
>>      unsigned getNumberOfOperands() const {
>> -      assert(!IsReducedValue && !!*this && LHS && RHS &&
>> +      assert(Kind != RK_None && !!*this && LHS && RHS &&
>>               "Expected reduction operation.");
>> -      return 2;
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        return 2;
>> +      case RK_Min:
>> +      case RK_UMin:
>> +      case RK_Max:
>> +      case RK_UMax:
>> +        return 3;
>> +      case RK_None:
>> +        llvm_unreachable("Reduction kind is not set");
>> +      }
>>      }
>>
>>      /// Expected number of uses for reduction operations/reduced values.
>>      unsigned getRequiredNumberOfUses() const {
>> -      assert(!IsReducedValue && !!*this && LHS && RHS &&
>> +      assert(Kind != RK_None && !!*this && LHS && RHS &&
>>               "Expected reduction operation.");
>> -      return 1;
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        return 1;
>> +      case RK_Min:
>> +      case RK_UMin:
>> +      case RK_Max:
>> +      case RK_UMax:
>> +        return 2;
>> +      case RK_None:
>> +        llvm_unreachable("Reduction kind is not set");
>> +      }
>>      }
>>
>>      /// Checks if instruction is associative and can be vectorized.
>>      bool isAssociative(Instruction *I) const {
>> -      assert(!IsReducedValue && *this && LHS && RHS &&
>> +      assert(Kind != RK_None && *this && LHS && RHS &&
>>               "Expected reduction operation.");
>> -      return I->isAssociative();
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        return I->isAssociative();
>> +      case RK_Min:
>> +      case RK_Max:
>> +        return Opcode == Instruction::ICmp ||
>> +               cast<Instruction>(I->getOperand(0))->hasUnsafeAlgebra();
>> +      case RK_UMin:
>> +      case RK_UMax:
>> +        assert(Opcode == Instruction::ICmp &&
>> +               "Only integer compare operation is expected.");
>> +        return true;
>> +      case RK_None:
>> +        break;
>> +      }
>> +      llvm_unreachable("Reduction kind is not set");
>>      }
>>
>>      /// Checks if the reduction operation can be vectorized.
>> @@ -4700,18 +4762,17 @@ class HorizontalReduction {
>>      /// Checks if two operation data are both a reduction op or both a reduced
>>      /// value.
>>      bool operator==(const OperationData &OD) {
>> -      assert(((IsReducedValue != OD.IsReducedValue) ||
>> -              ((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
>> +      assert(((Kind != OD.Kind) || ((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
>>               "One of the comparing operations is incorrect.");
>> -      return this == &OD ||
>> -             (IsReducedValue == OD.IsReducedValue && Opcode == OD.Opcode);
>> +      return this == &OD || (Kind == OD.Kind && Opcode == OD.Opcode);
>>      }
>>      bool operator!=(const OperationData &OD) { return !(*this == OD); }
>>      void clear() {
>> -      IsReducedValue = false;
>>        Opcode = 0;
>>        LHS = nullptr;
>>        RHS = nullptr;
>> +      Kind = RK_None;
>> +      NoNaN = false;
>>      }
>>
>>      /// Get the opcode of the reduction operation.
>> @@ -4720,16 +4781,81 @@ class HorizontalReduction {
>>        return Opcode;
>>      }
>>
>> +    /// Get kind of reduction data.
>> +    ReductionKind getKind() const { return Kind; }
>>      Value *getLHS() const { return LHS; }
>>      Value *getRHS() const { return RHS; }
>> +    Type *getConditionType() const {
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        return nullptr;
>> +      case RK_Min:
>> +      case RK_Max:
>> +      case RK_UMin:
>> +      case RK_UMax:
>> +        return CmpInst::makeCmpResultType(LHS->getType());
>> +      case RK_None:
>> +        break;
>> +      }
>> +      llvm_unreachable("Reduction kind is not set");
>> +    }
>>
>>      /// Creates reduction operation with the current opcode.
>>      Value *createOp(IRBuilder<> &Builder, const Twine &Name = "") const {
>> -      assert(!IsReducedValue &&
>> -             (Opcode == Instruction::FAdd || Opcode == Instruction::Add) &&
>> -             "Expected add|fadd reduction operation.");
>> -      return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,
>> -                                 Name);
>> +      assert(isVectorizable() &&
>> +             "Expected add|fadd or min/max reduction operation.");
>> +      Value *Cmp;
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,
>> +                                   Name);
>> +      case RK_Min:
>> +        Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS, RHS)
>> +                                          : Builder.CreateFCmpOLT(LHS, RHS);
>> +        break;
>> +      case RK_Max:
>> +        Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS, RHS)
>> +                                          : Builder.CreateFCmpOGT(LHS, RHS);
>> +        break;
>> +      case RK_UMin:
>> +        assert(Opcode == Instruction::ICmp && "Expected integer types.");
>> +        Cmp = Builder.CreateICmpULT(LHS, RHS);
>> +        break;
>> +      case RK_UMax:
>> +        assert(Opcode == Instruction::ICmp && "Expected integer types.");
>> +        Cmp = Builder.CreateICmpUGT(LHS, RHS);
>> +        break;
>> +      case RK_None:
>> +        llvm_unreachable("Unknown reduction operation.");
>> +      }
>> +      return Builder.CreateSelect(Cmp, LHS, RHS, Name);
>> +    }
>> +    TargetTransformInfo::ReductionFlags getFlags() const {
>> +      TargetTransformInfo::ReductionFlags Flags;
>> +      Flags.NoNaN = NoNaN;
>> +      switch (Kind) {
>> +      case RK_Arithmetic:
>> +        break;
>> +      case RK_Min:
>> +        Flags.IsSigned = Opcode == Instruction::ICmp;
>> +        Flags.IsMaxOp = false;
>> +        break;
>> +      case RK_Max:
>> +        Flags.IsSigned = Opcode == Instruction::ICmp;
>> +        Flags.IsMaxOp = true;
>> +        break;
>> +      case RK_UMin:
>> +        Flags.IsSigned = false;
>> +        Flags.IsMaxOp = false;
>> +        break;
>> +      case RK_UMax:
>> +        Flags.IsSigned = false;
>> +        Flags.IsMaxOp = true;
>> +        break;
>> +      case RK_None:
>> +        llvm_unreachable("Reduction kind is not set");
>> +      }
>> +      return Flags;
>>      }
>>    };
>>
>> @@ -4771,8 +4897,32 @@ class HorizontalReduction {
>>
>>      Value *LHS;
>>      Value *RHS;
>> -    if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V))
>> -      return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS, RHS);
>> +    if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V)) {
>> +      return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS, RHS,
>> +                           RK_Arithmetic);
>> +    }
>> +    if (auto *Select = dyn_cast<SelectInst>(V)) {
>> +      // Look for a min/max pattern.
>> +      if (m_UMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(Instruction::ICmp, LHS, RHS, RK_UMin);
>> +      } else if (m_SMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(Instruction::ICmp, LHS, RHS, RK_Min);
>> +      } else if (m_OrdFMin(m_Value(LHS), m_Value(RHS)).match(Select) ||
>> +                 m_UnordFMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(
>> +            Instruction::FCmp, LHS, RHS, RK_Min,
>> +            cast<Instruction>(Select->getCondition())->hasNoNaNs());
>> +      } else if (m_UMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(Instruction::ICmp, LHS, RHS, RK_UMax);
>> +      } else if (m_SMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(Instruction::ICmp, LHS, RHS, RK_Max);
>> +      } else if (m_OrdFMax(m_Value(LHS), m_Value(RHS)).match(Select) ||
>> +                 m_UnordFMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> +        return OperationData(
>> +            Instruction::FCmp, LHS, RHS, RK_Max,
>> +            cast<Instruction>(Select->getCondition())->hasNoNaNs());
>> +      }
>> +    }
>>      return OperationData(V);
>>    }
>>
>> @@ -4965,8 +5115,9 @@ public:
>>        if (VectorizedTree) {
>>          Builder.SetCurrentDebugLocation(Loc);
>>          OperationData VectReductionData(ReductionData.getOpcode(),
>> -                                        VectorizedTree, ReducedSubTree);
>> -        VectorizedTree = VectReductionData.createOp(Builder, "bin.rdx");
>> +                                        VectorizedTree, ReducedSubTree,
>> +                                        ReductionData.getKind());
>> +        VectorizedTree = VectReductionData.createOp(Builder, "op.rdx");
>>          propagateIRFlags(VectorizedTree, ReductionOps);
>>        } else
>>          VectorizedTree = ReducedSubTree;
>> @@ -4980,7 +5131,8 @@ public:
>>          auto *I = cast<Instruction>(ReducedVals[i]);
>>          Builder.SetCurrentDebugLocation(I->getDebugLoc());
>>          OperationData VectReductionData(ReductionData.getOpcode(),
>> -                                        VectorizedTree, I);
>> +                                        VectorizedTree, I,
>> +                                        ReductionData.getKind());
>>          VectorizedTree = VectReductionData.createOp(Builder);
>>          propagateIRFlags(VectorizedTree, ReductionOps);
>>        }
>> @@ -4991,8 +5143,9 @@ public:
>>          for (auto *I : Pair.second) {
>>            Builder.SetCurrentDebugLocation(I->getDebugLoc());
>>            OperationData VectReductionData(ReductionData.getOpcode(),
>> -                                          VectorizedTree, Pair.first);
>> -          VectorizedTree = VectReductionData.createOp(Builder, "bin.extra");
>> +                                          VectorizedTree, Pair.first,
>> +                                          ReductionData.getKind());
>> +          VectorizedTree = VectReductionData.createOp(Builder, "op.extra");
>>            propagateIRFlags(VectorizedTree, I);
>>          }
>>        }
>> @@ -5013,19 +5166,58 @@ private:
>>      Type *ScalarTy = FirstReducedVal->getType();
>>      Type *VecTy = VectorType::get(ScalarTy, ReduxWidth);
>>
>> -    int PairwiseRdxCost =
>> -        TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> -                                        /*IsPairwiseForm=*/true);
>> -    int SplittingRdxCost =
>> -        TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> -                                        /*IsPairwiseForm=*/false);
>> +    int PairwiseRdxCost;
>> +    int SplittingRdxCost;
>> +    bool IsUnsigned = true;
>> +    switch (ReductionData.getKind()) {
>> +    case RK_Arithmetic:
>> +      PairwiseRdxCost =
>> +          TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> +                                          /*IsPairwiseForm=*/true);
>> +      SplittingRdxCost =
>> +          TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> +                                          /*IsPairwiseForm=*/false);
>> +      break;
>> +    case RK_Min:
>> +    case RK_Max:
>> +      IsUnsigned = false;
>> +    case RK_UMin:
>> +    case RK_UMax: {
>> +      Type *VecCondTy = CmpInst::makeCmpResultType(VecTy);
>> +      PairwiseRdxCost =
>> +          TTI->getMinMaxReductionCost(VecTy, VecCondTy,
>> +                                      /*IsPairwiseForm=*/true, IsUnsigned);
>> +      SplittingRdxCost =
>> +          TTI->getMinMaxReductionCost(VecTy, VecCondTy,
>> +                                      /*IsPairwiseForm=*/false, IsUnsigned);
>> +      break;
>> +    }
>> +    case RK_None:
>> +      llvm_unreachable("Expected arithmetic or min/max reduction operation");
>> +    }
>>
>>      IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;
>>      int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost : SplittingRdxCost;
>>
>> -    int ScalarReduxCost =
>> -        (ReduxWidth - 1) *
>> -        TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);
>> +    int ScalarReduxCost;
>> +    switch (ReductionData.getKind()) {
>> +    case RK_Arithmetic:
>> +      ScalarReduxCost =
>> +          TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);
>> +      break;
>> +    case RK_Min:
>> +    case RK_Max:
>> +    case RK_UMin:
>> +    case RK_UMax:
>> +      ScalarReduxCost =
>> +          TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +
>> +          TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
>> +                                  CmpInst::makeCmpResultType(ScalarTy));
>> +      break;
>> +    case RK_None:
>> +      llvm_unreachable("Expected arithmetic or min/max reduction operation");
>> +    }
>> +    ScalarReduxCost *= (ReduxWidth - 1);
>>
>>      DEBUG(dbgs() << "SLP: Adding cost " << VecReduxCost - ScalarReduxCost
>>                   << " for reduction that starts with " << *FirstReducedVal
>> @@ -5047,7 +5239,7 @@ private:
>>      if (!IsPairwiseReduction)
>>        return createSimpleTargetReduction(
>>            Builder, TTI, ReductionData.getOpcode(), VectorizedValue,
>> -          TargetTransformInfo::ReductionFlags(), RedOps);
>> +          ReductionData.getFlags(), RedOps);
>>
>>      Value *TmpVec = VectorizedValue;
>>      for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
>> @@ -5062,8 +5254,8 @@ private:
>>            TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
>>            "rdx.shuf.r");
>>        OperationData VectReductionData(ReductionData.getOpcode(), LeftShuf,
>> -                                      RightShuf);
>> -      TmpVec = VectReductionData.createOp(Builder, "bin.rdx");
>> +                                      RightShuf, ReductionData.getKind());
>> +      TmpVec = VectReductionData.createOp(Builder, "op.rdx");
>>        propagateIRFlags(TmpVec, RedOps);
>>      }
>>
>> @@ -5224,9 +5416,11 @@ static bool tryToVectorizeHorReductionOr
>>      auto *Inst = dyn_cast<Instruction>(V);
>>      if (!Inst)
>>        continue;
>> -    if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
>> +    auto *BI = dyn_cast<BinaryOperator>(Inst);
>> +    auto *SI = dyn_cast<SelectInst>(Inst);
>> +    if (BI || SI) {
>>        HorizontalReduction HorRdx;
>> -      if (HorRdx.matchAssociativeReduction(P, BI)) {
>> +      if (HorRdx.matchAssociativeReduction(P, Inst)) {
>>          if (HorRdx.tryToReduce(R, TTI)) {
>>            Res = true;
>>            // Set P to nullptr to avoid re-analysis of phi node in
>> @@ -5235,7 +5429,7 @@ static bool tryToVectorizeHorReductionOr
>>            continue;
>>          }
>>        }
>> -      if (P) {
>> +      if (P && BI) {
>>          Inst = dyn_cast<Instruction>(BI->getOperand(0));
>>          if (Inst == P)
>>            Inst = dyn_cast<Instruction>(BI->getOperand(1));
>>
>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll (original)
>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll Fri Sep  8 06:49:36 2017
>> @@ -117,11 +117,11 @@ define float @bazz() {
>>  ; CHECK-NEXT:    [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; CHECK-NEXT:    [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
>>  ; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
>> -; CHECK-NEXT:    [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> -; CHECK-NEXT:    [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV6]]
>> +; CHECK-NEXT:    [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> +; CHECK-NEXT:    [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
>>  ; CHECK-NEXT:    [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
>> -; CHECK-NEXT:    store float [[BIN_EXTRA5]], float* @res, align 4
>> -; CHECK-NEXT:    ret float [[BIN_EXTRA5]]
>> +; CHECK-NEXT:    store float [[OP_EXTRA5]], float* @res, align 4
>> +; CHECK-NEXT:    ret float [[OP_EXTRA5]]
>>  ;
>>  ; THRESHOLD-LABEL: @bazz(
>>  ; THRESHOLD-NEXT:  entry:
>> @@ -148,11 +148,11 @@ define float @bazz() {
>>  ; THRESHOLD-NEXT:    [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; THRESHOLD-NEXT:    [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
>>  ; THRESHOLD-NEXT:    [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
>> -; THRESHOLD-NEXT:    [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> -; THRESHOLD-NEXT:    [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV6]]
>> +; THRESHOLD-NEXT:    [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> +; THRESHOLD-NEXT:    [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
>>  ; THRESHOLD-NEXT:    [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
>> -; THRESHOLD-NEXT:    store float [[BIN_EXTRA5]], float* @res, align 4
>> -; THRESHOLD-NEXT:    ret float [[BIN_EXTRA5]]
>> +; THRESHOLD-NEXT:    store float [[OP_EXTRA5]], float* @res, align 4
>> +; THRESHOLD-NEXT:    ret float [[OP_EXTRA5]]
>>  ;
>>  entry:
>>    %0 = load i32, i32* @n, align 4
>> @@ -327,47 +327,53 @@ entry:
>>  define float @bar() {
>>  ; CHECK-LABEL: @bar(
>>  ; CHECK-NEXT:  entry:
>> -; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
>> -; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
>> -; CHECK-NEXT:    [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]
>> -; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
>> -; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
>> +; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
>> +; CHECK-NEXT:    [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
>> +; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
>> +; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
>>  ; CHECK-NEXT:    [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
>> -; CHECK-NEXT:    [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float [[TMP3]], float [[TMP4]]
>> -; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
>> -; CHECK-NEXT:    [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
>> -; CHECK-NEXT:    [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
>> -; CHECK-NEXT:    [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[MUL3_1]]
>> -; CHECK-NEXT:    [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float [[MUL3_1]]
>> -; CHECK-NEXT:    [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
>> -; CHECK-NEXT:    [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
>> -; CHECK-NEXT:    [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
>> -; CHECK-NEXT:    [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[MUL3_2]]
>> -; CHECK-NEXT:    [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float [[MUL3_2]]
>> -; CHECK-NEXT:    store float [[MAX_0_MUL3_2]], float* @res, align 4
>> -; CHECK-NEXT:    ret float [[MAX_0_MUL3_2]]
>> +; CHECK-NEXT:    [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
>> +; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
>> +; CHECK-NEXT:    [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[TMP5]]
>> +; CHECK-NEXT:    [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
>> +; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
>> +; CHECK-NEXT:    [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[TMP6]]
>> +; CHECK-NEXT:    [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; CHECK-NEXT:    [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
>> +; CHECK-NEXT:    [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
>> +; CHECK-NEXT:    [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
>> +; CHECK-NEXT:    [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
>> +; CHECK-NEXT:    [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
>> +; CHECK-NEXT:    [[TMP7:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
>> +; CHECK-NEXT:    [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
>> +; CHECK-NEXT:    store float [[TMP7]], float* @res, align 4
>> +; CHECK-NEXT:    ret float [[TMP7]]
>>  ;
>>  ; THRESHOLD-LABEL: @bar(
>>  ; THRESHOLD-NEXT:  entry:
>> -; THRESHOLD-NEXT:    [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
>> -; THRESHOLD-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
>> -; THRESHOLD-NEXT:    [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]
>> -; THRESHOLD-NEXT:    [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
>> -; THRESHOLD-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
>> +; THRESHOLD-NEXT:    [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
>> +; THRESHOLD-NEXT:    [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
>> +; THRESHOLD-NEXT:    [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
>> +; THRESHOLD-NEXT:    [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
>> +; THRESHOLD-NEXT:    [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
>>  ; THRESHOLD-NEXT:    [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
>> -; THRESHOLD-NEXT:    [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float [[TMP3]], float [[TMP4]]
>> -; THRESHOLD-NEXT:    [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
>> -; THRESHOLD-NEXT:    [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
>> -; THRESHOLD-NEXT:    [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
>> -; THRESHOLD-NEXT:    [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[MUL3_1]]
>> -; THRESHOLD-NEXT:    [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float [[MUL3_1]]
>> -; THRESHOLD-NEXT:    [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
>> -; THRESHOLD-NEXT:    [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
>> -; THRESHOLD-NEXT:    [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
>> -; THRESHOLD-NEXT:    [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[MUL3_2]]
>> -; THRESHOLD-NEXT:    [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float [[MUL3_2]]
>> -; THRESHOLD-NEXT:    store float [[MAX_0_MUL3_2]], float* @res, align 4
>> -; THRESHOLD-NEXT:    ret float [[MAX_0_MUL3_2]]
>> +; THRESHOLD-NEXT:    [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
>> +; THRESHOLD-NEXT:    [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
>> +; THRESHOLD-NEXT:    [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[TMP5]]
>> +; THRESHOLD-NEXT:    [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
>> +; THRESHOLD-NEXT:    [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
>> +; THRESHOLD-NEXT:    [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[TMP6]]
>> +; THRESHOLD-NEXT:    [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; THRESHOLD-NEXT:    [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
>> +; THRESHOLD-NEXT:    [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
>> +; THRESHOLD-NEXT:    [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
>> +; THRESHOLD-NEXT:    [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
>> +; THRESHOLD-NEXT:    [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
>> +; THRESHOLD-NEXT:    [[TMP7:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
>> +; THRESHOLD-NEXT:    [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
>> +; THRESHOLD-NEXT:    store float [[TMP7]], float* @res, align 4
>> +; THRESHOLD-NEXT:    ret float [[TMP7]]
>>  ;
>>  entry:
>>    %0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
>> @@ -512,9 +518,9 @@ define float @f(float* nocapture readonl
>>  ; CHECK-NEXT:    [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; CHECK-NEXT:    [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
>>  ; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
>> -; CHECK-NEXT:    [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>> +; CHECK-NEXT:    [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>>  ; CHECK-NEXT:    [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
>> -; CHECK-NEXT:    ret float [[BIN_RDX17]]
>> +; CHECK-NEXT:    ret float [[OP_RDX]]
>>  ;
>>  ; THRESHOLD-LABEL: @f(
>>  ; THRESHOLD-NEXT:  entry:
>> @@ -635,9 +641,9 @@ define float @f(float* nocapture readonl
>>  ; THRESHOLD-NEXT:    [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; THRESHOLD-NEXT:    [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
>>  ; THRESHOLD-NEXT:    [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
>> -; THRESHOLD-NEXT:    [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>> +; THRESHOLD-NEXT:    [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>>  ; THRESHOLD-NEXT:    [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
>> -; THRESHOLD-NEXT:    ret float [[BIN_RDX17]]
>> +; THRESHOLD-NEXT:    ret float [[OP_RDX]]
>>  ;
>>    entry:
>>    %0 = load float, float* %x, align 4
>> @@ -865,9 +871,9 @@ define float @f1(float* nocapture readon
>>  ; CHECK-NEXT:    [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
>> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef>
>>  ; CHECK-NEXT:    [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]],
>> [[RDX_SHUF7]]
>>  ; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <32 x float>
>> [[BIN_RDX8]], i32 0
>> -; CHECK-NEXT:    [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
>> +; CHECK-NEXT:    [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
>>  ; CHECK-NEXT:    [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
>> -; CHECK-NEXT:    ret float [[BIN_EXTRA]]
>> +; CHECK-NEXT:    ret float [[OP_EXTRA]]
>>  ;
>>  ; THRESHOLD-LABEL: @f1(
>>  ; THRESHOLD-NEXT:  entry:
>> @@ -948,9 +954,9 @@ define float @f1(float* nocapture readon
>>  ; THRESHOLD-NEXT:    [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
>> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef>
>>  ; THRESHOLD-NEXT:    [[BIN_RDX8:%.*]] = fadd fast <32 x float>
>> [[BIN_RDX6]], [[RDX_SHUF7]]
>>  ; THRESHOLD-NEXT:    [[TMP2:%.*]] = extractelement <32 x float>
>> [[BIN_RDX8]], i32 0
>> -; THRESHOLD-NEXT:    [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]],
>> [[CONV]]
>> +; THRESHOLD-NEXT:    [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]],
>> [[CONV]]
>>  ; THRESHOLD-NEXT:    [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
>> -; THRESHOLD-NEXT:    ret float [[BIN_EXTRA]]
>> +; THRESHOLD-NEXT:    ret float [[OP_EXTRA]]
>>  ;
>>    entry:
>>    %rem = srem i32 %a, %b
>> @@ -1138,14 +1144,14 @@ define float @loadadd31(float* nocapture
>>  ; CHECK-NEXT:    [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; CHECK-NEXT:    [[BIN_RDX12:%.*]] = fadd fast <8 x float>
>> [[BIN_RDX10]], [[RDX_SHUF11]]
>>  ; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <8 x float>
>> [[BIN_RDX12]], i32 0
>> -; CHECK-NEXT:    [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
>> -; CHECK-NEXT:    [[RDX_SHUF14:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> -; CHECK-NEXT:    [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF14]]
>> -; CHECK-NEXT:    [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> -; CHECK-NEXT:    [[BIN_RDX17:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX15]], [[RDX_SHUF16]]
>> -; CHECK-NEXT:    [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX17]], i32 0
>> -; CHECK-NEXT:    [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]],
>> [[TMP10]]
>> -; CHECK-NEXT:    [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]], [[TMP1]]
>> +; CHECK-NEXT:    [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
>> +; CHECK-NEXT:    [[RDX_SHUF13:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; CHECK-NEXT:    [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF13]]
>> +; CHECK-NEXT:    [[RDX_SHUF15:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> +; CHECK-NEXT:    [[BIN_RDX16:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX14]], [[RDX_SHUF15]]
>> +; CHECK-NEXT:    [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX16]], i32 0
>> +; CHECK-NEXT:    [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
>> +; CHECK-NEXT:    [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
>>  ; CHECK-NEXT:    [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
>>  ; CHECK-NEXT:    [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
>>  ; CHECK-NEXT:    ret float [[TMP12]]
>> @@ -1234,14 +1240,14 @@ define float @loadadd31(float* nocapture
>>  ; THRESHOLD-NEXT:    [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>>  ; THRESHOLD-NEXT:    [[BIN_RDX12:%.*]] = fadd fast <8 x float>
>> [[BIN_RDX10]], [[RDX_SHUF11]]
>>  ; THRESHOLD-NEXT:    [[TMP9:%.*]] = extractelement <8 x float>
>> [[BIN_RDX12]], i32 0
>> -; THRESHOLD-NEXT:    [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]],
>> [[TMP9]]
>> -; THRESHOLD-NEXT:    [[RDX_SHUF14:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> -; THRESHOLD-NEXT:    [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF14]]
>> -; THRESHOLD-NEXT:    [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> -; THRESHOLD-NEXT:    [[BIN_RDX17:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX15]], [[RDX_SHUF16]]
>> -; THRESHOLD-NEXT:    [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX17]], i32 0
>> -; THRESHOLD-NEXT:    [
>
>