[llvm] r312791 - [SLP] Support for horizontal min/max reduction.
Chandler Carruth via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 15 15:25:32 PDT 2017
I think this caused a miscompile: http://llvm.org/PR34635
I've reverted it for now in r313409.
On Fri, Sep 8, 2017 at 3:52 PM Alexey Bataev via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
> Galina, thanks. Will fix it ASAP
>
> Best regards,
> Alexey Bataev
>
> On Sep 8, 2017, at 17:56, Galina Kistanova <gkistanova at gmail.com>
> wrote:
>
> Hello Alexey,
>
> It looks like this commit added warnings to one of our builders:
> http://lab.llvm.org:8011/builders/ubuntu-gcc7.1-werror/builds/1263
>
> ...
> FAILED: /usr/local/gcc-7.1/bin/g++-7.1 -DGTEST_HAS_RTTI=0 -D_DEBUG
> -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -D__STDC_LIMIT_MACROS -Ilib/Transforms/Vectorize
> -I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize
> -Iinclude -I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/include
> -Wno-noexcept-type -fPIC -fvisibility-inlines-hidden -Werror
> -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings
> -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long
> -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
> -ffunction-sections -fdata-sections -O3 -fPIC -UNDEBUG -fno-exceptions
> -fno-rtti -MD -MT
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
> -MF
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o.d
> -o
> lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
> -c
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘unsigned int
> {anonymous}::HorizontalReduction::OperationData::getRequiredNumberOfUses()
> const’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4733:5:
> error: control reaches end of non-void function [-Werror=return-type]
> }
> ^
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘unsigned int
> {anonymous}::HorizontalReduction::OperationData::getNumberOfOperands()
> const’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4716:5:
> error: control reaches end of non-void function [-Werror=return-type]
> }
> ^
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
> In member function ‘int
> {anonymous}::HorizontalReduction::getReductionCost(llvm::TargetTransformInfo*,
> llvm::Value*, unsigned int)’:
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5183:18:
> error: this statement may fall through [-Werror=implicit-fallthrough=]
> IsUnsigned = false;
> ~~~~~~~~~~~^~~~~~~
> /home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5184:5:
> note: here
> case RK_UMin:
> ^~~~
> cc1plus: all warnings being treated as errors
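
For context, a minimal sketch of the two diagnostics in the log above, not the actual fix. GCC can report -Wreturn-type for a switch that returns from every case, because it does not assume an enum value is one of the named enumerators. It reports -Wimplicit-fallthrough when a statement runs into the next case label. The names `Kind`, `numberOfOperands` and `isUnsignedKind` are hypothetical, and `std::abort()` stands in for `llvm_unreachable()`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the ReductionKind enum in the patch.
enum class Kind { None, Arithmetic, Min, UMin, Max, UMax };

// Every enumerator is handled, and the noreturn call after the switch makes
// the end of the function unreachable, so -Wreturn-type has nothing to report.
unsigned numberOfOperands(Kind K) {
  assert(K != Kind::None && "reduction kind is not set");
  switch (K) {
  case Kind::Arithmetic:
    return 2;
  case Kind::Min:
  case Kind::UMin:
  case Kind::Max:
  case Kind::UMax:
    return 3; // compare + select form: condition, LHS, RHS
  case Kind::None:
    break;
  }
  // std::abort() stands in for llvm_unreachable() in this sketch.
  std::abort();
}

// Each group of labels ends in an explicit return, so no statement can fall
// through into the next label and -Wimplicit-fallthrough stays quiet.
bool isUnsignedKind(Kind K) {
  switch (K) {
  case Kind::UMin:
  case Kind::UMax:
    return true;
  case Kind::Min:
  case Kind::Max:
  case Kind::Arithmetic:
  case Kind::None:
    return false;
  }
  std::abort();
}
```

Stacked labels with no statement between them (as in `case Kind::UMin: case Kind::UMax:`) are not flagged; only a statement that runs into the next label is.
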
>
>
> Could you please have a look?
>
> Thanks
>
> Galina
>
> On Fri, Sep 8, 2017 at 6:49 AM, Alexey Bataev via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: abataev
>> Date: Fri Sep 8 06:49:36 2017
>> New Revision: 312791
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=312791&view=rev
>> Log:
>> [SLP] Support for horizontal min/max reduction.
>>
>> SLP vectorizer supports horizontal reductions for Add/FAdd binary
>> operations. Patch adds support for horizontal min/max reductions.
>> Function getReductionCost() is split to getArithmeticReductionCost() for
>> binary operation reductions and getMinMaxReductionCost() for min/max
>> reductions.
>> Patch fixes PR26956.
>>
>> Differential revision: https://reviews.llvm.org/D27846
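
To illustrate what the log message means by a horizontal min/max reduction, here is a sketch of the scalar source shape: a straight-line chain of compare + select over adjacent elements. The function name `horizontalMin4` is hypothetical, and whether the vectorizer actually transforms such code depends on the target and the cost model.

```cpp
#include <cassert>

// A horizontal signed-min reduction over four elements, written as a chain of
// compare + select. Each ternary corresponds to one icmp + select pair.
int horizontalMin4(const int *A) {
  assert(A && "expect a valid pointer to four ints");
  int M01 = A[0] < A[1] ? A[0] : A[1];
  int M012 = M01 < A[2] ? M01 : A[2];
  return M012 < A[3] ? M012 : A[3];
}
```

With `unsigned` elements the same chain would be an unsigned min, which the patch tracks as a separate reduction kind.
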
>>
>> Modified:
>> llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>> llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>> llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
>> llvm/trunk/lib/Analysis/CostModel.cpp
>> llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>> llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
>> llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
>>
>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)
>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Fri Sep 8
>> 06:49:36 2017
>> @@ -732,6 +732,8 @@ public:
>> /// ((v0+v2), (v1+v3), undef, undef)
>> int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>> bool IsPairwiseForm) const;
>> + int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
>> + bool IsUnsigned) const;
>>
>> /// \returns The cost of Intrinsic instructions. Analyses the real
>> arguments.
>> /// Three cases are handled: 1. scalar instruction 2. vector
>> instruction
>> @@ -998,6 +1000,8 @@ public:
>> unsigned AddressSpace) = 0;
>> virtual int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>> bool IsPairwiseForm) = 0;
>> + virtual int getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> + bool IsPairwiseForm, bool
>> IsUnsigned) = 0;
>> virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
>> ArrayRef<Type *> Tys, FastMathFlags FMF,
>> unsigned ScalarizationCostPassed) = 0;
>> @@ -1309,6 +1313,10 @@ public:
>> bool IsPairwiseForm) override {
>> return Impl.getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
>> }
>> + int getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> + bool IsPairwiseForm, bool IsUnsigned)
>> override {
>> + return Impl.getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
>> IsUnsigned);
>> + }
>> int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type
>> *> Tys,
>> FastMathFlags FMF, unsigned ScalarizationCostPassed)
>> override {
>> return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
>>
>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h (original)
>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h Fri Sep 8
>> 06:49:36 2017
>> @@ -451,6 +451,8 @@ public:
>>
>> unsigned getArithmeticReductionCost(unsigned, Type *, bool) { return
>> 1; }
>>
>> + unsigned getMinMaxReductionCost(Type *, Type *, bool, bool) { return
>> 1; }
>> +
>> unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) { return
>> 0; }
>>
>> bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {
>>
>> Modified: llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h (original)
>> +++ llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h Fri Sep 8 06:49:36
>> 2017
>> @@ -1166,6 +1166,66 @@ public:
>> return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false,
>> true);
>> }
>>
>> + /// Try to calculate op costs for min/max reduction operations.
>> + /// \param CondTy Conditional type for the Select instruction.
>> + unsigned getMinMaxReductionCost(Type *Ty, Type *CondTy, bool
>> IsPairwise,
>> + bool) {
>> + assert(Ty->isVectorTy() && "Expect a vector type");
>> + Type *ScalarTy = Ty->getVectorElementType();
>> + Type *ScalarCondTy = CondTy->getVectorElementType();
>> + unsigned NumVecElts = Ty->getVectorNumElements();
>> + unsigned NumReduxLevels = Log2_32(NumVecElts);
>> + unsigned CmpOpcode;
>> + if (Ty->isFPOrFPVectorTy()) {
>> + CmpOpcode = Instruction::FCmp;
>> + } else {
>> + assert(Ty->isIntOrIntVectorTy() &&
>> + "expecting floating point or integer type for min/max
>> reduction");
>> + CmpOpcode = Instruction::ICmp;
>> + }
>> + unsigned MinMaxCost = 0;
>> + unsigned ShuffleCost = 0;
>> + auto *ConcreteTTI = static_cast<T *>(this);
>> + std::pair<unsigned, MVT> LT =
>> + ConcreteTTI->getTLI()->getTypeLegalizationCost(DL, Ty);
>> + unsigned LongVectorCount = 0;
>> + unsigned MVTLen =
>> + LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
>> + while (NumVecElts > MVTLen) {
>> + NumVecElts /= 2;
>> + // Assume the pairwise shuffles add a cost.
>> + ShuffleCost += (IsPairwise + 1) *
>> +
>> ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
>> + NumVecElts, Ty);
>> + MinMaxCost +=
>> + ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy,
>> nullptr) +
>> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty,
>> CondTy,
>> + nullptr);
>> + Ty = VectorType::get(ScalarTy, NumVecElts);
>> + CondTy = VectorType::get(ScalarCondTy, NumVecElts);
>> + ++LongVectorCount;
>> + }
>> + // The minimal length of the vector is limited by the real length of
>> vector
>> + // operations performed on the current platform. That's why several
>> final
>> + // reduction operations are performed on the vectors with the same
>> + // architecture-dependent length.
>> + ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1)
>> *
>> + ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector,
>> Ty,
>> + NumVecElts, Ty);
>> + MinMaxCost +=
>> + (NumReduxLevels - LongVectorCount) *
>> + (ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr)
>> +
>> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
>> + nullptr));
>> + // Need 3 extractelement instructions for scalarization + an
>> additional
>> + // scalar select instruction.
>> + return ShuffleCost + MinMaxCost +
>> + 3 * getScalarizationOverhead(Ty, /*Insert=*/false,
>> + /*Extract=*/true) +
>> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
>> + ScalarCondTy, nullptr);
>> + }
>> +
>> unsigned getVectorSplitCost() { return 1; }
>>
>> /// @}
>>
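
A simplified sketch of the arithmetic in the BasicTTIImpl.h hunk above, assuming flat per-operation costs that do not depend on the vector type. The real code queries the target for every type and distinguishes levels wider than the legal vector length; with flat costs both parts collapse into a single product. `UnitCosts`, `log2u` and `minMaxReductionCost` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical flat costs; the real implementation asks the target per type.
struct UnitCosts {
  unsigned Shuffle;         // one subvector-extract shuffle
  unsigned Cmp;             // one vector compare
  unsigned Select;          // one select (vector or scalar)
  unsigned ExtractOverhead; // scalarization overhead for extracts
};

// Integer log2 for powers of two.
unsigned log2u(unsigned N) {
  unsigned L = 0;
  while (N > 1) {
    N /= 2;
    ++L;
  }
  return L;
}

// Shuffles and compare + select pairs per reduction level, then three
// extract overheads and one scalar select, mirroring the final return.
unsigned minMaxReductionCost(unsigned NumElts, bool IsPairwise,
                             const UnitCosts &C) {
  assert(NumElts >= 2 && (NumElts & (NumElts - 1)) == 0 &&
         "expect a power-of-two element count");
  unsigned Levels = log2u(NumElts);
  unsigned ShuffleCost = Levels * (IsPairwise + 1) * C.Shuffle;
  unsigned MinMaxCost = Levels * (C.Cmp + C.Select);
  return ShuffleCost + MinMaxCost + 3 * C.ExtractOverhead + C.Select;
}
```

With every unit cost set to 1, an 8-element non-pairwise reduction has 3 levels: 3 (shuffles) + 6 (compare + select) + 3 (extracts) + 1 (scalar select) = 13.
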
>> Modified: llvm/trunk/lib/Analysis/CostModel.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/CostModel.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Analysis/CostModel.cpp (original)
>> +++ llvm/trunk/lib/Analysis/CostModel.cpp Fri Sep 8 06:49:36 2017
>> @@ -186,26 +186,56 @@ static bool matchPairwiseShuffleMask(Shu
>> }
>>
>> namespace {
>> +/// Kind of the reduction data.
>> +enum ReductionKind {
>> + RK_None, /// Not a reduction.
>> + RK_Arithmetic, /// Binary reduction data.
>> + RK_MinMax, /// Min/max reduction data.
>> + RK_UnsignedMinMax, /// Unsigned min/max reduction data.
>> +};
>> /// Contains opcode + LHS/RHS parts of the reduction operations.
>> struct ReductionData {
>> - explicit ReductionData() = default;
>> - ReductionData(unsigned Opcode, Value *LHS, Value *RHS)
>> - : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
>> + ReductionData() = delete;
>> + ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value
>> *RHS)
>> + : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
>> + assert(Kind != RK_None && "expected binary or min/max reduction
>> only.");
>> + }
>> unsigned Opcode = 0;
>> Value *LHS = nullptr;
>> Value *RHS = nullptr;
>> + ReductionKind Kind = RK_None;
>> + bool hasSameData(ReductionData &RD) const {
>> + return Kind == RD.Kind && Opcode == RD.Opcode;
>> + }
>> };
>> } // namespace
>>
>> static Optional<ReductionData> getReductionData(Instruction *I) {
>> Value *L, *R;
>> if (m_BinOp(m_Value(L), m_Value(R)).match(I))
>> - return ReductionData(I->getOpcode(), L, R);
>> + return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
>> + if (auto *SI = dyn_cast<SelectInst>(I)) {
>> + if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
>> + m_SMax(m_Value(L), m_Value(R)).match(SI) ||
>> + m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
>> + m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
>> + m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
>> + m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
>> + auto *CI = cast<CmpInst>(SI->getCondition());
>> + return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
>> + }
>> + if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
>> + m_UMax(m_Value(L), m_Value(R)).match(SI)) {
>> + auto *CI = cast<CmpInst>(SI->getCondition());
>> + return ReductionData(RK_UnsignedMinMax, CI->getOpcode(), L, R);
>> + }
>> + }
>> return llvm::None;
>> }
>>
>> -static bool matchPairwiseReductionAtLevel(Instruction *I, unsigned Level,
>> - unsigned NumLevels) {
>> +static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
>> + unsigned Level,
>> + unsigned NumLevels) {
>> // Match one level of pairwise operations.
>> // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
>> // <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
>> @@ -213,24 +243,24 @@ static bool matchPairwiseReductionAtLeve
>> // <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
>> // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
>> if (!I)
>> - return false;
>> + return RK_None;
>>
>> assert(I->getType()->isVectorTy() && "Expecting a vector type");
>>
>> Optional<ReductionData> RD = getReductionData(I);
>> if (!RD)
>> - return false;
>> + return RK_None;
>>
>> ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
>> if (!LS && Level)
>> - return false;
>> + return RK_None;
>> ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
>> if (!RS && Level)
>> - return false;
>> + return RK_None;
>>
>> // On level 0 we can omit one shufflevector instruction.
>> if (!Level && !RS && !LS)
>> - return false;
>> + return RK_None;
>>
>> // Shuffle inputs must match.
>> Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
>> @@ -239,7 +269,7 @@ static bool matchPairwiseReductionAtLeve
>> if (NextLevelOpR && NextLevelOpL) {
>> // If we have two shuffles their operands must match.
>> if (NextLevelOpL != NextLevelOpR)
>> - return false;
>> + return RK_None;
>>
>> NextLevelOp = NextLevelOpL;
>> } else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
>> @@ -250,45 +280,47 @@ static bool matchPairwiseReductionAtLeve
>> // %NextLevelOpL = shufflevector %R, <1, undef ...>
>> // %BinOp = fadd %NextLevelOpL, %R
>> if (NextLevelOpL && NextLevelOpL != RD->RHS)
>> - return false;
>> + return RK_None;
>> else if (NextLevelOpR && NextLevelOpR != RD->LHS)
>> - return false;
>> + return RK_None;
>>
>> NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
>> - } else
>> - return false;
>> + } else {
>> + return RK_None;
>> + }
>>
>> // Check that the next levels binary operation exists and matches with
>> the
>> // current one.
>> if (Level + 1 != NumLevels) {
>> Optional<ReductionData> NextLevelRD =
>> getReductionData(cast<Instruction>(NextLevelOp));
>> - if (!NextLevelRD || RD->Opcode != NextLevelRD->Opcode)
>> - return false;
>> + if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
>> + return RK_None;
>> }
>>
>> // Shuffle mask for pairwise operation must match.
>> if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
>> if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
>> - return false;
>> + return RK_None;
>> } else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
>> if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
>> - return false;
>> - } else
>> - return false;
>> + return RK_None;
>> + } else {
>> + return RK_None;
>> + }
>>
>> if (++Level == NumLevels)
>> - return true;
>> + return RD->Kind;
>>
>> // Match next level.
>> return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp),
>> Level,
>> NumLevels);
>> }
>>
>> -static bool matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
>> - unsigned &Opcode, Type *&Ty) {
>> +static ReductionKind matchPairwiseReduction(const ExtractElementInst
>> *ReduxRoot,
>> + unsigned &Opcode, Type *&Ty)
>> {
>> if (!EnableReduxCost)
>> - return false;
>> + return RK_None;
>>
>> // Need to extract the first element.
>> ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
>> @@ -296,19 +328,19 @@ static bool matchPairwiseReduction(const
>> if (CI)
>> Idx = CI->getZExtValue();
>> if (Idx != 0)
>> - return false;
>> + return RK_None;
>>
>> auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
>> if (!RdxStart)
>> - return false;
>> + return RK_None;
>> Optional<ReductionData> RD = getReductionData(RdxStart);
>> if (!RD)
>> - return false;
>> + return RK_None;
>>
>> Type *VecTy = RdxStart->getType();
>> unsigned NumVecElems = VecTy->getVectorNumElements();
>> if (!isPowerOf2_32(NumVecElems))
>> - return false;
>> + return RK_None;
>>
>> // We look for a sequence of shuffle,shuffle,add triples like the
>> following
>> // that builds a pairwise reduction tree.
>> @@ -328,13 +360,14 @@ static bool matchPairwiseReduction(const
>> // <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
>> // %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
>> // %r = extractelement <4 x float> %bin.rdx8, i32 0
>> - if (!matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)))
>> - return false;
>> + if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
>> + RK_None)
>> + return RK_None;
>>
>> Opcode = RD->Opcode;
>> Ty = VecTy;
>>
>> - return true;
>> + return RD->Kind;
>> }
>>
>> static std::pair<Value *, ShuffleVectorInst *>
>> @@ -348,10 +381,11 @@ getShuffleAndOtherOprd(Value *L, Value *
>> return std::make_pair(L, S);
>> }
>>
>> -static bool matchVectorSplittingReduction(const ExtractElementInst
>> *ReduxRoot,
>> - unsigned &Opcode, Type *&Ty) {
>> +static ReductionKind
>> +matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
>> + unsigned &Opcode, Type *&Ty) {
>> if (!EnableReduxCost)
>> - return false;
>> + return RK_None;
>>
>> // Need to extract the first element.
>> ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
>> @@ -359,19 +393,19 @@ static bool matchVectorSplittingReductio
>> if (CI)
>> Idx = CI->getZExtValue();
>> if (Idx != 0)
>> - return false;
>> + return RK_None;
>>
>> auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
>> if (!RdxStart)
>> - return false;
>> + return RK_None;
>> Optional<ReductionData> RD = getReductionData(RdxStart);
>> if (!RD)
>> - return false;
>> + return RK_None;
>>
>> Type *VecTy = ReduxRoot->getOperand(0)->getType();
>> unsigned NumVecElems = VecTy->getVectorNumElements();
>> if (!isPowerOf2_32(NumVecElems))
>> - return false;
>> + return RK_None;
>>
>> // We look for a sequence of shuffles and adds like the following
>> matching one
>> // fadd, shuffle vector pair at a time.
>> @@ -391,10 +425,10 @@ static bool matchVectorSplittingReductio
>> while (NumVecElemsRemain - 1) {
>> // Check for the right reduction operation.
>> if (!RdxOp)
>> - return false;
>> + return RK_None;
>> Optional<ReductionData> RDLevel = getReductionData(RdxOp);
>> - if (!RDLevel || RDLevel->Opcode != RD->Opcode)
>> - return false;
>> + if (!RDLevel || !RDLevel->hasSameData(*RD))
>> + return RK_None;
>>
>> Value *NextRdxOp;
>> ShuffleVectorInst *Shuffle;
>> @@ -403,9 +437,9 @@ static bool matchVectorSplittingReductio
>>
>> // Check the current reduction operation and the shuffle use the
>> same value.
>> if (Shuffle == nullptr)
>> - return false;
>> + return RK_None;
>> if (Shuffle->getOperand(0) != NextRdxOp)
>> - return false;
>> + return RK_None;
>>
>> // Check that shuffle masks matches.
>> for (unsigned j = 0; j != MaskStart; ++j)
>> @@ -415,7 +449,7 @@ static bool matchVectorSplittingReductio
>>
>> SmallVector<int, 16> Mask = Shuffle->getShuffleMask();
>> if (ShuffleMask != Mask)
>> - return false;
>> + return RK_None;
>>
>> RdxOp = dyn_cast<Instruction>(NextRdxOp);
>> NumVecElemsRemain /= 2;
>> @@ -424,7 +458,7 @@ static bool matchVectorSplittingReductio
>>
>> Opcode = RD->Opcode;
>> Ty = VecTy;
>> - return true;
>> + return RD->Kind;
>> }
>>
>> unsigned CostModelAnalysis::getInstructionCost(const Instruction *I)
>> const {
>> @@ -519,13 +553,36 @@ unsigned CostModelAnalysis::getInstructi
>> unsigned ReduxOpCode;
>> Type *ReduxType;
>>
>> - if (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
>> + switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
>> + case RK_Arithmetic:
>> return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
>> /*IsPairwiseForm=*/false);
>> + case RK_MinMax:
>> + return TTI->getMinMaxReductionCost(
>> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> + /*IsPairwiseForm=*/false, /*IsUnsigned=*/false);
>> + case RK_UnsignedMinMax:
>> + return TTI->getMinMaxReductionCost(
>> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> + /*IsPairwiseForm=*/false, /*IsUnsigned=*/true);
>> + case RK_None:
>> + break;
>> }
>> - if (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
>> +
>> + switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
>> + case RK_Arithmetic:
>> return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
>> /*IsPairwiseForm=*/true);
>> + case RK_MinMax:
>> + return TTI->getMinMaxReductionCost(
>> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> + /*IsPairwiseForm=*/true, /*IsUnsigned=*/false);
>> + case RK_UnsignedMinMax:
>> + return TTI->getMinMaxReductionCost(
>> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
>> + /*IsPairwiseForm=*/true, /*IsUnsigned=*/true);
>> + case RK_None:
>> + break;
>> }
>>
>> return TTI->getVectorInstrCost(I->getOpcode(),
>>
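
The two matchers in the CostModel.cpp hunk above recognize two shapes of reduction tree. A sketch of both on plain arrays, assuming a signed min and a power-of-two length; `splittingMin` and `pairwiseMin` are hypothetical names and the loops only model the element pairing, not the shufflevector IR itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splitting form: each level combines the low half with the high half.
int splittingMin(std::vector<int> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "expect a power-of-two length");
  for (std::size_t Half = V.size() / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      V[I] = V[I] < V[I + Half] ? V[I] : V[I + Half];
  return V[0];
}

// Pairwise form: each level combines adjacent elements (0,1), (2,3), ...
int pairwiseMin(std::vector<int> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "expect a power-of-two length");
  for (std::size_t N = V.size(); N > 1; N /= 2)
    for (std::size_t I = 0; I < N / 2; ++I)
      V[I] = V[2 * I] < V[2 * I + 1] ? V[2 * I] : V[2 * I + 1];
  return V[0];
}
```

Both forms take log2(N) levels and produce the same result; they differ in which lanes each level's shuffles select, which is why the cost hooks take an `IsPairwiseForm` flag.
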
>> Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)
>> +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Fri Sep 8 06:49:36
>> 2017
>> @@ -484,6 +484,15 @@ int TargetTransformInfo::getArithmeticRe
>> return Cost;
>> }
>>
>> +int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
>> + bool IsPairwiseForm,
>> + bool IsUnsigned) const {
>> + int Cost =
>> + TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
>> IsUnsigned);
>> + assert(Cost >= 0 && "TTI should not produce negative costs!");
>> + return Cost;
>> +}
>> +
>> unsigned
>> TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys)
>> const {
>> return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
>>
>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Fri Sep 8
>> 06:49:36 2017
>> @@ -1999,6 +1999,152 @@ int X86TTIImpl::getArithmeticReductionCo
>> return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);
>> }
>>
>> +int X86TTIImpl::getMinMaxReductionCost(Type *ValTy, Type *CondTy,
>> + bool IsPairwise, bool IsUnsigned)
>> {
>> + std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
>> +
>> + MVT MTy = LT.second;
>> +
>> + int ISD;
>> + if (ValTy->isIntOrIntVectorTy()) {
>> + ISD = IsUnsigned ? ISD::UMIN : ISD::SMIN;
>> + } else {
>> + assert(ValTy->isFPOrFPVectorTy() &&
>> + "Expected float point or integer vector type.");
>> + ISD = ISD::FMINNUM;
>> + }
>> +
>> + // We use the Intel Architecture Code Analyzer (IACA) to measure the
>> throughput
>> + // and use it as the cost.
>> +
>> + static const CostTblEntry SSE42CostTblPairWise[] = {
>> + {ISD::FMINNUM, MVT::v2f64, 3},
>> + {ISD::FMINNUM, MVT::v4f32, 2},
>> + {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
>> "6.8"
>> + {ISD::UMIN, MVT::v2i64, 8}, // The data reported by the IACA is
>> "8.6"
>> + {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
>> "1.5"
>> + {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
>> "1.8"
>> + {ISD::SMIN, MVT::v8i16, 2},
>> + {ISD::UMIN, MVT::v8i16, 2},
>> + };
>> +
>> + static const CostTblEntry AVX1CostTblPairWise[] = {
>> + {ISD::FMINNUM, MVT::v4f32, 1},
>> + {ISD::FMINNUM, MVT::v4f64, 1},
>> + {ISD::FMINNUM, MVT::v8f32, 2},
>> + {ISD::SMIN, MVT::v2i64, 3},
>> + {ISD::UMIN, MVT::v2i64, 3},
>> + {ISD::SMIN, MVT::v4i32, 1},
>> + {ISD::UMIN, MVT::v4i32, 1},
>> + {ISD::SMIN, MVT::v8i16, 1},
>> + {ISD::UMIN, MVT::v8i16, 1},
>> + {ISD::SMIN, MVT::v8i32, 3},
>> + {ISD::UMIN, MVT::v8i32, 3},
>> + };
>> +
>> + static const CostTblEntry AVX2CostTblPairWise[] = {
>> + {ISD::SMIN, MVT::v4i64, 2},
>> + {ISD::UMIN, MVT::v4i64, 2},
>> + {ISD::SMIN, MVT::v8i32, 1},
>> + {ISD::UMIN, MVT::v8i32, 1},
>> + {ISD::SMIN, MVT::v16i16, 1},
>> + {ISD::UMIN, MVT::v16i16, 1},
>> + {ISD::SMIN, MVT::v32i8, 2},
>> + {ISD::UMIN, MVT::v32i8, 2},
>> + };
>> +
>> + static const CostTblEntry AVX512CostTblPairWise[] = {
>> + {ISD::FMINNUM, MVT::v8f64, 1},
>> + {ISD::FMINNUM, MVT::v16f32, 2},
>> + {ISD::SMIN, MVT::v8i64, 2},
>> + {ISD::UMIN, MVT::v8i64, 2},
>> + {ISD::SMIN, MVT::v16i32, 1},
>> + {ISD::UMIN, MVT::v16i32, 1},
>> + };
>> +
>> + static const CostTblEntry SSE42CostTblNoPairWise[] = {
>> + {ISD::FMINNUM, MVT::v2f64, 3},
>> + {ISD::FMINNUM, MVT::v4f32, 3},
>> + {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
>> "6.8"
>> + {ISD::UMIN, MVT::v2i64, 9}, // The data reported by the IACA is
>> "8.6"
>> + {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
>> "1.5"
>> + {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
>> "1.8"
>> + {ISD::SMIN, MVT::v8i16, 1}, // The data reported by the IACA is
>> "1.5"
>> + {ISD::UMIN, MVT::v8i16, 2}, // The data reported by the IACA is
>> "1.8"
>> + };
>> +
>> + static const CostTblEntry AVX1CostTblNoPairWise[] = {
>> + {ISD::FMINNUM, MVT::v4f32, 1},
>> + {ISD::FMINNUM, MVT::v4f64, 1},
>> + {ISD::FMINNUM, MVT::v8f32, 1},
>> + {ISD::SMIN, MVT::v2i64, 3},
>> + {ISD::UMIN, MVT::v2i64, 3},
>> + {ISD::SMIN, MVT::v4i32, 1},
>> + {ISD::UMIN, MVT::v4i32, 1},
>> + {ISD::SMIN, MVT::v8i16, 1},
>> + {ISD::UMIN, MVT::v8i16, 1},
>> + {ISD::SMIN, MVT::v8i32, 2},
>> + {ISD::UMIN, MVT::v8i32, 2},
>> + };
>> +
>> + static const CostTblEntry AVX2CostTblNoPairWise[] = {
>> + {ISD::SMIN, MVT::v4i64, 1},
>> + {ISD::UMIN, MVT::v4i64, 1},
>> + {ISD::SMIN, MVT::v8i32, 1},
>> + {ISD::UMIN, MVT::v8i32, 1},
>> + {ISD::SMIN, MVT::v16i16, 1},
>> + {ISD::UMIN, MVT::v16i16, 1},
>> + {ISD::SMIN, MVT::v32i8, 1},
>> + {ISD::UMIN, MVT::v32i8, 1},
>> + };
>> +
>> + static const CostTblEntry AVX512CostTblNoPairWise[] = {
>> + {ISD::FMINNUM, MVT::v8f64, 1},
>> + {ISD::FMINNUM, MVT::v16f32, 2},
>> + {ISD::SMIN, MVT::v8i64, 1},
>> + {ISD::UMIN, MVT::v8i64, 1},
>> + {ISD::SMIN, MVT::v16i32, 1},
>> + {ISD::UMIN, MVT::v16i32, 1},
>> + };
>> +
>> + if (IsPairwise) {
>> + if (ST->hasAVX512())
>> + if (const auto *Entry = CostTableLookup(AVX512CostTblPairWise,
>> ISD, MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasAVX2())
>> + if (const auto *Entry = CostTableLookup(AVX2CostTblPairWise, ISD,
>> MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasAVX())
>> + if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD,
>> MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasSSE42())
>> + if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD,
>> MTy))
>> + return LT.first * Entry->Cost;
>> + } else {
>> + if (ST->hasAVX512())
>> + if (const auto *Entry =
>> + CostTableLookup(AVX512CostTblNoPairWise, ISD, MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasAVX2())
>> + if (const auto *Entry = CostTableLookup(AVX2CostTblNoPairWise,
>> ISD, MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasAVX())
>> + if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise,
>> ISD, MTy))
>> + return LT.first * Entry->Cost;
>> +
>> + if (ST->hasSSE42())
>> + if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise,
>> ISD, MTy))
>> + return LT.first * Entry->Cost;
>> + }
>> +
>> + return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsPairwise,
>> IsUnsigned);
>> +}
>> +
>> /// \brief Calculate the cost of materializing a 64-bit value. This
>> helper
>> /// method might only calculate a fraction of a larger immediate.
>> Therefore it
>> /// is valid to return a cost of ZERO.
>>
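
The lookup order in the X86 hunk above (most capable feature set first, then older ones, then the generic fallback) can be sketched as follows. This is a reduced model: `Op`, `VT`, `CostEntry`, `Features`, `lookup` and `minMaxCost` are hypothetical names, only a few non-pairwise entries from the tables above are copied, and the `LT.first` legalization multiplier is left out.

```cpp
#include <cassert>
#include <cstddef>

enum class Op { SMin, UMin };
enum class VT { v2i64, v4i32, v8i32 };

struct CostEntry {
  Op O;
  VT T;
  int Cost;
};

// Linear search of one table; returns nullptr when there is no entry.
template <std::size_t N>
const CostEntry *lookup(const CostEntry (&Table)[N], Op O, VT T) {
  for (const CostEntry &E : Table)
    if (E.O == O && E.T == T)
      return &E;
  return nullptr;
}

struct Features {
  bool AVX2;
  bool AVX;
  bool SSE42;
};

// Try the newest available feature set first; a miss falls through to the
// next table and finally to the caller-supplied generic cost.
int minMaxCost(Features F, Op O, VT T, int Fallback) {
  assert(Fallback >= 0 && "costs should not be negative");
  static const CostEntry AVX2Tbl[] = {{Op::SMin, VT::v8i32, 1},
                                      {Op::UMin, VT::v8i32, 1}};
  static const CostEntry AVX1Tbl[] = {{Op::SMin, VT::v2i64, 3},
                                      {Op::SMin, VT::v4i32, 1},
                                      {Op::SMin, VT::v8i32, 2}};
  static const CostEntry SSE42Tbl[] = {{Op::SMin, VT::v2i64, 7},
                                       {Op::SMin, VT::v4i32, 1}};
  if (F.AVX2)
    if (const CostEntry *E = lookup(AVX2Tbl, O, T))
      return E->Cost;
  if (F.AVX)
    if (const CostEntry *E = lookup(AVX1Tbl, O, T))
      return E->Cost;
  if (F.SSE42)
    if (const CostEntry *E = lookup(SSE42Tbl, O, T))
      return E->Cost;
  return Fallback;
}
```

A type missing from the newest table is not an error: a signed v2i64 min on an AVX2 target misses the AVX2 table in this sketch and picks up the AVX1 entry.
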
>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h Fri Sep 8
>> 06:49:36 2017
>> @@ -96,6 +96,9 @@ public:
>> int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
>> bool IsPairwiseForm);
>>
>> + int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
>> + bool IsUnsigned);
>> +
>> int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
>> unsigned Factor, ArrayRef<unsigned>
>> Indices,
>> unsigned Alignment, unsigned
>> AddressSpace);
>>
>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Sep 8
>> 06:49:36 2017
>> @@ -4627,11 +4627,17 @@ class HorizontalReduction {
>> // Use map vector to make stable output.
>> MapVector<Instruction *, Value *> ExtraArgs;
>>
>> + /// Kind of the reduction data.
>> + enum ReductionKind {
>> + RK_None, /// Not a reduction.
>> + RK_Arithmetic, /// Binary reduction data.
>> + RK_Min, /// Minimum reduction data.
>> + RK_UMin, /// Unsigned minimum reduction data.
>> + RK_Max, /// Maximum reduction data.
>> + RK_UMax, /// Unsigned maximum reduction data.
>> + };
>> /// Contains info about operation, like its opcode, left and right
>> operands.
>> - struct OperationData {
>> - /// true if the operation is a reduced value, false if reduction
>> operation.
>> - bool IsReducedValue = false;
>> -
>> + class OperationData {
>> /// Opcode of the instruction.
>> unsigned Opcode = 0;
>>
>> @@ -4640,12 +4646,21 @@ class HorizontalReduction {
>>
>> /// Right operand of the reduction operation.
>> Value *RHS = nullptr;
>> + /// Kind of the reduction operation.
>> + ReductionKind Kind = RK_None;
>> + /// True if floating-point min/max reduction has no NaNs.
>> + bool NoNaN = false;
>>
>> /// Checks if the reduction operation can be vectorized.
>> bool isVectorizable() const {
>> return LHS && RHS &&
>> - // We currently only support adds.
>> - (Opcode == Instruction::Add || Opcode == Instruction::FAdd);
>> + // We currently only support adds && min/max reductions.
>> + ((Kind == RK_Arithmetic &&
>> + (Opcode == Instruction::Add || Opcode ==
>> Instruction::FAdd)) ||
>> + ((Opcode == Instruction::ICmp || Opcode ==
>> Instruction::FCmp) &&
>> + (Kind == RK_Min || Kind == RK_Max)) ||
>> + (Opcode == Instruction::ICmp &&
>> + (Kind == RK_UMin || Kind == RK_UMax)));
>> }
>>
>> public:
>> @@ -4653,43 +4668,90 @@ class HorizontalReduction {
>>
>> /// Construction for reduced values. They are identified by opcode
>> only and
>> /// don't have associated LHS/RHS values.
>> - explicit OperationData(Value *V) : IsReducedValue(true) {
>> + explicit OperationData(Value *V) : Kind(RK_None) {
>> if (auto *I = dyn_cast<Instruction>(V))
>> Opcode = I->getOpcode();
>> }
>>
>> - /// Constructor for binary reduction operations with opcode and its
>> left and
>> + /// Constructor for reduction operations with opcode and its left and
>> /// right operands.
>> - OperationData(unsigned Opcode, Value *LHS, Value *RHS)
>> - : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
>> -
>> + OperationData(unsigned Opcode, Value *LHS, Value *RHS, ReductionKind
>> Kind,
>> + bool NoNaN = false)
>> + : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind), NoNaN(NoNaN) {
>> + assert(Kind != RK_None && "One of the reduction operations is
>> expected.");
>> + }
>> explicit operator bool() const { return Opcode; }
>>
>> /// Get the index of the first operand.
>> unsigned getFirstOperandIndex() const {
>> assert(!!*this && "The opcode is not set.");
>> + switch (Kind) {
>> + case RK_Min:
>> + case RK_UMin:
>> + case RK_Max:
>> + case RK_UMax:
>> + return 1;
>> + case RK_Arithmetic:
>> + case RK_None:
>> + break;
>> + }
>> return 0;
>> }
>>
>> /// Total number of operands in the reduction operation.
>> unsigned getNumberOfOperands() const {
>> - assert(!IsReducedValue && !!*this && LHS && RHS &&
>> + assert(Kind != RK_None && !!*this && LHS && RHS &&
>> "Expected reduction operation.");
>> - return 2;
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + return 2;
>> + case RK_Min:
>> + case RK_UMin:
>> + case RK_Max:
>> + case RK_UMax:
>> + return 3;
>> + case RK_None:
>> + llvm_unreachable("Reduction kind is not set");
>> + }
>> }
>>
>> /// Expected number of uses for reduction operations/reduced values.
>> unsigned getRequiredNumberOfUses() const {
>> - assert(!IsReducedValue && !!*this && LHS && RHS &&
>> + assert(Kind != RK_None && !!*this && LHS && RHS &&
>> "Expected reduction operation.");
>> - return 1;
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + return 1;
>> + case RK_Min:
>> + case RK_UMin:
>> + case RK_Max:
>> + case RK_UMax:
>> + return 2;
>> + case RK_None:
>> + llvm_unreachable("Reduction kind is not set");
>> + }
>> }
>>
>> /// Checks if instruction is associative and can be vectorized.
>> bool isAssociative(Instruction *I) const {
>> - assert(!IsReducedValue && *this && LHS && RHS &&
>> + assert(Kind != RK_None && *this && LHS && RHS &&
>> "Expected reduction operation.");
>> - return I->isAssociative();
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + return I->isAssociative();
>> + case RK_Min:
>> + case RK_Max:
>> + return Opcode == Instruction::ICmp ||
>> + cast<Instruction>(I->getOperand(0))->hasUnsafeAlgebra();
>> + case RK_UMin:
>> + case RK_UMax:
>> + assert(Opcode == Instruction::ICmp &&
>> + "Only integer compare operation is expected.");
>> + return true;
>> + case RK_None:
>> + break;
>> + }
>> + llvm_unreachable("Reduction kind is not set");
>> }
>>
>> /// Checks if the reduction operation can be vectorized.
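A note on the -Werror builder report quoted earlier in this thread: the diagnostic text is cut off, so the exact warning is not shown here. One pattern in this hunk that GCC commonly flags under -Wreturn-type is a switch that returns from every case but has nothing after the closing brace, as in getNumberOfOperands() and getRequiredNumberOfUses(). isAssociative() and getConditionType() instead put the unreachable marker after the switch. A minimal standalone sketch of that second shape follows; the enum copy, the `unreachable` helper and the free function are stand-ins for illustration, not the code that was eventually committed.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in copy of the patch's ReductionKind enum (illustration only).
enum ReductionKind { RK_None, RK_Arithmetic, RK_Min, RK_UMin, RK_Max, RK_UMax };

// Hypothetical replacement for llvm_unreachable: aborts instead of returning.
[[noreturn]] inline void unreachable(const char *) { std::abort(); }

// Every enumerator is handled inside the switch, and the unreachable marker
// sits after it, so no control path falls off the end of the function.
unsigned requiredNumberOfUses(ReductionKind Kind) {
  assert(Kind != RK_None && "Expected reduction operation.");
  switch (Kind) {
  case RK_Arithmetic:
    return 1;
  case RK_Min:
  case RK_UMin:
  case RK_Max:
  case RK_UMax:
    return 2;
  case RK_None:
    break;
  }
  unreachable("Reduction kind is not set");
}
```

Keeping the marker outside the switch still lets the compiler warn about a newly added enumerator that no case handles.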
>> @@ -4700,18 +4762,17 @@ class HorizontalReduction {
>> /// Checks if two operation data are both a reduction op or both a
>> reduced
>> /// value.
>> bool operator==(const OperationData &OD) {
>> - assert(((IsReducedValue != OD.IsReducedValue) ||
>> - ((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
>> + assert(((Kind != OD.Kind) || ((!LHS == !OD.LHS) && (!RHS ==
>> !OD.RHS))) &&
>> "One of the comparing operations is incorrect.");
>> - return this == &OD ||
>> - (IsReducedValue == OD.IsReducedValue && Opcode ==
>> OD.Opcode);
>> + return this == &OD || (Kind == OD.Kind && Opcode == OD.Opcode);
>> }
>> bool operator!=(const OperationData &OD) { return !(*this == OD); }
>> void clear() {
>> - IsReducedValue = false;
>> Opcode = 0;
>> LHS = nullptr;
>> RHS = nullptr;
>> + Kind = RK_None;
>> + NoNaN = false;
>> }
>>
>> /// Get the opcode of the reduction operation.
>> @@ -4720,16 +4781,81 @@ class HorizontalReduction {
>> return Opcode;
>> }
>>
>> + /// Get kind of reduction data.
>> + ReductionKind getKind() const { return Kind; }
>> Value *getLHS() const { return LHS; }
>> Value *getRHS() const { return RHS; }
>> + Type *getConditionType() const {
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + return nullptr;
>> + case RK_Min:
>> + case RK_Max:
>> + case RK_UMin:
>> + case RK_UMax:
>> + return CmpInst::makeCmpResultType(LHS->getType());
>> + case RK_None:
>> + break;
>> + }
>> + llvm_unreachable("Reduction kind is not set");
>> + }
>>
>> /// Creates reduction operation with the current opcode.
>> Value *createOp(IRBuilder<> &Builder, const Twine &Name = "") const {
>> - assert(!IsReducedValue &&
>> - (Opcode == Instruction::FAdd || Opcode == Instruction::Add)
>> &&
>> - "Expected add|fadd reduction operation.");
>> - return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS,
>> RHS,
>> - Name);
>> + assert(isVectorizable() &&
>> + "Expected add|fadd or min/max reduction operation.");
>> + Value *Cmp;
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS,
>> RHS,
>> + Name);
>> + case RK_Min:
>> + Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS,
>> RHS)
>> + : Builder.CreateFCmpOLT(LHS,
>> RHS);
>> + break;
>> + case RK_Max:
>> + Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS,
>> RHS)
>> + : Builder.CreateFCmpOGT(LHS,
>> RHS);
>> + break;
>> + case RK_UMin:
>> + assert(Opcode == Instruction::ICmp && "Expected integer types.");
>> + Cmp = Builder.CreateICmpULT(LHS, RHS);
>> + break;
>> + case RK_UMax:
>> + assert(Opcode == Instruction::ICmp && "Expected integer types.");
>> + Cmp = Builder.CreateICmpUGT(LHS, RHS);
>> + break;
>> + case RK_None:
>> + llvm_unreachable("Unknown reduction operation.");
>> + }
>> + return Builder.CreateSelect(Cmp, LHS, RHS, Name);
>> + }
>> + TargetTransformInfo::ReductionFlags getFlags() const {
>> + TargetTransformInfo::ReductionFlags Flags;
>> + Flags.NoNaN = NoNaN;
>> + switch (Kind) {
>> + case RK_Arithmetic:
>> + break;
>> + case RK_Min:
>> + Flags.IsSigned = Opcode == Instruction::ICmp;
>> + Flags.IsMaxOp = false;
>> + break;
>> + case RK_Max:
>> + Flags.IsSigned = Opcode == Instruction::ICmp;
>> + Flags.IsMaxOp = true;
>> + break;
>> + case RK_UMin:
>> + Flags.IsSigned = false;
>> + Flags.IsMaxOp = false;
>> + break;
>> + case RK_UMax:
>> + Flags.IsSigned = false;
>> + Flags.IsMaxOp = true;
>> + break;
>> + case RK_None:
>> + llvm_unreachable("Reduction kind is not set");
>> + }
>> + return Flags;
>> }
>> };
>>
>> @@ -4771,8 +4897,32 @@ class HorizontalReduction {
>>
>> Value *LHS;
>> Value *RHS;
>> - if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V))
>> - return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS,
>> RHS);
>> + if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V)) {
>> + return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS,
>> RHS,
>> + RK_Arithmetic);
>> + }
>> + if (auto *Select = dyn_cast<SelectInst>(V)) {
>> + // Look for a min/max pattern.
>> + if (m_UMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(Instruction::ICmp, LHS, RHS, RK_UMin);
>> + } else if (m_SMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(Instruction::ICmp, LHS, RHS, RK_Min);
>> + } else if (m_OrdFMin(m_Value(LHS), m_Value(RHS)).match(Select) ||
>> + m_UnordFMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(
>> + Instruction::FCmp, LHS, RHS, RK_Min,
>> + cast<Instruction>(Select->getCondition())->hasNoNaNs());
>> + } else if (m_UMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(Instruction::ICmp, LHS, RHS, RK_UMax);
>> + } else if (m_SMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(Instruction::ICmp, LHS, RHS, RK_Max);
>> + } else if (m_OrdFMax(m_Value(LHS), m_Value(RHS)).match(Select) ||
>> + m_UnordFMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
>> + return OperationData(
>> + Instruction::FCmp, LHS, RHS, RK_Max,
>> + cast<Instruction>(Select->getCondition())->hasNoNaNs());
>> + }
>> + }
>> return OperationData(V);
>> }
>>
>> @@ -4965,8 +5115,9 @@ public:
>> if (VectorizedTree) {
>> Builder.SetCurrentDebugLocation(Loc);
>> OperationData VectReductionData(ReductionData.getOpcode(),
>> - VectorizedTree, ReducedSubTree);
>> - VectorizedTree = VectReductionData.createOp(Builder, "bin.rdx");
>> + VectorizedTree, ReducedSubTree,
>> + ReductionData.getKind());
>> + VectorizedTree = VectReductionData.createOp(Builder, "op.rdx");
>> propagateIRFlags(VectorizedTree, ReductionOps);
>> } else
>> VectorizedTree = ReducedSubTree;
>> @@ -4980,7 +5131,8 @@ public:
>> auto *I = cast<Instruction>(ReducedVals[i]);
>> Builder.SetCurrentDebugLocation(I->getDebugLoc());
>> OperationData VectReductionData(ReductionData.getOpcode(),
>> - VectorizedTree, I);
>> + VectorizedTree, I,
>> + ReductionData.getKind());
>> VectorizedTree = VectReductionData.createOp(Builder);
>> propagateIRFlags(VectorizedTree, ReductionOps);
>> }
>> @@ -4991,8 +5143,9 @@ public:
>> for (auto *I : Pair.second) {
>> Builder.SetCurrentDebugLocation(I->getDebugLoc());
>> OperationData VectReductionData(ReductionData.getOpcode(),
>> - VectorizedTree, Pair.first);
>> - VectorizedTree = VectReductionData.createOp(Builder,
>> "bin.extra");
>> + VectorizedTree, Pair.first,
>> + ReductionData.getKind());
>> + VectorizedTree = VectReductionData.createOp(Builder,
>> "op.extra");
>> propagateIRFlags(VectorizedTree, I);
>> }
>> }
>> @@ -5013,19 +5166,58 @@ private:
>> Type *ScalarTy = FirstReducedVal->getType();
>> Type *VecTy = VectorType::get(ScalarTy, ReduxWidth);
>>
>> - int PairwiseRdxCost =
>> - TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> - /*IsPairwiseForm=*/true);
>> - int SplittingRdxCost =
>> - TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
>> - /*IsPairwiseForm=*/false);
>> + int PairwiseRdxCost;
>> + int SplittingRdxCost;
>> + bool IsUnsigned = true;
>> + switch (ReductionData.getKind()) {
>> + case RK_Arithmetic:
>> + PairwiseRdxCost =
>> + TTI->getArithmeticReductionCost(ReductionData.getOpcode(),
>> VecTy,
>> + /*IsPairwiseForm=*/true);
>> + SplittingRdxCost =
>> + TTI->getArithmeticReductionCost(ReductionData.getOpcode(),
>> VecTy,
>> + /*IsPairwiseForm=*/false);
>> + break;
>> + case RK_Min:
>> + case RK_Max:
>> + IsUnsigned = false;
>> + case RK_UMin:
>> + case RK_UMax: {
>> + Type *VecCondTy = CmpInst::makeCmpResultType(VecTy);
>> + PairwiseRdxCost =
>> + TTI->getMinMaxReductionCost(VecTy, VecCondTy,
>> + /*IsPairwiseForm=*/true,
>> IsUnsigned);
>> + SplittingRdxCost =
>> + TTI->getMinMaxReductionCost(VecTy, VecCondTy,
>> + /*IsPairwiseForm=*/false,
>> IsUnsigned);
>> + break;
>> + }
>> + case RK_None:
>> + llvm_unreachable("Expected arithmetic or min/max reduction
>> operation");
>> + }
>>
>> IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;
>> int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost :
>> SplittingRdxCost;
>>
>> - int ScalarReduxCost =
>> - (ReduxWidth - 1) *
>> - TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);
>> + int ScalarReduxCost;
>> + switch (ReductionData.getKind()) {
>> + case RK_Arithmetic:
>> + ScalarReduxCost =
>> + TTI->getArithmeticInstrCost(ReductionData.getOpcode(),
>> ScalarTy);
>> + break;
>> + case RK_Min:
>> + case RK_Max:
>> + case RK_UMin:
>> + case RK_UMax:
>> + ScalarReduxCost =
>> + TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +
>> + TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
>> + CmpInst::makeCmpResultType(ScalarTy));
>> + break;
>> + case RK_None:
>> + llvm_unreachable("Expected arithmetic or min/max reduction
>> operation");
>> + }
>> + ScalarReduxCost *= (ReduxWidth - 1);
>>
>> DEBUG(dbgs() << "SLP: Adding cost " << VecReduxCost - ScalarReduxCost
>> << " for reduction that starts with " <<
>> *FirstReducedVal
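To make the arithmetic in this hunk concrete: for min/max kinds the scalar cost is one compare plus one select per step, multiplied by (ReduxWidth - 1), and it is compared against the cheaper of the pairwise and splitting vector forms. The sketch below reproduces only that arithmetic. The `MinMaxCosts` struct and its numbers are hypothetical; in the patch the values come from TargetTransformInfo queries.

```cpp
#include <cassert>

// Hypothetical cost inputs; the patch obtains these from TTI.
struct MinMaxCosts {
  int ScalarCmp;    // cost of one scalar compare
  int ScalarSelect; // cost of one scalar select
  int PairwiseRdx;  // cost of the pairwise vector reduction
  int SplittingRdx; // cost of the splitting vector reduction
};

// Returns VecReduxCost - ScalarReduxCost. A negative result means the vector
// form is estimated to be cheaper than the scalar compare/select chain.
int minMaxReductionCostDelta(const MinMaxCosts &C, unsigned ReduxWidth) {
  assert(ReduxWidth >= 2 && "Need at least two values to reduce.");
  // Pairwise is chosen only when it is strictly cheaper, as in the patch.
  bool IsPairwise = C.PairwiseRdx < C.SplittingRdx;
  int VecReduxCost = IsPairwise ? C.PairwiseRdx : C.SplittingRdx;
  // Reducing ReduxWidth values takes ReduxWidth - 1 compare+select steps.
  int ScalarReduxCost =
      (C.ScalarCmp + C.ScalarSelect) * static_cast<int>(ReduxWidth - 1);
  return VecReduxCost - ScalarReduxCost;
}
```

With made-up costs of 1 per compare and select, 6 for pairwise and 4 for splitting, a width of 4 gives 4 - (2 * 3) = -2.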
>> @@ -5047,7 +5239,7 @@ private:
>> if (!IsPairwiseReduction)
>> return createSimpleTargetReduction(
>> Builder, TTI, ReductionData.getOpcode(), VectorizedValue,
>> - TargetTransformInfo::ReductionFlags(), RedOps);
>> + ReductionData.getFlags(), RedOps);
>>
>> Value *TmpVec = VectorizedValue;
>> for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
>> @@ -5062,8 +5254,8 @@ private:
>> TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
>> "rdx.shuf.r");
>> OperationData VectReductionData(ReductionData.getOpcode(),
>> LeftShuf,
>> - RightShuf);
>> - TmpVec = VectReductionData.createOp(Builder, "bin.rdx");
>> + RightShuf,
>> ReductionData.getKind());
>> + TmpVec = VectReductionData.createOp(Builder, "op.rdx");
>> propagateIRFlags(TmpVec, RedOps);
>> }
>>
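For readers unfamiliar with the halving loop above, here is a scalar model of the non-pairwise ("splitting") max reduction, which is the shuffle, compare, select sequence the updated @bar test expects. This is a sketch under stated assumptions: the function name is made up, it works on a std::vector rather than IR values, and it ignores NaN ordering (the test uses fast-math flags).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a log2-step max reduction over a power-of-two width.
float splittingMaxReduce(std::vector<float> Vec) {
  const std::size_t N = Vec.size();
  assert(N != 0 && (N & (N - 1)) == 0 && "Width must be a power of two.");
  for (std::size_t Half = N / 2; Half != 0; Half >>= 1) {
    for (std::size_t Lane = 0; Lane != Half; ++Lane) {
      float Shuffled = Vec[Lane + Half];      // models the shufflevector lane
      bool Cmp = Vec[Lane] > Shuffled;        // models fcmp ogt
      Vec[Lane] = Cmp ? Vec[Lane] : Shuffled; // models select
    }
  }
  return Vec[0]; // models extractelement of lane 0
}
```

Each pass compares the low half of the live lanes against the high half, so a width of 4 needs two compare/select steps instead of the three a scalar chain would use.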
>> @@ -5224,9 +5416,11 @@ static bool tryToVectorizeHorReductionOr
>> auto *Inst = dyn_cast<Instruction>(V);
>> if (!Inst)
>> continue;
>> - if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
>> + auto *BI = dyn_cast<BinaryOperator>(Inst);
>> + auto *SI = dyn_cast<SelectInst>(Inst);
>> + if (BI || SI) {
>> HorizontalReduction HorRdx;
>> - if (HorRdx.matchAssociativeReduction(P, BI)) {
>> + if (HorRdx.matchAssociativeReduction(P, Inst)) {
>> if (HorRdx.tryToReduce(R, TTI)) {
>> Res = true;
>> // Set P to nullptr to avoid re-analysis of phi node in
>> @@ -5235,7 +5429,7 @@ static bool tryToVectorizeHorReductionOr
>> continue;
>> }
>> }
>> - if (P) {
>> + if (P && BI) {
>> Inst = dyn_cast<Instruction>(BI->getOperand(0));
>> if (Inst == P)
>> Inst = dyn_cast<Instruction>(BI->getOperand(1));
>>
>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll?rev=312791&r1=312790&r2=312791&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
>> (original)
>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll Fri
>> Sep 8 06:49:36 2017
>> @@ -117,11 +117,11 @@ define float @bazz() {
>> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]],
>> [[RDX_SHUF3]]
>> ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
>> i32 0
>> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> -; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
>> [[CONV6]]
>> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
>> +; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
>> [[CONV6]]
>> ; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
>> -; CHECK-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4
>> -; CHECK-NEXT: ret float [[BIN_EXTRA5]]
>> +; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
>> +; CHECK-NEXT: ret float [[OP_EXTRA5]]
>> ;
>> ; THRESHOLD-LABEL: @bazz(
>> ; THRESHOLD-NEXT: entry:
>> @@ -148,11 +148,11 @@ define float @bazz() {
>> ; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float>
>> [[BIN_RDX2]], [[RDX_SHUF3]]
>> ; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float>
>> [[BIN_RDX4]], i32 0
>> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]],
>> [[CONV]]
>> -; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
>> [[CONV6]]
>> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]],
>> [[CONV]]
>> +; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
>> [[CONV6]]
>> ; THRESHOLD-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
>> -; THRESHOLD-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4
>> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]
>> +; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
>> +; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
>> ;
>> entry:
>> %0 = load i32, i32* @n, align 4
>> @@ -327,47 +327,53 @@ entry:
>> define float @bar() {
>> ; CHECK-LABEL: @bar(
>> ; CHECK-NEXT: entry:
>> -; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast
>> ([20 x float]* @arr to <2 x float>*), align 16
>> -; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast
>> ([20 x float]* @arr1 to <2 x float>*), align 16
>> -; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]
>> -; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32
>> 0
>> -; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32
>> 1
>> +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast
>> ([20 x float]* @arr to <4 x float>*), align 16
>> +; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast
>> ([20 x float]* @arr1 to <4 x float>*), align 16
>> +; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
>> +; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32
>> 0
>> +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32
>> 1
>> ; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
>> -; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float
>> [[TMP3]], float [[TMP4]]
>> -; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
>> -; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
>> -; CHECK-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
>> -; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
>> [[MUL3_1]]
>> -; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
>> [[MAX_0_MUL3]], float [[MUL3_1]]
>> -; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
>> -; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
>> -; CHECK-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
>> -; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]],
>> [[MUL3_2]]
>> -; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
>> [[MAX_0_MUL3_1]], float [[MUL3_2]]
>> -; CHECK-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4
>> -; CHECK-NEXT: ret float [[MAX_0_MUL3_2]]
>> +; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef,
>> float undef
>> +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32
>> 2
>> +; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
>> [[TMP5]]
>> +; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
>> [[MAX_0_MUL3]], float undef
>> +; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32
>> 3
>> +; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]],
>> [[TMP6]]
>> +; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]],
>> <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float>
>> [[TMP2]], [[RDX_SHUF]]
>> +; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1>
>> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
>> +; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float>
>> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32
>> undef, i32 undef>
>> +; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float>
>> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
>> +; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1>
>> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float>
>> [[RDX_SHUF1]]
>> +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float>
>> [[RDX_MINMAX_SELECT3]], i32 0
>> +; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
>> [[MAX_0_MUL3_1]], float undef
>> +; CHECK-NEXT: store float [[TMP7]], float* @res, align 4
>> +; CHECK-NEXT: ret float [[TMP7]]
>> ;
>> ; THRESHOLD-LABEL: @bar(
>> ; THRESHOLD-NEXT: entry:
>> -; THRESHOLD-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>*
>> bitcast ([20 x float]* @arr to <2 x float>*), align 16
>> -; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>*
>> bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
>> -; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]],
>> [[TMP0]]
>> -; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]],
>> i32 0
>> -; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]],
>> i32 1
>> +; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>*
>> bitcast ([20 x float]* @arr to <4 x float>*), align 16
>> +; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>*
>> bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
>> +; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]],
>> [[TMP0]]
>> +; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]],
>> i32 0
>> +; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]],
>> i32 1
>> ; THRESHOLD-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]],
>> [[TMP4]]
>> -; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float
>> [[TMP3]], float [[TMP4]]
>> -; THRESHOLD-NEXT: [[TMP5:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
>> -; THRESHOLD-NEXT: [[TMP6:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
>> -; THRESHOLD-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
>> -; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float
>> [[MAX_0_MUL3]], [[MUL3_1]]
>> -; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
>> [[MAX_0_MUL3]], float [[MUL3_1]]
>> -; THRESHOLD-NEXT: [[TMP7:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
>> -; THRESHOLD-NEXT: [[TMP8:%.*]] = load float, float* getelementptr
>> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
>> -; THRESHOLD-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
>> -; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float
>> [[MAX_0_MUL3_1]], [[MUL3_2]]
>> -; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
>> [[MAX_0_MUL3_1]], float [[MUL3_2]]
>> -; THRESHOLD-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4
>> -; THRESHOLD-NEXT: ret float [[MAX_0_MUL3_2]]
>> +; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float
>> undef, float undef
>> +; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]],
>> i32 2
>> +; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float
>> [[MAX_0_MUL3]], [[TMP5]]
>> +; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
>> [[MAX_0_MUL3]], float undef
>> +; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]],
>> i32 3
>> +; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float
>> [[MAX_0_MUL3_1]], [[TMP6]]
>> +; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float>
>> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float>
>> [[TMP2]], [[RDX_SHUF]]
>> +; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1>
>> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
>> +; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float>
>> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32
>> undef, i32 undef>
>> +; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float>
>> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
>> +; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1>
>> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float>
>> [[RDX_SHUF1]]
>> +; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <4 x float>
>> [[RDX_MINMAX_SELECT3]], i32 0
>> +; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
>> [[MAX_0_MUL3_1]], float undef
>> +; THRESHOLD-NEXT: store float [[TMP7]], float* @res, align 4
>> +; THRESHOLD-NEXT: ret float [[TMP7]]
>> ;
>> entry:
>> %0 = load float, float* getelementptr inbounds ([20 x float], [20 x
>> float]* @arr, i64 0, i64 0), align 16
>> @@ -512,9 +518,9 @@ define float @f(float* nocapture readonl
>> ; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float>
>> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float>
>> [[BIN_RDX14]], [[RDX_SHUF15]]
>> ; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float>
>> [[BIN_RDX16]], i32 0
>> -; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>> +; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>> ; CHECK-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
>> -; CHECK-NEXT: ret float [[BIN_RDX17]]
>> +; CHECK-NEXT: ret float [[OP_RDX]]
>> ;
>> ; THRESHOLD-LABEL: @f(
>> ; THRESHOLD-NEXT: entry:
>> @@ -635,9 +641,9 @@ define float @f(float* nocapture readonl
>> ; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float>
>> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float>
>> [[BIN_RDX14]], [[RDX_SHUF15]]
>> ; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float>
>> [[BIN_RDX16]], i32 0
>> -; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]],
>> [[TMP5]]
>> +; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
>> ; THRESHOLD-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
>> -; THRESHOLD-NEXT: ret float [[BIN_RDX17]]
>> +; THRESHOLD-NEXT: ret float [[OP_RDX]]
>> ;
>> entry:
>> %0 = load float, float* %x, align 4
>> @@ -865,9 +871,9 @@ define float @f1(float* nocapture readon
>> ; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
>> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef>
>> ; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]],
>> [[RDX_SHUF7]]
>> ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float>
>> [[BIN_RDX8]], i32 0
>> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
>> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
>> ; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
>> -; CHECK-NEXT: ret float [[BIN_EXTRA]]
>> +; CHECK-NEXT: ret float [[OP_EXTRA]]
>> ;
>> ; THRESHOLD-LABEL: @f1(
>> ; THRESHOLD-NEXT: entry:
>> @@ -948,9 +954,9 @@ define float @f1(float* nocapture readon
>> ; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
>> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
>> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef>
>> ; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float>
>> [[BIN_RDX6]], [[RDX_SHUF7]]
>> ; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float>
>> [[BIN_RDX8]], i32 0
>> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]],
>> [[CONV]]
>> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]],
>> [[CONV]]
>> ; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
>> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA]]
>> +; THRESHOLD-NEXT: ret float [[OP_EXTRA]]
>> ;
>> entry:
>> %rem = srem i32 %a, %b
>> @@ -1138,14 +1144,14 @@ define float @loadadd31(float* nocapture
>> ; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float>
>> [[BIN_RDX10]], [[RDX_SHUF11]]
>> ; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x float>
>> [[BIN_RDX12]], i32 0
>> -; CHECK-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
>> -; CHECK-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> -; CHECK-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF14]]
>> -; CHECK-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> -; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX15]], [[RDX_SHUF16]]
>> -; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX17]], i32 0
>> -; CHECK-NEXT: [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]],
>> [[TMP10]]
>> -; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]], [[TMP1]]
>> +; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
>> +; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> +; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF13]]
>> +; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> +; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX14]], [[RDX_SHUF15]]
>> +; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX16]], i32 0
>> +; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
>> +; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
>> ; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
>> ; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
>> ; CHECK-NEXT: ret float [[TMP12]]
>> @@ -1234,14 +1240,14 @@ define float @loadadd31(float* nocapture
>> ; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
>> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
>> ; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float>
>> [[BIN_RDX10]], [[RDX_SHUF11]]
>> ; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <8 x float>
>> [[BIN_RDX12]], i32 0
>> -; THRESHOLD-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]],
>> [[TMP9]]
>> -; THRESHOLD-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float>
>> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
>> -; THRESHOLD-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
>> [[RDX_SHUF14]]
>> -; THRESHOLD-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
>> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
>> i32 undef>
>> -; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float>
>> [[BIN_RDX15]], [[RDX_SHUF16]]
>> -; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
>> [[BIN_RDX17]], i32 0
>> -; THRESHOLD-NEXT: [
>
>