[llvm] r312791 - [SLP] Support for horizontal min/max reduction.
Galina Kistanova via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 8 14:56:45 PDT 2017
Hello Alexey,
It looks like this commit added warnings to one of our builders:
http://lab.llvm.org:8011/builders/ubuntu-gcc7.1-werror/builds/1263
...
FAILED: /usr/local/gcc-7.1/bin/g++-7.1 -DGTEST_HAS_RTTI=0 -D_DEBUG
-D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS -Ilib/Transforms/Vectorize
-I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize
-Iinclude -I/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/include
-Wno-noexcept-type -fPIC -fvisibility-inlines-hidden -Werror
-Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings
-Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O3 -fPIC -UNDEBUG -fno-exceptions
-fno-rtti -MD -MT
lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
-MF
lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o.d
-o
lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o
-c
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
In member function ‘unsigned int
{anonymous}::HorizontalReduction::OperationData::getRequiredNumberOfUses()
const’:
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4733:5:
error: control reaches end of non-void function [-Werror=return-type]
}
^
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
In member function ‘unsigned int
{anonymous}::HorizontalReduction::OperationData::getNumberOfOperands()
const’:
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4716:5:
error: control reaches end of non-void function [-Werror=return-type]
}
^
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:
In member function ‘int
{anonymous}::HorizontalReduction::getReductionCost(llvm::TargetTransformInfo*,
llvm::Value*, unsigned int)’:
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5183:18:
error: this statement may fall through [-Werror=implicit-fallthrough=]
IsUnsigned = false;
~~~~~~~~~~~^~~~~~~
/home/buildslave/am1i-slv2/ubuntu-gcc7.1-werror/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5184:5:
note: here
case RK_UMin:
^~~~
cc1plus: all warnings being treated as errors
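
As a data point on the warnings themselves: GCC cannot prove that a switch
covering every enumerator always returns a value, so it wants an explicit
terminator after the switch, and it asks for an annotation on intentional case
fall-through. A minimal sketch of the usual LLVM-style fix (illustrative only,
not the actual follow-up commit) would be:

    #include "llvm/Support/Compiler.h"       // LLVM_FALLTHROUGH
    #include "llvm/Support/ErrorHandling.h"  // llvm_unreachable

    enum ReductionKind { RK_None, RK_Arithmetic, RK_Min, RK_UMin, RK_Max, RK_UMax };

    // -Wreturn-type: keep the fully covered switch, but terminate the
    // function with llvm_unreachable so GCC sees that every path ends.
    unsigned getNumberOfOperands(ReductionKind Kind) {
      switch (Kind) {
      case RK_Arithmetic:
        return 2;                // binary op: LHS and RHS
      case RK_Min:
      case RK_UMin:
      case RK_Max:
      case RK_UMax:
        return 3;                // select: condition plus two values
      case RK_None:
        break;
      }
      llvm_unreachable("Reduction kind is not set");
    }

    // -Wimplicit-fallthrough: mark the deliberate fall-through from the
    // signed to the unsigned min/max cases, e.g.:
    //   case RK_Min:
    //   case RK_Max:
    //     IsUnsigned = false;
    //     LLVM_FALLTHROUGH;
    //   case RK_UMin:
    //   case RK_UMax:
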
Please have a look?
Thanks
Galina
On Fri, Sep 8, 2017 at 6:49 AM, Alexey Bataev via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
> Author: abataev
> Date: Fri Sep 8 06:49:36 2017
> New Revision: 312791
>
> URL: http://llvm.org/viewvc/llvm-project?rev=312791&view=rev
> Log:
> [SLP] Support for horizontal min/max reduction.
>
> The SLP vectorizer already supports horizontal reductions for the Add/FAdd
> binary operations. This patch adds support for horizontal min/max reductions.
> The function getReductionCost() is split into getArithmeticReductionCost() for
> binary-operation reductions and getMinMaxReductionCost() for min/max
> reductions.
> The patch fixes PR26956.
>
> Differential revision: https://reviews.llvm.org/D27846
>
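
For context, the straight-line compare+select chains this teaches SLP to
vectorize look like the following C++ (an illustrative example of the pattern,
not code taken from the patch or from PR26956):

    // Illustrative scalar horizontal max over four elements. Each step is a
    // compare feeding a select; SLP can now turn the chain into a vector
    // fcmp/select reduction built from shuffles.
    float maxOf4(const float *A) {
      float M = A[0];
      M = A[1] > M ? A[1] : M;
      M = A[2] > M ? A[2] : M;
      M = A[3] > M ? A[3] : M;
      return M;
    }
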
> Modified:
> llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
> llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
> llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
> llvm/trunk/lib/Analysis/CostModel.cpp
> llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
> llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
> llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
> llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
> llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
> llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
>
> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)
> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Fri Sep 8
> 06:49:36 2017
> @@ -732,6 +732,8 @@ public:
> /// ((v0+v2), (v1+v3), undef, undef)
> int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
> bool IsPairwiseForm) const;
> + int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
> + bool IsUnsigned) const;
>
> /// \returns The cost of Intrinsic instructions. Analyses the real
> arguments.
> /// Three cases are handled: 1. scalar instruction 2. vector instruction
> @@ -998,6 +1000,8 @@ public:
> unsigned AddressSpace) = 0;
> virtual int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
> bool IsPairwiseForm) = 0;
> + virtual int getMinMaxReductionCost(Type *Ty, Type *CondTy,
> + bool IsPairwiseForm, bool
> IsUnsigned) = 0;
> virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
> ArrayRef<Type *> Tys, FastMathFlags FMF,
> unsigned ScalarizationCostPassed) = 0;
> @@ -1309,6 +1313,10 @@ public:
> bool IsPairwiseForm) override {
> return Impl.getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
> }
> + int getMinMaxReductionCost(Type *Ty, Type *CondTy,
> + bool IsPairwiseForm, bool IsUnsigned)
> override {
> + return Impl.getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
> IsUnsigned);
> + }
> int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type
> *> Tys,
> FastMathFlags FMF, unsigned ScalarizationCostPassed)
> override {
> return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
>
> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h (original)
> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h Fri Sep 8
> 06:49:36 2017
> @@ -451,6 +451,8 @@ public:
>
> unsigned getArithmeticReductionCost(unsigned, Type *, bool) { return
> 1; }
>
> + unsigned getMinMaxReductionCost(Type *, Type *, bool, bool) { return 1;
> }
> +
> unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) { return
> 0; }
>
> bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {
>
> Modified: llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h (original)
> +++ llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h Fri Sep 8 06:49:36
> 2017
> @@ -1166,6 +1166,66 @@ public:
> return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false,
> true);
> }
>
> + /// Try to calculate op costs for min/max reduction operations.
> + /// \param CondTy Conditional type for the Select instruction.
> + unsigned getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwise,
> + bool) {
> + assert(Ty->isVectorTy() && "Expect a vector type");
> + Type *ScalarTy = Ty->getVectorElementType();
> + Type *ScalarCondTy = CondTy->getVectorElementType();
> + unsigned NumVecElts = Ty->getVectorNumElements();
> + unsigned NumReduxLevels = Log2_32(NumVecElts);
> + unsigned CmpOpcode;
> + if (Ty->isFPOrFPVectorTy()) {
> + CmpOpcode = Instruction::FCmp;
> + } else {
> + assert(Ty->isIntOrIntVectorTy() &&
> + "expecting floating point or integer type for min/max
> reduction");
> + CmpOpcode = Instruction::ICmp;
> + }
> + unsigned MinMaxCost = 0;
> + unsigned ShuffleCost = 0;
> + auto *ConcreteTTI = static_cast<T *>(this);
> + std::pair<unsigned, MVT> LT =
> + ConcreteTTI->getTLI()->getTypeLegalizationCost(DL, Ty);
> + unsigned LongVectorCount = 0;
> + unsigned MVTLen =
> + LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
> + while (NumVecElts > MVTLen) {
> + NumVecElts /= 2;
> + // Assume the pairwise shuffles add a cost.
> + ShuffleCost += (IsPairwise + 1) *
> + ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector,
> Ty,
> + NumVecElts, Ty);
> + MinMaxCost +=
> + ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy,
> nullptr) +
> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty,
> CondTy,
> + nullptr);
> + Ty = VectorType::get(ScalarTy, NumVecElts);
> + CondTy = VectorType::get(ScalarCondTy, NumVecElts);
> + ++LongVectorCount;
> + }
> +    // The minimal length of the vector is limited by the real length of vector
> +    // operations performed on the current platform. That's why several final
> +    // reduction operations are performed on the vectors with the same
> +    // architecture-dependent length.
> + ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
> + ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector,
> Ty,
> + NumVecElts, Ty);
> + MinMaxCost +=
> + (NumReduxLevels - LongVectorCount) *
> + (ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr)
> +
> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
> + nullptr));
> + // Need 3 extractelement instructions for scalarization + an
> additional
> + // scalar select instruction.
> + return ShuffleCost + MinMaxCost +
> + 3 * getScalarizationOverhead(Ty, /*Insert=*/false,
> + /*Extract=*/true) +
> + ConcreteTTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
> + ScalarCondTy, nullptr);
> + }
> +
> unsigned getVectorSplitCost() { return 1; }
>
> /// @}
>
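
To make the recursion above concrete (my arithmetic under assumed inputs, not
numbers from the patch): for a splitting (non-pairwise) min reduction of an
8-element vector on a target whose legal width is 4 (MVTLen = 4),
NumReduxLevels = log2(8) = 3. The while loop runs once, charging one
extract-subvector shuffle plus one cmp and one select at the 8-wide type, and
leaves LongVectorCount = 1. The remaining 3 - 1 = 2 levels are charged at the
legal 4-wide type: two shuffles, two cmps, two selects. The scalar tail then
adds the budgeted extractelement overhead plus one scalar select, and the
pairwise form doubles each shuffle term via the (IsPairwise + 1) factor.
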
> Modified: llvm/trunk/lib/Analysis/CostModel.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/CostModel.cpp?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/CostModel.cpp (original)
> +++ llvm/trunk/lib/Analysis/CostModel.cpp Fri Sep 8 06:49:36 2017
> @@ -186,26 +186,56 @@ static bool matchPairwiseShuffleMask(Shu
> }
>
> namespace {
> +/// Kind of the reduction data.
> +enum ReductionKind {
> + RK_None, /// Not a reduction.
> + RK_Arithmetic, /// Binary reduction data.
> + RK_MinMax, /// Min/max reduction data.
> + RK_UnsignedMinMax, /// Unsigned min/max reduction data.
> +};
> /// Contains opcode + LHS/RHS parts of the reduction operations.
> struct ReductionData {
> - explicit ReductionData() = default;
> - ReductionData(unsigned Opcode, Value *LHS, Value *RHS)
> - : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
> + ReductionData() = delete;
> + ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value
> *RHS)
> + : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
> + assert(Kind != RK_None && "expected binary or min/max reduction
> only.");
> + }
> unsigned Opcode = 0;
> Value *LHS = nullptr;
> Value *RHS = nullptr;
> + ReductionKind Kind = RK_None;
> + bool hasSameData(ReductionData &RD) const {
> + return Kind == RD.Kind && Opcode == RD.Opcode;
> + }
> };
> } // namespace
>
> static Optional<ReductionData> getReductionData(Instruction *I) {
> Value *L, *R;
> if (m_BinOp(m_Value(L), m_Value(R)).match(I))
> - return ReductionData(I->getOpcode(), L, R);
> + return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
> + if (auto *SI = dyn_cast<SelectInst>(I)) {
> + if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
> + m_SMax(m_Value(L), m_Value(R)).match(SI) ||
> + m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
> + m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
> + m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
> + m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
> + auto *CI = cast<CmpInst>(SI->getCondition());
> + return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
> + }
> + if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
> + m_UMax(m_Value(L), m_Value(R)).match(SI)) {
> + auto *CI = cast<CmpInst>(SI->getCondition());
> + return ReductionData(RK_UnsignedMinMax, CI->getOpcode(), L, R);
> + }
> + }
> return llvm::None;
> }
>
> -static bool matchPairwiseReductionAtLevel(Instruction *I, unsigned Level,
> - unsigned NumLevels) {
> +static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
> + unsigned Level,
> + unsigned NumLevels) {
> // Match one level of pairwise operations.
> // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
> // <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
> @@ -213,24 +243,24 @@ static bool matchPairwiseReductionAtLeve
> // <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
> // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
> if (!I)
> - return false;
> + return RK_None;
>
> assert(I->getType()->isVectorTy() && "Expecting a vector type");
>
> Optional<ReductionData> RD = getReductionData(I);
> if (!RD)
> - return false;
> + return RK_None;
>
> ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
> if (!LS && Level)
> - return false;
> + return RK_None;
> ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
> if (!RS && Level)
> - return false;
> + return RK_None;
>
> // On level 0 we can omit one shufflevector instruction.
> if (!Level && !RS && !LS)
> - return false;
> + return RK_None;
>
> // Shuffle inputs must match.
> Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
> @@ -239,7 +269,7 @@ static bool matchPairwiseReductionAtLeve
> if (NextLevelOpR && NextLevelOpL) {
> // If we have two shuffles their operands must match.
> if (NextLevelOpL != NextLevelOpR)
> - return false;
> + return RK_None;
>
> NextLevelOp = NextLevelOpL;
> } else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
> @@ -250,45 +280,47 @@ static bool matchPairwiseReductionAtLeve
> // %NextLevelOpL = shufflevector %R, <1, undef ...>
> // %BinOp = fadd %NextLevelOpL, %R
> if (NextLevelOpL && NextLevelOpL != RD->RHS)
> - return false;
> + return RK_None;
> else if (NextLevelOpR && NextLevelOpR != RD->LHS)
> - return false;
> + return RK_None;
>
> NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
> - } else
> - return false;
> + } else {
> + return RK_None;
> + }
>
> // Check that the next levels binary operation exists and matches with
> the
> // current one.
> if (Level + 1 != NumLevels) {
> Optional<ReductionData> NextLevelRD =
> getReductionData(cast<Instruction>(NextLevelOp));
> - if (!NextLevelRD || RD->Opcode != NextLevelRD->Opcode)
> - return false;
> + if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
> + return RK_None;
> }
>
> // Shuffle mask for pairwise operation must match.
> if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
> if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
> - return false;
> + return RK_None;
> } else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
> if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
> - return false;
> - } else
> - return false;
> + return RK_None;
> + } else {
> + return RK_None;
> + }
>
> if (++Level == NumLevels)
> - return true;
> + return RD->Kind;
>
> // Match next level.
> return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp),
> Level,
> NumLevels);
> }
>
> -static bool matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
> - unsigned &Opcode, Type *&Ty) {
> +static ReductionKind matchPairwiseReduction(const ExtractElementInst
> *ReduxRoot,
> + unsigned &Opcode, Type *&Ty) {
> if (!EnableReduxCost)
> - return false;
> + return RK_None;
>
> // Need to extract the first element.
> ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
> @@ -296,19 +328,19 @@ static bool matchPairwiseReduction(const
> if (CI)
> Idx = CI->getZExtValue();
> if (Idx != 0)
> - return false;
> + return RK_None;
>
> auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
> if (!RdxStart)
> - return false;
> + return RK_None;
> Optional<ReductionData> RD = getReductionData(RdxStart);
> if (!RD)
> - return false;
> + return RK_None;
>
> Type *VecTy = RdxStart->getType();
> unsigned NumVecElems = VecTy->getVectorNumElements();
> if (!isPowerOf2_32(NumVecElems))
> - return false;
> + return RK_None;
>
> // We look for a sequence of shuffle,shuffle,add triples like the
> following
> // that builds a pairwise reduction tree.
> @@ -328,13 +360,14 @@ static bool matchPairwiseReduction(const
> // <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
> // %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
> // %r = extractelement <4 x float> %bin.rdx8, i32 0
> - if (!matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)))
> - return false;
> + if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
> + RK_None)
> + return RK_None;
>
> Opcode = RD->Opcode;
> Ty = VecTy;
>
> - return true;
> + return RD->Kind;
> }
>
> static std::pair<Value *, ShuffleVectorInst *>
> @@ -348,10 +381,11 @@ getShuffleAndOtherOprd(Value *L, Value *
> return std::make_pair(L, S);
> }
>
> -static bool matchVectorSplittingReduction(const ExtractElementInst
> *ReduxRoot,
> - unsigned &Opcode, Type *&Ty) {
> +static ReductionKind
> +matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
> + unsigned &Opcode, Type *&Ty) {
> if (!EnableReduxCost)
> - return false;
> + return RK_None;
>
> // Need to extract the first element.
> ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
> @@ -359,19 +393,19 @@ static bool matchVectorSplittingReductio
> if (CI)
> Idx = CI->getZExtValue();
> if (Idx != 0)
> - return false;
> + return RK_None;
>
> auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
> if (!RdxStart)
> - return false;
> + return RK_None;
> Optional<ReductionData> RD = getReductionData(RdxStart);
> if (!RD)
> - return false;
> + return RK_None;
>
> Type *VecTy = ReduxRoot->getOperand(0)->getType();
> unsigned NumVecElems = VecTy->getVectorNumElements();
> if (!isPowerOf2_32(NumVecElems))
> - return false;
> + return RK_None;
>
> // We look for a sequence of shuffles and adds like the following
> matching one
> // fadd, shuffle vector pair at a time.
> @@ -391,10 +425,10 @@ static bool matchVectorSplittingReductio
> while (NumVecElemsRemain - 1) {
> // Check for the right reduction operation.
> if (!RdxOp)
> - return false;
> + return RK_None;
> Optional<ReductionData> RDLevel = getReductionData(RdxOp);
> - if (!RDLevel || RDLevel->Opcode != RD->Opcode)
> - return false;
> + if (!RDLevel || !RDLevel->hasSameData(*RD))
> + return RK_None;
>
> Value *NextRdxOp;
> ShuffleVectorInst *Shuffle;
> @@ -403,9 +437,9 @@ static bool matchVectorSplittingReductio
>
> // Check the current reduction operation and the shuffle use the same
> value.
> if (Shuffle == nullptr)
> - return false;
> + return RK_None;
> if (Shuffle->getOperand(0) != NextRdxOp)
> - return false;
> + return RK_None;
>
> // Check that shuffle masks matches.
> for (unsigned j = 0; j != MaskStart; ++j)
> @@ -415,7 +449,7 @@ static bool matchVectorSplittingReductio
>
> SmallVector<int, 16> Mask = Shuffle->getShuffleMask();
> if (ShuffleMask != Mask)
> - return false;
> + return RK_None;
>
> RdxOp = dyn_cast<Instruction>(NextRdxOp);
> NumVecElemsRemain /= 2;
> @@ -424,7 +458,7 @@ static bool matchVectorSplittingReductio
>
> Opcode = RD->Opcode;
> Ty = VecTy;
> - return true;
> + return RD->Kind;
> }
>
> unsigned CostModelAnalysis::getInstructionCost(const Instruction *I)
> const {
> @@ -519,13 +553,36 @@ unsigned CostModelAnalysis::getInstructi
> unsigned ReduxOpCode;
> Type *ReduxType;
>
> - if (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
> + switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
> + case RK_Arithmetic:
> return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
> /*IsPairwiseForm=*/false);
> + case RK_MinMax:
> + return TTI->getMinMaxReductionCost(
> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
> + /*IsPairwiseForm=*/false, /*IsUnsigned=*/false);
> + case RK_UnsignedMinMax:
> + return TTI->getMinMaxReductionCost(
> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
> + /*IsPairwiseForm=*/false, /*IsUnsigned=*/true);
> + case RK_None:
> + break;
> }
> - if (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
> +
> + switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
> + case RK_Arithmetic:
> return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
> /*IsPairwiseForm=*/true);
> + case RK_MinMax:
> + return TTI->getMinMaxReductionCost(
> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
> + /*IsPairwiseForm=*/true, /*IsUnsigned=*/false);
> + case RK_UnsignedMinMax:
> + return TTI->getMinMaxReductionCost(
> + ReduxType, CmpInst::makeCmpResultType(ReduxType),
> + /*IsPairwiseForm=*/true, /*IsUnsigned=*/true);
> + case RK_None:
> + break;
> }
>
> return TTI->getVectorInstrCost(I->getOpcode(),
>
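
For readers who don't live in PatternMatch.h: m_SMin/m_UMin and the
FMin/FMax matchers used above recognize the canonical compare-feeding-select
idiom. A self-contained sketch of the classification (function name here is
illustrative):

    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace llvm::PatternMatch;

    // Matches e.g.  %c = icmp slt i32 %a, %b
    //               %m = select i1 %c, i32 %a, i32 %b   ; smin(%a, %b)
    static bool isSMinIdiom(Instruction *I) {
      Value *L, *R;
      return m_SMin(m_Value(L), m_Value(R)).match(I);
    }
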
> Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)
> +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Fri Sep 8 06:49:36
> 2017
> @@ -484,6 +484,15 @@ int TargetTransformInfo::getArithmeticRe
> return Cost;
> }
>
> +int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
> + bool IsPairwiseForm,
> + bool IsUnsigned) const {
> + int Cost =
> + TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm,
> IsUnsigned);
> + assert(Cost >= 0 && "TTI should not produce negative costs!");
> + return Cost;
> +}
> +
> unsigned
> TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys)
> const {
> return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
>
> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Fri Sep 8
> 06:49:36 2017
> @@ -1999,6 +1999,152 @@ int X86TTIImpl::getArithmeticReductionCo
> return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);
> }
>
> +int X86TTIImpl::getMinMaxReductionCost(Type *ValTy, Type *CondTy,
> + bool IsPairwise, bool IsUnsigned) {
> + std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
> +
> + MVT MTy = LT.second;
> +
> + int ISD;
> + if (ValTy->isIntOrIntVectorTy()) {
> + ISD = IsUnsigned ? ISD::UMIN : ISD::SMIN;
> + } else {
> + assert(ValTy->isFPOrFPVectorTy() &&
> +           "Expected floating point or integer vector type.");
> + ISD = ISD::FMINNUM;
> + }
> +
> +  // We use the Intel Architecture Code Analyzer (IACA) to measure the
> +  // throughput and use it as the cost.
> +
> + static const CostTblEntry SSE42CostTblPairWise[] = {
> + {ISD::FMINNUM, MVT::v2f64, 3},
> + {ISD::FMINNUM, MVT::v4f32, 2},
> + {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
> "6.8"
> + {ISD::UMIN, MVT::v2i64, 8}, // The data reported by the IACA is
> "8.6"
> + {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
> "1.5"
> + {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
> "1.8"
> + {ISD::SMIN, MVT::v8i16, 2},
> + {ISD::UMIN, MVT::v8i16, 2},
> + };
> +
> + static const CostTblEntry AVX1CostTblPairWise[] = {
> + {ISD::FMINNUM, MVT::v4f32, 1},
> + {ISD::FMINNUM, MVT::v4f64, 1},
> + {ISD::FMINNUM, MVT::v8f32, 2},
> + {ISD::SMIN, MVT::v2i64, 3},
> + {ISD::UMIN, MVT::v2i64, 3},
> + {ISD::SMIN, MVT::v4i32, 1},
> + {ISD::UMIN, MVT::v4i32, 1},
> + {ISD::SMIN, MVT::v8i16, 1},
> + {ISD::UMIN, MVT::v8i16, 1},
> + {ISD::SMIN, MVT::v8i32, 3},
> + {ISD::UMIN, MVT::v8i32, 3},
> + };
> +
> + static const CostTblEntry AVX2CostTblPairWise[] = {
> + {ISD::SMIN, MVT::v4i64, 2},
> + {ISD::UMIN, MVT::v4i64, 2},
> + {ISD::SMIN, MVT::v8i32, 1},
> + {ISD::UMIN, MVT::v8i32, 1},
> + {ISD::SMIN, MVT::v16i16, 1},
> + {ISD::UMIN, MVT::v16i16, 1},
> + {ISD::SMIN, MVT::v32i8, 2},
> + {ISD::UMIN, MVT::v32i8, 2},
> + };
> +
> + static const CostTblEntry AVX512CostTblPairWise[] = {
> + {ISD::FMINNUM, MVT::v8f64, 1},
> + {ISD::FMINNUM, MVT::v16f32, 2},
> + {ISD::SMIN, MVT::v8i64, 2},
> + {ISD::UMIN, MVT::v8i64, 2},
> + {ISD::SMIN, MVT::v16i32, 1},
> + {ISD::UMIN, MVT::v16i32, 1},
> + };
> +
> + static const CostTblEntry SSE42CostTblNoPairWise[] = {
> + {ISD::FMINNUM, MVT::v2f64, 3},
> + {ISD::FMINNUM, MVT::v4f32, 3},
> + {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is
> "6.8"
> + {ISD::UMIN, MVT::v2i64, 9}, // The data reported by the IACA is
> "8.6"
> + {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is
> "1.5"
> + {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is
> "1.8"
> + {ISD::SMIN, MVT::v8i16, 1}, // The data reported by the IACA is
> "1.5"
> + {ISD::UMIN, MVT::v8i16, 2}, // The data reported by the IACA is
> "1.8"
> + };
> +
> + static const CostTblEntry AVX1CostTblNoPairWise[] = {
> + {ISD::FMINNUM, MVT::v4f32, 1},
> + {ISD::FMINNUM, MVT::v4f64, 1},
> + {ISD::FMINNUM, MVT::v8f32, 1},
> + {ISD::SMIN, MVT::v2i64, 3},
> + {ISD::UMIN, MVT::v2i64, 3},
> + {ISD::SMIN, MVT::v4i32, 1},
> + {ISD::UMIN, MVT::v4i32, 1},
> + {ISD::SMIN, MVT::v8i16, 1},
> + {ISD::UMIN, MVT::v8i16, 1},
> + {ISD::SMIN, MVT::v8i32, 2},
> + {ISD::UMIN, MVT::v8i32, 2},
> + };
> +
> + static const CostTblEntry AVX2CostTblNoPairWise[] = {
> + {ISD::SMIN, MVT::v4i64, 1},
> + {ISD::UMIN, MVT::v4i64, 1},
> + {ISD::SMIN, MVT::v8i32, 1},
> + {ISD::UMIN, MVT::v8i32, 1},
> + {ISD::SMIN, MVT::v16i16, 1},
> + {ISD::UMIN, MVT::v16i16, 1},
> + {ISD::SMIN, MVT::v32i8, 1},
> + {ISD::UMIN, MVT::v32i8, 1},
> + };
> +
> + static const CostTblEntry AVX512CostTblNoPairWise[] = {
> + {ISD::FMINNUM, MVT::v8f64, 1},
> + {ISD::FMINNUM, MVT::v16f32, 2},
> + {ISD::SMIN, MVT::v8i64, 1},
> + {ISD::UMIN, MVT::v8i64, 1},
> + {ISD::SMIN, MVT::v16i32, 1},
> + {ISD::UMIN, MVT::v16i32, 1},
> + };
> +
> + if (IsPairwise) {
> + if (ST->hasAVX512())
> + if (const auto *Entry = CostTableLookup(AVX512CostTblPairWise,
> ISD, MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasAVX2())
> + if (const auto *Entry = CostTableLookup(AVX2CostTblPairWise, ISD,
> MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasAVX())
> + if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD,
> MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasSSE42())
> + if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD,
> MTy))
> + return LT.first * Entry->Cost;
> + } else {
> + if (ST->hasAVX512())
> + if (const auto *Entry =
> + CostTableLookup(AVX512CostTblNoPairWise, ISD, MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasAVX2())
> + if (const auto *Entry = CostTableLookup(AVX2CostTblNoPairWise,
> ISD, MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasAVX())
> + if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise,
> ISD, MTy))
> + return LT.first * Entry->Cost;
> +
> + if (ST->hasSSE42())
> + if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise,
> ISD, MTy))
> + return LT.first * Entry->Cost;
> + }
> +
> + return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsPairwise,
> IsUnsigned);
> +}
> +
> /// \brief Calculate the cost of materializing a 64-bit value. This helper
> /// method might only calculate a fraction of a larger immediate. Therefore it
> /// is valid to return a cost of ZERO.
>
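
As a concrete reading of these tables (my arithmetic, not from the patch): a
splitting (non-pairwise) signed-min reduction of v8i32 on an AVX2 target
legalizes to a single v8i32, so LT.first = 1 and the AVX2CostTblNoPairWise
entry gives a cost of 1 * 1 = 1. A v16i32 source on the same target is first
split into two v8i32 halves, so LT.first = 2 and the same entry yields 2.
Anything without a matching entry falls back to
BaseT::getMinMaxReductionCost().
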
> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h (original)
> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h Fri Sep 8
> 06:49:36 2017
> @@ -96,6 +96,9 @@ public:
> int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
> bool IsPairwiseForm);
>
> + int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm,
> + bool IsUnsigned);
> +
> int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
> unsigned Factor, ArrayRef<unsigned>
> Indices,
> unsigned Alignment, unsigned
> AddressSpace);
>
> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Sep 8
> 06:49:36 2017
> @@ -4627,11 +4627,17 @@ class HorizontalReduction {
> // Use map vector to make stable output.
> MapVector<Instruction *, Value *> ExtraArgs;
>
> + /// Kind of the reduction data.
> + enum ReductionKind {
> + RK_None, /// Not a reduction.
> + RK_Arithmetic, /// Binary reduction data.
> + RK_Min, /// Minimum reduction data.
> + RK_UMin, /// Unsigned minimum reduction data.
> + RK_Max, /// Maximum reduction data.
> + RK_UMax, /// Unsigned maximum reduction data.
> + };
> /// Contains info about operation, like its opcode, left and right
> operands.
> - struct OperationData {
> - /// true if the operation is a reduced value, false if reduction
> operation.
> - bool IsReducedValue = false;
> -
> + class OperationData {
> /// Opcode of the instruction.
> unsigned Opcode = 0;
>
> @@ -4640,12 +4646,21 @@ class HorizontalReduction {
>
> /// Right operand of the reduction operation.
> Value *RHS = nullptr;
> + /// Kind of the reduction operation.
> + ReductionKind Kind = RK_None;
> +    /// True if a floating point min/max reduction has no NaNs.
> + bool NoNaN = false;
>
> /// Checks if the reduction operation can be vectorized.
> bool isVectorizable() const {
> return LHS && RHS &&
> - // We currently only support adds.
> - (Opcode == Instruction::Add || Opcode == Instruction::FAdd);
> +             // We currently only support adds and min/max reductions.
> + ((Kind == RK_Arithmetic &&
> + (Opcode == Instruction::Add || Opcode ==
> Instruction::FAdd)) ||
> + ((Opcode == Instruction::ICmp || Opcode ==
> Instruction::FCmp) &&
> + (Kind == RK_Min || Kind == RK_Max)) ||
> + (Opcode == Instruction::ICmp &&
> + (Kind == RK_UMin || Kind == RK_UMax)));
> }
>
> public:
> @@ -4653,43 +4668,90 @@ class HorizontalReduction {
>
> /// Construction for reduced values. They are identified by opcode
> only and
> /// don't have associated LHS/RHS values.
> - explicit OperationData(Value *V) : IsReducedValue(true) {
> + explicit OperationData(Value *V) : Kind(RK_None) {
> if (auto *I = dyn_cast<Instruction>(V))
> Opcode = I->getOpcode();
> }
>
> - /// Constructor for binary reduction operations with opcode and its
> left and
> + /// Constructor for reduction operations with opcode and its left and
> /// right operands.
> - OperationData(unsigned Opcode, Value *LHS, Value *RHS)
> - : Opcode(Opcode), LHS(LHS), RHS(RHS) {}
> -
> + OperationData(unsigned Opcode, Value *LHS, Value *RHS, ReductionKind
> Kind,
> + bool NoNaN = false)
> + : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind), NoNaN(NoNaN) {
> + assert(Kind != RK_None && "One of the reduction operations is
> expected.");
> + }
> explicit operator bool() const { return Opcode; }
>
> /// Get the index of the first operand.
> unsigned getFirstOperandIndex() const {
> assert(!!*this && "The opcode is not set.");
> + switch (Kind) {
> + case RK_Min:
> + case RK_UMin:
> + case RK_Max:
> + case RK_UMax:
> + return 1;
> + case RK_Arithmetic:
> + case RK_None:
> + break;
> + }
> return 0;
> }
>
> /// Total number of operands in the reduction operation.
> unsigned getNumberOfOperands() const {
> - assert(!IsReducedValue && !!*this && LHS && RHS &&
> + assert(Kind != RK_None && !!*this && LHS && RHS &&
> "Expected reduction operation.");
> - return 2;
> + switch (Kind) {
> + case RK_Arithmetic:
> + return 2;
> + case RK_Min:
> + case RK_UMin:
> + case RK_Max:
> + case RK_UMax:
> + return 3;
> + case RK_None:
> + llvm_unreachable("Reduction kind is not set");
> + }
> }
>
> /// Expected number of uses for reduction operations/reduced values.
> unsigned getRequiredNumberOfUses() const {
> - assert(!IsReducedValue && !!*this && LHS && RHS &&
> + assert(Kind != RK_None && !!*this && LHS && RHS &&
> "Expected reduction operation.");
> - return 1;
> + switch (Kind) {
> + case RK_Arithmetic:
> + return 1;
> + case RK_Min:
> + case RK_UMin:
> + case RK_Max:
> + case RK_UMax:
> + return 2;
> + case RK_None:
> + llvm_unreachable("Reduction kind is not set");
> + }
> }
>
> /// Checks if instruction is associative and can be vectorized.
> bool isAssociative(Instruction *I) const {
> - assert(!IsReducedValue && *this && LHS && RHS &&
> + assert(Kind != RK_None && *this && LHS && RHS &&
> "Expected reduction operation.");
> - return I->isAssociative();
> + switch (Kind) {
> + case RK_Arithmetic:
> + return I->isAssociative();
> + case RK_Min:
> + case RK_Max:
> + return Opcode == Instruction::ICmp ||
> + cast<Instruction>(I->getOperand(0))->hasUnsafeAlgebra();
> + case RK_UMin:
> + case RK_UMax:
> + assert(Opcode == Instruction::ICmp &&
> + "Only integer compare operation is expected.");
> + return true;
> + case RK_None:
> + break;
> + }
> + llvm_unreachable("Reduction kind is not set");
> }
>
> /// Checks if the reduction operation can be vectorized.
> @@ -4700,18 +4762,17 @@ class HorizontalReduction {
> /// Checks if two operation data are both a reduction op or both a
> reduced
> /// value.
> bool operator==(const OperationData &OD) {
> - assert(((IsReducedValue != OD.IsReducedValue) ||
> - ((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
> + assert(((Kind != OD.Kind) || ((!LHS == !OD.LHS) && (!RHS ==
> !OD.RHS))) &&
> "One of the comparing operations is incorrect.");
> - return this == &OD ||
> - (IsReducedValue == OD.IsReducedValue && Opcode == OD.Opcode);
> + return this == &OD || (Kind == OD.Kind && Opcode == OD.Opcode);
> }
> bool operator!=(const OperationData &OD) { return !(*this == OD); }
> void clear() {
> - IsReducedValue = false;
> Opcode = 0;
> LHS = nullptr;
> RHS = nullptr;
> + Kind = RK_None;
> + NoNaN = false;
> }
>
> /// Get the opcode of the reduction operation.
> @@ -4720,16 +4781,81 @@ class HorizontalReduction {
> return Opcode;
> }
>
> + /// Get kind of reduction data.
> + ReductionKind getKind() const { return Kind; }
> Value *getLHS() const { return LHS; }
> Value *getRHS() const { return RHS; }
> + Type *getConditionType() const {
> + switch (Kind) {
> + case RK_Arithmetic:
> + return nullptr;
> + case RK_Min:
> + case RK_Max:
> + case RK_UMin:
> + case RK_UMax:
> + return CmpInst::makeCmpResultType(LHS->getType());
> + case RK_None:
> + break;
> + }
> + llvm_unreachable("Reduction kind is not set");
> + }
>
> /// Creates reduction operation with the current opcode.
> Value *createOp(IRBuilder<> &Builder, const Twine &Name = "") const {
> - assert(!IsReducedValue &&
> - (Opcode == Instruction::FAdd || Opcode == Instruction::Add)
> &&
> - "Expected add|fadd reduction operation.");
> - return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS,
> RHS,
> - Name);
> + assert(isVectorizable() &&
> + "Expected add|fadd or min/max reduction operation.");
> + Value *Cmp;
> + switch (Kind) {
> + case RK_Arithmetic:
> + return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS,
> RHS,
> + Name);
> + case RK_Min:
> + Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS,
> RHS)
> + : Builder.CreateFCmpOLT(LHS,
> RHS);
> + break;
> + case RK_Max:
> + Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS,
> RHS)
> + : Builder.CreateFCmpOGT(LHS,
> RHS);
> + break;
> + case RK_UMin:
> + assert(Opcode == Instruction::ICmp && "Expected integer types.");
> + Cmp = Builder.CreateICmpULT(LHS, RHS);
> + break;
> + case RK_UMax:
> + assert(Opcode == Instruction::ICmp && "Expected integer types.");
> + Cmp = Builder.CreateICmpUGT(LHS, RHS);
> + break;
> + case RK_None:
> + llvm_unreachable("Unknown reduction operation.");
> + }
> + return Builder.CreateSelect(Cmp, LHS, RHS, Name);
> + }
> + TargetTransformInfo::ReductionFlags getFlags() const {
> + TargetTransformInfo::ReductionFlags Flags;
> + Flags.NoNaN = NoNaN;
> + switch (Kind) {
> + case RK_Arithmetic:
> + break;
> + case RK_Min:
> + Flags.IsSigned = Opcode == Instruction::ICmp;
> + Flags.IsMaxOp = false;
> + break;
> + case RK_Max:
> + Flags.IsSigned = Opcode == Instruction::ICmp;
> + Flags.IsMaxOp = true;
> + break;
> + case RK_UMin:
> + Flags.IsSigned = false;
> + Flags.IsMaxOp = false;
> + break;
> + case RK_UMax:
> + Flags.IsSigned = false;
> + Flags.IsMaxOp = true;
> + break;
> + case RK_None:
> + llvm_unreachable("Reduction kind is not set");
> + }
> + return Flags;
> }
> };
>
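
A note on the constants above: a min/max reduction step is a two-instruction
idiom, a cmp feeding a select, so the select has three operands with the
condition at index 0 (hence getFirstOperandIndex() returning 1 and
getNumberOfOperands() returning 3), and each reduced value is consumed by both
the cmp and the select (hence getRequiredNumberOfUses() returning 2 instead
of 1).
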
> @@ -4771,8 +4897,32 @@ class HorizontalReduction {
>
> Value *LHS;
> Value *RHS;
> - if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V))
> - return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS,
> RHS);
> + if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V)) {
> + return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS,
> RHS,
> + RK_Arithmetic);
> + }
> + if (auto *Select = dyn_cast<SelectInst>(V)) {
> + // Look for a min/max pattern.
> + if (m_UMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(Instruction::ICmp, LHS, RHS, RK_UMin);
> + } else if (m_SMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(Instruction::ICmp, LHS, RHS, RK_Min);
> + } else if (m_OrdFMin(m_Value(LHS), m_Value(RHS)).match(Select) ||
> + m_UnordFMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(
> + Instruction::FCmp, LHS, RHS, RK_Min,
> + cast<Instruction>(Select->getCondition())->hasNoNaNs());
> + } else if (m_UMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(Instruction::ICmp, LHS, RHS, RK_UMax);
> + } else if (m_SMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(Instruction::ICmp, LHS, RHS, RK_Max);
> + } else if (m_OrdFMax(m_Value(LHS), m_Value(RHS)).match(Select) ||
> + m_UnordFMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
> + return OperationData(
> + Instruction::FCmp, LHS, RHS, RK_Max,
> + cast<Instruction>(Select->getCondition())->hasNoNaNs());
> + }
> + }
> return OperationData(V);
> }
>
> @@ -4965,8 +5115,9 @@ public:
> if (VectorizedTree) {
> Builder.SetCurrentDebugLocation(Loc);
> OperationData VectReductionData(ReductionData.getOpcode(),
> - VectorizedTree, ReducedSubTree);
> - VectorizedTree = VectReductionData.createOp(Builder, "bin.rdx");
> + VectorizedTree, ReducedSubTree,
> + ReductionData.getKind());
> + VectorizedTree = VectReductionData.createOp(Builder, "op.rdx");
> propagateIRFlags(VectorizedTree, ReductionOps);
> } else
> VectorizedTree = ReducedSubTree;
> @@ -4980,7 +5131,8 @@ public:
> auto *I = cast<Instruction>(ReducedVals[i]);
> Builder.SetCurrentDebugLocation(I->getDebugLoc());
> OperationData VectReductionData(ReductionData.getOpcode(),
> - VectorizedTree, I);
> + VectorizedTree, I,
> + ReductionData.getKind());
> VectorizedTree = VectReductionData.createOp(Builder);
> propagateIRFlags(VectorizedTree, ReductionOps);
> }
> @@ -4991,8 +5143,9 @@ public:
> for (auto *I : Pair.second) {
> Builder.SetCurrentDebugLocation(I->getDebugLoc());
> OperationData VectReductionData(ReductionData.getOpcode(),
> - VectorizedTree, Pair.first);
> - VectorizedTree = VectReductionData.createOp(Builder,
> "bin.extra");
> + VectorizedTree, Pair.first,
> + ReductionData.getKind());
> + VectorizedTree = VectReductionData.createOp(Builder,
> "op.extra");
> propagateIRFlags(VectorizedTree, I);
> }
> }
> @@ -5013,19 +5166,58 @@ private:
> Type *ScalarTy = FirstReducedVal->getType();
> Type *VecTy = VectorType::get(ScalarTy, ReduxWidth);
>
> - int PairwiseRdxCost =
> - TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
> - /*IsPairwiseForm=*/true);
> - int SplittingRdxCost =
> - TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
> - /*IsPairwiseForm=*/false);
> + int PairwiseRdxCost;
> + int SplittingRdxCost;
> + bool IsUnsigned = true;
> + switch (ReductionData.getKind()) {
> + case RK_Arithmetic:
> + PairwiseRdxCost =
> + TTI->getArithmeticReductionCost(ReductionData.getOpcode(),
> VecTy,
> + /*IsPairwiseForm=*/true);
> + SplittingRdxCost =
> + TTI->getArithmeticReductionCost(ReductionData.getOpcode(),
> VecTy,
> + /*IsPairwiseForm=*/false);
> + break;
> + case RK_Min:
> + case RK_Max:
> + IsUnsigned = false;
> + case RK_UMin:
> + case RK_UMax: {
> + Type *VecCondTy = CmpInst::makeCmpResultType(VecTy);
> + PairwiseRdxCost =
> + TTI->getMinMaxReductionCost(VecTy, VecCondTy,
> + /*IsPairwiseForm=*/true,
> IsUnsigned);
> + SplittingRdxCost =
> + TTI->getMinMaxReductionCost(VecTy, VecCondTy,
> + /*IsPairwiseForm=*/false,
> IsUnsigned);
> + break;
> + }
> + case RK_None:
> + llvm_unreachable("Expected arithmetic or min/max reduction
> operation");
> + }
>
> IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;
> int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost :
> SplittingRdxCost;
>
> - int ScalarReduxCost =
> - (ReduxWidth - 1) *
> - TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);
> + int ScalarReduxCost;
> + switch (ReductionData.getKind()) {
> + case RK_Arithmetic:
> + ScalarReduxCost =
> + TTI->getArithmeticInstrCost(ReductionData.getOpcode(),
> ScalarTy);
> + break;
> + case RK_Min:
> + case RK_Max:
> + case RK_UMin:
> + case RK_UMax:
> + ScalarReduxCost =
> + TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +
> + TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
> + CmpInst::makeCmpResultType(ScalarTy));
> + break;
> + case RK_None:
> + llvm_unreachable("Expected arithmetic or min/max reduction
> operation");
> + }
> + ScalarReduxCost *= (ReduxWidth - 1);
>
> DEBUG(dbgs() << "SLP: Adding cost " << VecReduxCost - ScalarReduxCost
> << " for reduction that starts with " << *FirstReducedVal
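
Incidentally, the RK_Min/RK_Max case above, which sets IsUnsigned = false and
then runs on into RK_UMin/RK_UMax, is exactly the fall-through GCC flags in
the build report at the top of this thread. It looks intentional (the signed
and unsigned kinds share the same cost query), so an LLVM_FALLTHROUGH
annotation there would both document the intent and silence
-Wimplicit-fallthrough.
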
> @@ -5047,7 +5239,7 @@ private:
> if (!IsPairwiseReduction)
> return createSimpleTargetReduction(
> Builder, TTI, ReductionData.getOpcode(), VectorizedValue,
> - TargetTransformInfo::ReductionFlags(), RedOps);
> + ReductionData.getFlags(), RedOps);
>
> Value *TmpVec = VectorizedValue;
> for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
> @@ -5062,8 +5254,8 @@ private:
> TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
> "rdx.shuf.r");
> OperationData VectReductionData(ReductionData.getOpcode(),
> LeftShuf,
> - RightShuf);
> - TmpVec = VectReductionData.createOp(Builder, "bin.rdx");
> + RightShuf, ReductionData.getKind());
> + TmpVec = VectReductionData.createOp(Builder, "op.rdx");
> propagateIRFlags(TmpVec, RedOps);
> }
>
> @@ -5224,9 +5416,11 @@ static bool tryToVectorizeHorReductionOr
> auto *Inst = dyn_cast<Instruction>(V);
> if (!Inst)
> continue;
> - if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
> + auto *BI = dyn_cast<BinaryOperator>(Inst);
> + auto *SI = dyn_cast<SelectInst>(Inst);
> + if (BI || SI) {
> HorizontalReduction HorRdx;
> - if (HorRdx.matchAssociativeReduction(P, BI)) {
> + if (HorRdx.matchAssociativeReduction(P, Inst)) {
> if (HorRdx.tryToReduce(R, TTI)) {
> Res = true;
> // Set P to nullptr to avoid re-analysis of phi node in
> @@ -5235,7 +5429,7 @@ static bool tryToVectorizeHorReductionOr
> continue;
> }
> }
> - if (P) {
> + if (P && BI) {
> Inst = dyn_cast<Instruction>(BI->getOperand(0));
> if (Inst == P)
> Inst = dyn_cast<Instruction>(BI->getOperand(1));
>
> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
> (original)
> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll Fri
> Sep 8 06:49:36 2017
> @@ -117,11 +117,11 @@ define float @bazz() {
> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> ; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
> -; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV6]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
> +; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV6]]
> ; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
> -; CHECK-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4
> -; CHECK-NEXT: ret float [[BIN_EXTRA5]]
> +; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
> +; CHECK-NEXT: ret float [[OP_EXTRA5]]
> ;
> ; THRESHOLD-LABEL: @bazz(
> ; THRESHOLD-NEXT: entry:
> @@ -148,11 +148,11 @@ define float @bazz() {
> ; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float>
> [[BIN_RDX2]], [[RDX_SHUF3]]
> ; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float>
> [[BIN_RDX4]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]],
> [[CONV]]
> -; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV6]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
> +; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV6]]
> ; THRESHOLD-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
> -; THRESHOLD-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4
> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]
> +; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
> +; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
> ;
> entry:
> %0 = load i32, i32* @n, align 4
> @@ -327,47 +327,53 @@ entry:
> define float @bar() {
> ; CHECK-LABEL: @bar(
> ; CHECK-NEXT: entry:
> -; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast
> ([20 x float]* @arr to <2 x float>*), align 16
> -; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast
> ([20 x float]* @arr1 to <2 x float>*), align 16
> -; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]
> -; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
> -; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
> +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast
> ([20 x float]* @arr to <4 x float>*), align 16
> +; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast
> ([20 x float]* @arr1 to <4 x float>*), align 16
> +; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
> +; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
> +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
> ; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
> -; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float [[TMP3]],
> float [[TMP4]]
> -; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds
> ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
> -; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
> -; CHECK-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
> -; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
> [[MUL3_1]]
> -; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
> [[MAX_0_MUL3]], float [[MUL3_1]]
> -; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds
> ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
> -; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds
> ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
> -; CHECK-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
> -; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]],
> [[MUL3_2]]
> -; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
> [[MAX_0_MUL3_1]], float [[MUL3_2]]
> -; CHECK-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4
> -; CHECK-NEXT: ret float [[MAX_0_MUL3_2]]
> +; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef,
> float undef
> +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
> +; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
> [[TMP5]]
> +; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
> [[MAX_0_MUL3]], float undef
> +; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
> +; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]],
> [[TMP6]]
> +; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]],
> <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> +; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float>
> [[TMP2]], [[RDX_SHUF]]
> +; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1>
> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
> +; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float>
> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32
> undef, i32 undef>
> +; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float>
> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
> +; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1>
> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float>
> [[RDX_SHUF1]]
> +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float>
> [[RDX_MINMAX_SELECT3]], i32 0
> +; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
> [[MAX_0_MUL3_1]], float undef
> +; CHECK-NEXT: store float [[TMP7]], float* @res, align 4
> +; CHECK-NEXT: ret float [[TMP7]]
> ;
> ; THRESHOLD-LABEL: @bar(
> ; THRESHOLD-NEXT: entry:
> -; THRESHOLD-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>*
> bitcast ([20 x float]* @arr to <2 x float>*), align 16
> -; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>*
> bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
> -; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]],
> [[TMP0]]
> -; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]],
> i32 0
> -; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]],
> i32 1
> +; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>*
> bitcast ([20 x float]* @arr to <4 x float>*), align 16
> +; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>*
> bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
> +; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]],
> [[TMP0]]
> +; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]],
> i32 0
> +; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]],
> i32 1
> ; THRESHOLD-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
> -; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float
> [[TMP3]], float [[TMP4]]
> -; THRESHOLD-NEXT: [[TMP5:%.*]] = load float, float* getelementptr
> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
> -; THRESHOLD-NEXT: [[TMP6:%.*]] = load float, float* getelementptr
> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
> -; THRESHOLD-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
> -; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
> [[MUL3_1]]
> -; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
> [[MAX_0_MUL3]], float [[MUL3_1]]
> -; THRESHOLD-NEXT: [[TMP7:%.*]] = load float, float* getelementptr
> inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
> -; THRESHOLD-NEXT: [[TMP8:%.*]] = load float, float* getelementptr
> inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
> -; THRESHOLD-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
> -; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float
> [[MAX_0_MUL3_1]], [[MUL3_2]]
> -; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
> [[MAX_0_MUL3_1]], float [[MUL3_2]]
> -; THRESHOLD-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4
> -; THRESHOLD-NEXT: ret float [[MAX_0_MUL3_2]]
> +; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float
> undef, float undef
> +; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]],
> i32 2
> +; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]],
> [[TMP5]]
> +; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float
> [[MAX_0_MUL3]], float undef
> +; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]],
> i32 3
> +; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float
> [[MAX_0_MUL3_1]], [[TMP6]]
> +; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float>
> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> +; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float>
> [[TMP2]], [[RDX_SHUF]]
> +; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1>
> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
> +; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float>
> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32
> undef, i32 undef>
> +; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float>
> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
> +; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1>
> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float>
> [[RDX_SHUF1]]
> +; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <4 x float>
> [[RDX_MINMAX_SELECT3]], i32 0
> +; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float
> [[MAX_0_MUL3_1]], float undef
> +; THRESHOLD-NEXT: store float [[TMP7]], float* @res, align 4
> +; THRESHOLD-NEXT: ret float [[TMP7]]
> ;
> entry:
> %0 = load float, float* getelementptr inbounds ([20 x float], [20 x
> float]* @arr, i64 0, i64 0), align 16
> @@ -512,9 +518,9 @@ define float @f(float* nocapture readonl
> ; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float>
> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float>
> [[BIN_RDX14]], [[RDX_SHUF15]]
> ; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float>
> [[BIN_RDX16]], i32 0
> -; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
> +; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
> ; CHECK-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
> -; CHECK-NEXT: ret float [[BIN_RDX17]]
> +; CHECK-NEXT: ret float [[OP_RDX]]
> ;
> ; THRESHOLD-LABEL: @f(
> ; THRESHOLD-NEXT: entry:
> @@ -635,9 +641,9 @@ define float @f(float* nocapture readonl
> ; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float>
> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float>
> [[BIN_RDX14]], [[RDX_SHUF15]]
> ; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float>
> [[BIN_RDX16]], i32 0
> -; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]],
> [[TMP5]]
> +; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
> ; THRESHOLD-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
> -; THRESHOLD-NEXT: ret float [[BIN_RDX17]]
> +; THRESHOLD-NEXT: ret float [[OP_RDX]]
> ;
> entry:
> %0 = load float, float* %x, align 4
> @@ -865,9 +871,9 @@ define float @f1(float* nocapture readon
> ; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
> ; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
> -; CHECK-NEXT: ret float [[BIN_EXTRA]]
> +; CHECK-NEXT: ret float [[OP_EXTRA]]
> ;
> ; THRESHOLD-LABEL: @f1(
> ; THRESHOLD-NEXT: entry:
> @@ -948,9 +954,9 @@ define float @f1(float* nocapture readon
> ; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float>
> [[BIN_RDX6]], [[RDX_SHUF7]]
> ; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float>
> [[BIN_RDX8]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]],
> [[CONV]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
> ; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA]]
> +; THRESHOLD-NEXT: ret float [[OP_EXTRA]]
> ;
> entry:
> %rem = srem i32 %a, %b
> @@ -1138,14 +1144,14 @@ define float @loadadd31(float* nocapture
> ; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]],
> [[RDX_SHUF11]]
> ; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]],
> i32 0
> -; CHECK-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
> -; CHECK-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float> [[TMP3]],
> <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> -; CHECK-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
> [[RDX_SHUF14]]
> -; CHECK-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef>
> -; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float> [[BIN_RDX15]],
> [[RDX_SHUF16]]
> -; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
> [[BIN_RDX17]], i32 0
> -; CHECK-NEXT: [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]],
> [[TMP10]]
> -; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]], [[TMP1]]
> +; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
> +; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]],
> <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> +; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]],
> [[RDX_SHUF13]]
> +; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float>
> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef>
> +; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]],
> [[RDX_SHUF15]]
> +; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
> [[BIN_RDX16]], i32 0
> +; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
> +; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
> ; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
> ; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
> ; CHECK-NEXT: ret float [[TMP12]]
> @@ -1234,14 +1240,14 @@ define float @loadadd31(float* nocapture
> ; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float>
> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float>
> [[BIN_RDX10]], [[RDX_SHUF11]]
> ; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <8 x float>
> [[BIN_RDX12]], i32 0
> -; THRESHOLD-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]],
> [[TMP9]]
> -; THRESHOLD-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float>
> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> -; THRESHOLD-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]],
> [[RDX_SHUF14]]
> -; THRESHOLD-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float>
> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef>
> -; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float>
> [[BIN_RDX15]], [[RDX_SHUF16]]
> -; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
> [[BIN_RDX17]], i32 0
> -; THRESHOLD-NEXT: [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]],
> [[TMP10]]
> -; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]],
> [[TMP1]]
> +; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
> +; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float>
> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> +; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]],
> [[RDX_SHUF13]]
> +; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float>
> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef>
> +; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float>
> [[BIN_RDX14]], [[RDX_SHUF15]]
> +; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float>
> [[BIN_RDX16]], i32 0
> +; THRESHOLD-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]],
> [[TMP10]]
> +; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]],
> [[TMP1]]
> ; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
> ; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
> ; THRESHOLD-NEXT: ret float [[TMP12]]
> @@ -1369,10 +1375,10 @@ define float @extra_args(float* nocaptur
> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV]]
> ; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; CHECK-NEXT: ret float [[BIN_EXTRA5]]
> +; CHECK-NEXT: ret float [[OP_EXTRA5]]
> ;
> ; THRESHOLD-LABEL: @extra_args(
> ; THRESHOLD-NEXT: entry:
> @@ -1403,10 +1409,10 @@ define float @extra_args(float* nocaptur
> ; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float>
> [[BIN_RDX2]], [[RDX_SHUF3]]
> ; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float>
> [[BIN_RDX4]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV]]
> ; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]
> +; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
> ;
> entry:
> %mul = mul nsw i32 %b, %a
> @@ -1471,12 +1477,12 @@ define float @extra_args_same_several_ti
> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> 5.000000e+00
> -; CHECK-NEXT: [[BIN_EXTRA6:%.*]] = fadd fast float [[BIN_EXTRA5]],
> 5.000000e+00
> -; CHECK-NEXT: [[BIN_EXTRA7:%.*]] = fadd fast float [[BIN_EXTRA6]],
> [[CONV]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> 5.000000e+00
> +; CHECK-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]],
> 5.000000e+00
> +; CHECK-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]],
> [[CONV]]
> ; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; CHECK-NEXT: ret float [[BIN_EXTRA7]]
> +; CHECK-NEXT: ret float [[OP_EXTRA7]]
> ;
> ; THRESHOLD-LABEL: @extra_args_same_several_times(
> ; THRESHOLD-NEXT: entry:
> @@ -1509,12 +1515,12 @@ define float @extra_args_same_several_ti
> ; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float>
> [[BIN_RDX2]], [[RDX_SHUF3]]
> ; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float>
> [[BIN_RDX4]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> 5.000000e+00
> -; THRESHOLD-NEXT: [[BIN_EXTRA6:%.*]] = fadd fast float [[BIN_EXTRA5]],
> 5.000000e+00
> -; THRESHOLD-NEXT: [[BIN_EXTRA7:%.*]] = fadd fast float [[BIN_EXTRA6]],
> [[CONV]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> 5.000000e+00
> +; THRESHOLD-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]],
> 5.000000e+00
> +; THRESHOLD-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]],
> [[CONV]]
> ; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA7]]
> +; THRESHOLD-NEXT: ret float [[OP_EXTRA7]]
> ;
> entry:
> %mul = mul nsw i32 %b, %a
> @@ -1581,10 +1587,10 @@ define float @extra_args_no_replace(floa
> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV]]
> ; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; CHECK-NEXT: ret float [[BIN_EXTRA5]]
> +; CHECK-NEXT: ret float [[OP_EXTRA5]]
> ;
> ; THRESHOLD-LABEL: @extra_args_no_replace(
> ; THRESHOLD-NEXT: entry:
> @@ -1617,10 +1623,10 @@ define float @extra_args_no_replace(floa
> ; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> ; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float>
> [[BIN_RDX2]], [[RDX_SHUF3]]
> ; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float>
> [[BIN_RDX4]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> -; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]],
> [[CONV]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
> +; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]],
> [[CONV]]
> ; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
> -; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]
> +; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
> ;
> entry:
> %mul = mul nsw i32 %b, %a
> @@ -1679,10 +1685,10 @@ define i32 @wobble(i32 %arg, i32 %bar) {
> ; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]],
> <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> ; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]],
> i32 0
> -; CHECK-NEXT: [[BIN_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
> -; CHECK-NEXT: [[BIN_EXTRA3:%.*]] = add nsw i32 [[BIN_EXTRA]], [[TMP9]]
> +; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
> +; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
> ; CHECK-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef
> -; CHECK-NEXT: ret i32 [[BIN_EXTRA3]]
> +; CHECK-NEXT: ret i32 [[OP_EXTRA3]]
> ;
> ; THRESHOLD-LABEL: @wobble(
> ; THRESHOLD-NEXT: bb:
> @@ -1707,10 +1713,10 @@ define i32 @wobble(i32 %arg, i32 %bar) {
> ; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32>
> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32
> undef>
> ; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> ; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32>
> [[BIN_RDX2]], i32 0
> -; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
> -; THRESHOLD-NEXT: [[BIN_EXTRA3:%.*]] = add nsw i32 [[BIN_EXTRA]],
> [[TMP9]]
> +; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
> +; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]],
> [[TMP9]]
> ; THRESHOLD-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef
> -; THRESHOLD-NEXT: ret i32 [[BIN_EXTRA3]]
> +; THRESHOLD-NEXT: ret i32 [[OP_EXTRA3]]
> ;
> bb:
> %x1 = xor i32 %arg, %bar
>
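[Inline note: most of the horizontal-list.ll churn above is a pure rename of the scalar reduction tail (BIN_RDX*/BIN_EXTRA* become OP_RDX*/OP_EXTRA*), presumably because the tail is now emitted by a helper shared between binary-op and min/max reductions; the new min/max lowering itself is visible as the RDX_MINMAX_CMP/RDX_MINMAX_SELECT sequences. As a rough scalar model of what the fadd shuffle reduction plus the OP_EXTRA tail computes -- my own sketch, not code from the patch; sumReduce8/extra are invented names:

  // Hedged scalar model of the <8 x float> fadd shuffle reduction, with one
  // extra scalar operand (e.g. [[CONV]]) folded back in at the end.
  static float sumReduce8(const float a[8], float extra) {
    float v[8];
    for (int i = 0; i < 8; ++i)
      v[i] = a[i];                                // vector load
    // Each RDX_SHUF/BIN_RDX pair halves the live width: lanes
    // [width..2*width) are added into lanes [0..width).
    for (int width = 4; width >= 1; width /= 2)
      for (int i = 0; i < width; ++i)
        v[i] += v[i + width];
    return v[0] + extra;   // extractelement i32 0, then [[OP_EXTRA]] fadd
  }

(Valid only under -ffast-math-style reassociation, which the "fadd fast" flags above license.)]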
> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll?rev=312791&r1=312790&r2=312791&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll (original)
> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll Fri Sep 8 06:49:36 2017
> @@ -34,79 +34,46 @@ define i32 @maxi8(i32) {
> ; CHECK-NEXT: ret i32 [[TMP23]]
> ;
> ; AVX-LABEL: @maxi8(
> -; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX-NEXT: ret i32 [[TMP23]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x
> i32]* @arr to <8 x i32>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x
> i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32>
> [[TMP2]], <8 x i32> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]],
> <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32>
> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]],
> <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32>
> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
> +; AVX-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32
> 0
> +; AVX: ret i32 [[TMP27]]
> ;
> ; AVX2-LABEL: @maxi8(
> -; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX2-NEXT: ret i32 [[TMP23]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x
> i32]* @arr to <8 x i32>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x
> i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32>
> [[TMP2]], <8 x i32> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]],
> <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32>
> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]],
> <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32>
> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]],
> i32 0
> +; AVX2: ret i32 [[TMP27]]
> ;
> ; SKX-LABEL: @maxi8(
> -; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; SKX-NEXT: ret i32 [[TMP23]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x
> i32]* @arr to <8 x i32>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x
> i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32>
> [[TMP2]], <8 x i32> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]],
> <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32>
> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]],
> <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32>
> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
> +; SKX-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32
> 0
> +; SKX: ret i32 [[TMP27]]
> ;
> %2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 0), align 16
> %3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 1), align 4
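[Inline note: the RDX_SHUF / icmp sgt / select triples above are the patch's new log2-depth signed-max reduction, replacing the linear cmp/select chain. A minimal scalar sketch of what the vectorized @maxi8 now computes -- my own model, not code from the patch; maxReduce8/width are invented names:

  // Hedged scalar model of the <8 x i32> smax shuffle reduction above.
  // Each RDX_SHUF/icmp/select stage folds the upper half into the lower.
  static int maxReduce8(const int a[8]) {
    int v[8];
    for (int i = 0; i < 8; ++i)
      v[i] = a[i];                                // load <8 x i32>
    for (int width = 4; width >= 1; width /= 2)   // RDX_SHUF offsets 4, 2, 1
      for (int i = 0; i < width; ++i)
        v[i] = v[i] > v[i + width] ? v[i] : v[i + width];  // icmp sgt + select
    return v[0];                                  // extractelement i32 0
  }

Min reductions would be the same shape with the comparison flipped.]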
> @@ -184,151 +151,55 @@ define i32 @maxi16(i32) {
> ; CHECK-NEXT: ret i32 [[TMP47]]
> ;
> ; AVX-LABEL: @maxi16(
> -; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; AVX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; AVX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; AVX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; AVX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; AVX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; AVX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; AVX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; AVX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; AVX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; AVX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; AVX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; AVX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; AVX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; AVX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; AVX-NEXT: ret i32 [[TMP47]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x
> i32]* @arr to <16 x i32>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16
> x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32
> 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32>
> [[TMP2]], <16 x i32> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]],
> <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32>
> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]],
> <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32>
> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
> +; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]],
> <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32>
> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
> +; AVX-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]],
> i32 0
> +; AVX: ret i32 [[TMP52]]
> ;
> ; AVX2-LABEL: @maxi16(
> -; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX2-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; AVX2-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; AVX2-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; AVX2-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; AVX2-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; AVX2-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; AVX2-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; AVX2-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; AVX2-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; AVX2-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; AVX2-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; AVX2-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; AVX2-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; AVX2-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; AVX2-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; AVX2-NEXT: ret i32 [[TMP47]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32
> x i32]* @arr to <16 x i32>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16
> x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32
> 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32>
> [[TMP2]], <16 x i32> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]],
> <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32>
> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32>
> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32>
> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32>
> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32>
> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]],
> i32 0
> +; AVX2: ret i32 [[TMP52]]
> ;
> ; SKX-LABEL: @maxi16(
> -; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; SKX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; SKX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; SKX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; SKX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; SKX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; SKX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; SKX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; SKX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; SKX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; SKX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; SKX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; SKX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; SKX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; SKX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; SKX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; SKX-NEXT: ret i32 [[TMP47]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x
> i32]* @arr to <16 x i32>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16
> x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32
> 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32>
> [[TMP2]], <16 x i32> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]],
> <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32>
> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]],
> <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32>
> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
> +; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]],
> <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; SKX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32>
> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
> +; SKX-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]],
> i32 0
> +; SKX: ret i32 [[TMP52]]
> ;
> %2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 0), align 16
> %3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 1), align 4
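[Inline note: the pattern scales with vector width -- an <N x i32> max reduction needs log2(N) shuffle/cmp/select stages, so the <16 x i32> @maxi16 checks above use four (offsets 8, 4, 2, 1) and the <32 x i32> @maxi32 hunks below use five (offsets 16, 8, 4, 2, 1).]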
> @@ -381,392 +252,84 @@ define i32 @maxi16(i32) {
>
> define i32 @maxi32(i32) {
> ; CHECK-LABEL: @maxi32(
> -; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; CHECK-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; CHECK-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; CHECK-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; CHECK-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; CHECK-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; CHECK-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; CHECK-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; CHECK-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; CHECK-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; CHECK-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; CHECK-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; CHECK-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; CHECK-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; CHECK-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; CHECK-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; CHECK-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; CHECK-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; CHECK-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; CHECK-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; CHECK-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; CHECK-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
> -; CHECK-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
> -; CHECK-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32
> [[TMP48]]
> -; CHECK-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
> -; CHECK-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
> -; CHECK-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32
> [[TMP51]]
> -; CHECK-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
> -; CHECK-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
> -; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32
> [[TMP54]]
> -; CHECK-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
> -; CHECK-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
> -; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32
> [[TMP57]]
> -; CHECK-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
> -; CHECK-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
> -; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32
> [[TMP60]]
> -; CHECK-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
> -; CHECK-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
> -; CHECK-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32
> [[TMP63]]
> -; CHECK-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
> -; CHECK-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
> -; CHECK-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32
> [[TMP66]]
> -; CHECK-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
> -; CHECK-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
> -; CHECK-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32
> [[TMP69]]
> -; CHECK-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
> -; CHECK-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
> -; CHECK-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32
> [[TMP72]]
> -; CHECK-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
> -; CHECK-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
> -; CHECK-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32
> [[TMP75]]
> -; CHECK-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
> -; CHECK-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
> -; CHECK-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32
> [[TMP78]]
> -; CHECK-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
> -; CHECK-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
> -; CHECK-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32
> [[TMP81]]
> -; CHECK-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
> -; CHECK-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
> -; CHECK-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32
> [[TMP84]]
> -; CHECK-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
> -; CHECK-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
> -; CHECK-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32
> [[TMP87]]
> -; CHECK-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
> -; CHECK-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
> -; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32
> [[TMP90]]
> -; CHECK-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
> -; CHECK-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
> -; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32
> [[TMP93]]
> -; CHECK-NEXT: ret i32 [[TMP95]]
> +; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32
> x i32]* @arr to <32 x i32>*), align 16
> +; CHECK: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]],
> <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32
> 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30,
> i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; CHECK-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]],
> [[RDX_SHUF]]
> +; CHECK-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32>
> [[TMP2]], <32 x i32> [[RDX_SHUF]]
> +; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11,
> i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; CHECK-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; CHECK-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x
> i32> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
> +; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; CHECK-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; CHECK-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x
> i32> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
> +; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; CHECK-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; CHECK-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x
> i32> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
> +; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; CHECK-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; CHECK-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x
> i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
> +; CHECK-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]],
> i32 0
> +; CHECK: ret i32 [[TMP101]]
> ;
> ; AVX-LABEL: @maxi32(
> -; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; AVX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; AVX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; AVX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; AVX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; AVX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; AVX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; AVX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; AVX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; AVX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; AVX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; AVX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; AVX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; AVX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; AVX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; AVX-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
> -; AVX-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
> -; AVX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32
> [[TMP48]]
> -; AVX-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
> -; AVX-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
> -; AVX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32
> [[TMP51]]
> -; AVX-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
> -; AVX-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
> -; AVX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32
> [[TMP54]]
> -; AVX-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
> -; AVX-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
> -; AVX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32
> [[TMP57]]
> -; AVX-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
> -; AVX-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
> -; AVX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32
> [[TMP60]]
> -; AVX-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
> -; AVX-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
> -; AVX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32
> [[TMP63]]
> -; AVX-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
> -; AVX-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
> -; AVX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32
> [[TMP66]]
> -; AVX-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
> -; AVX-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
> -; AVX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32
> [[TMP69]]
> -; AVX-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
> -; AVX-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
> -; AVX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32
> [[TMP72]]
> -; AVX-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
> -; AVX-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
> -; AVX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32
> [[TMP75]]
> -; AVX-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
> -; AVX-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
> -; AVX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32
> [[TMP78]]
> -; AVX-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
> -; AVX-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
> -; AVX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32
> [[TMP81]]
> -; AVX-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
> -; AVX-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
> -; AVX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32
> [[TMP84]]
> -; AVX-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
> -; AVX-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
> -; AVX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32
> [[TMP87]]
> -; AVX-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
> -; AVX-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
> -; AVX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32
> [[TMP90]]
> -; AVX-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
> -; AVX-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
> -; AVX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32
> [[TMP93]]
> -; AVX-NEXT: ret i32 [[TMP95]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x
> i32]* @arr to <32 x i32>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32
> x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21,
> i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32
> 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32>
> [[TMP2]], <32 x i32> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]],
> <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13,
> i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32>
> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]],
> <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32>
> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
> +; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]],
> <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; AVX-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32>
> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
> +; AVX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]],
> <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; AVX-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; AVX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32>
> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
> +; AVX-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]],
> i32 0
> +; AVX: ret i32 [[TMP101]]
> ;
> ; AVX2-LABEL: @maxi32(
> -; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; AVX2-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; AVX2-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; AVX2-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; AVX2-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; AVX2-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; AVX2-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; AVX2-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; AVX2-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; AVX2-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; AVX2-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; AVX2-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; AVX2-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; AVX2-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; AVX2-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; AVX2-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; AVX2-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
> -; AVX2-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
> -; AVX2-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32
> [[TMP48]]
> -; AVX2-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
> -; AVX2-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
> -; AVX2-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32
> [[TMP51]]
> -; AVX2-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
> -; AVX2-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
> -; AVX2-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32
> [[TMP54]]
> -; AVX2-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
> -; AVX2-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
> -; AVX2-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32
> [[TMP57]]
> -; AVX2-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
> -; AVX2-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
> -; AVX2-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32
> [[TMP60]]
> -; AVX2-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
> -; AVX2-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
> -; AVX2-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32
> [[TMP63]]
> -; AVX2-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
> -; AVX2-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
> -; AVX2-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32
> [[TMP66]]
> -; AVX2-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
> -; AVX2-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
> -; AVX2-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32
> [[TMP69]]
> -; AVX2-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
> -; AVX2-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
> -; AVX2-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32
> [[TMP72]]
> -; AVX2-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
> -; AVX2-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
> -; AVX2-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32
> [[TMP75]]
> -; AVX2-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
> -; AVX2-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
> -; AVX2-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32
> [[TMP78]]
> -; AVX2-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
> -; AVX2-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
> -; AVX2-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32
> [[TMP81]]
> -; AVX2-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
> -; AVX2-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
> -; AVX2-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32
> [[TMP84]]
> -; AVX2-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
> -; AVX2-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
> -; AVX2-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32
> [[TMP87]]
> -; AVX2-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
> -; AVX2-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
> -; AVX2-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32
> [[TMP90]]
> -; AVX2-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds
> ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
> -; AVX2-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
> -; AVX2-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32
> [[TMP93]]
> -; AVX2-NEXT: ret i32 [[TMP95]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32
> x i32]* @arr to <32 x i32>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32
> x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21,
> i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32
> 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32>
> [[TMP2]], <32 x i32> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]],
> <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13,
> i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32>
> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; AVX2-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32>
> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32>
> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32>
> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; AVX2-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x
> i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
> +; AVX2-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]],
> i32 0
> +; AVX2: ret i32 [[TMP101]]
> ;
> ; SKX-LABEL: @maxi32(
> -; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32
> [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32
> [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32
> [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32
> [[TMP21]]
> -; SKX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
> -; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
> -; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32
> [[TMP24]]
> -; SKX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
> -; SKX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
> -; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32
> [[TMP27]]
> -; SKX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
> -; SKX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
> -; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32
> [[TMP30]]
> -; SKX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
> -; SKX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
> -; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32
> [[TMP33]]
> -; SKX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
> -; SKX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
> -; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32
> [[TMP36]]
> -; SKX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
> -; SKX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
> -; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32
> [[TMP39]]
> -; SKX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
> -; SKX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
> -; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32
> [[TMP42]]
> -; SKX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
> -; SKX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
> -; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32
> [[TMP45]]
> -; SKX-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
> -; SKX-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
> -; SKX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32
> [[TMP48]]
> -; SKX-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
> -; SKX-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
> -; SKX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32
> [[TMP51]]
> -; SKX-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
> -; SKX-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
> -; SKX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32
> [[TMP54]]
> -; SKX-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
> -; SKX-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
> -; SKX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32
> [[TMP57]]
> -; SKX-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
> -; SKX-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
> -; SKX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32
> [[TMP60]]
> -; SKX-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
> -; SKX-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
> -; SKX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32
> [[TMP63]]
> -; SKX-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
> -; SKX-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
> -; SKX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32
> [[TMP66]]
> -; SKX-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
> -; SKX-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
> -; SKX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32
> [[TMP69]]
> -; SKX-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
> -; SKX-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
> -; SKX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32
> [[TMP72]]
> -; SKX-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
> -; SKX-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
> -; SKX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32
> [[TMP75]]
> -; SKX-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
> -; SKX-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
> -; SKX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32
> [[TMP78]]
> -; SKX-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
> -; SKX-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
> -; SKX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32
> [[TMP81]]
> -; SKX-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
> -; SKX-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
> -; SKX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32
> [[TMP84]]
> -; SKX-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
> -; SKX-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
> -; SKX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32
> [[TMP87]]
> -; SKX-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
> -; SKX-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
> -; SKX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32
> [[TMP90]]
> -; SKX-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32
> x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
> -; SKX-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
> -; SKX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32
> [[TMP93]]
> -; SKX-NEXT: ret i32 [[TMP95]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x
> i32]* @arr to <32 x i32>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32
> x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21,
> i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32
> 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32>
> [[TMP2]], <32 x i32> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]],
> <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13,
> i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32>
> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]],
> <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32>
> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
> +; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]],
> <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; SKX-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; SKX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32>
> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
> +; SKX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]],
> <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef>
> +; SKX-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; SKX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32>
> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
> +; SKX-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]],
> i32 0
> +; SKX: ret i32 [[TMP101]]
> ;
> %2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 0), align 16
> %3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]*
> @arr, i64 0, i64 1), align 4
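A note on the maxi32 checks above: every run line now reduces @arr the same
way. The 31 scalar icmp/select pairs collapse into one wide load, then
log2(width) shuffle/compare/select stages that fold the upper half of the
vector into the lower half, and finally an extractelement of lane 0. A
minimal standalone sketch of that shape for an <8 x i32> signed-max
reduction (illustrative only, not taken from the patch):

define i32 @smax_v8i32(<8 x i32> %v) {
  ; Stage 1: compare lanes 0-3 against lanes 4-7, keep the larger lanes.
  %s1 = shufflevector <8 x i32> %v, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %c1 = icmp sgt <8 x i32> %v, %s1
  %r1 = select <8 x i1> %c1, <8 x i32> %v, <8 x i32> %s1
  ; Stage 2: lanes 0-1 against lanes 2-3.
  %s2 = shufflevector <8 x i32> %r1, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %c2 = icmp sgt <8 x i32> %r1, %s2
  %r2 = select <8 x i1> %c2, <8 x i32> %r1, <8 x i32> %s2
  ; Stage 3: lane 0 against lane 1; lane 0 now holds the maximum.
  %s3 = shufflevector <8 x i32> %r2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %c3 = icmp sgt <8 x i32> %r2, %s3
  %r3 = select <8 x i1> %c3, <8 x i32> %r2, <8 x i32> %s3
  %res = extractelement <8 x i32> %r3, i32 0
  ret i32 %res
}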
> @@ -892,79 +455,46 @@ define float @maxf8(float) {
> ; CHECK-NEXT: ret float [[TMP23]]
> ;
> ; AVX-LABEL: @maxf8(
> -; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX-NEXT: ret float [[TMP23]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32
> x float]* @arr1 to <8 x float>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8
> x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float>
> [[TMP2]], <8 x float> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]],
> <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float>
> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float>
> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
> +; AVX-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> +; AVX: ret float [[TMP27]]
> ;
> ; AVX2-LABEL: @maxf8(
> -; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]],
> float [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX2-NEXT: ret float [[TMP23]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast
> ([32 x float]* @arr1 to <8 x float>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8
> x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float>
> [[TMP2]], <8 x float> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float>
> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float>
> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float>
> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> +; AVX2: ret float [[TMP27]]
> ;
> ; SKX-LABEL: @maxf8(
> -; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; SKX-NEXT: ret float [[TMP23]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32
> x float]* @arr1 to <8 x float>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8
> x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float>
> [[TMP2]], <8 x float> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]],
> <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float>
> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float>
> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float>
> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
> +; SKX-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]],
> i32 0
> +; SKX: ret float [[TMP27]]
> ;
> %2 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 0), align 16
> %3 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 1), align 4
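The float variants (maxf8 above, maxf16 below) follow the identical ladder,
with fcmp fast ogt in place of icmp sgt; the fast flag carried by the
original scalar compares is what allows the reduction to be reassociated
into this lane-parallel form. A one-stage sketch, under the same
illustrative-only caveat:

define <8 x float> @fmax_step(<8 x float> %v) {
  ; Fold lanes 4-7 into lanes 0-3, keeping the larger value per lane.
  %hi = shufflevector <8 x float> %v, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %c = fcmp fast ogt <8 x float> %v, %hi
  %m = select <8 x i1> %c, <8 x float> %v, <8 x float> %hi
  ret <8 x float> %m
}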
> @@ -1042,151 +572,55 @@ define float @maxf16(float) {
> ; CHECK-NEXT: ret float [[TMP47]]
> ;
> ; AVX-LABEL: @maxf16(
> -; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; AVX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; AVX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; AVX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; AVX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; AVX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; AVX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; AVX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; AVX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; AVX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; AVX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; AVX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; AVX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; AVX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; AVX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; AVX-NEXT: ret float [[TMP47]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast
> ([32 x float]* @arr1 to <16 x float>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]],
> <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32
> 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x float>
> [[TMP2]], <16 x float> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float>
> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x
> float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float>
> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x
> float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
> +; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float>
> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x
> float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
> +; AVX-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]],
> i32 0
> +; AVX: ret float [[TMP52]]
> ;
> ; AVX2-LABEL: @maxf16(
> -; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]],
> float [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX2-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; AVX2-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; AVX2-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; AVX2-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; AVX2-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; AVX2-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; AVX2-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; AVX2-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; AVX2-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; AVX2-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; AVX2-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; AVX2-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; AVX2-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; AVX2-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; AVX2-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; AVX2-NEXT: ret float [[TMP47]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast
> ([32 x float]* @arr1 to <16 x float>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]],
> <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32
> 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x
> float> [[TMP2]], <16 x float> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float>
> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x
> float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float>
> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x
> float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float>
> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x
> float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]],
> i32 0
> +; AVX2: ret float [[TMP52]]
> ;
> ; SKX-LABEL: @maxf16(
> -; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; SKX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; SKX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; SKX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; SKX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; SKX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; SKX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; SKX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; SKX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; SKX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; SKX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; SKX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; SKX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; SKX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; SKX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; SKX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; SKX-NEXT: ret float [[TMP47]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast
> ([32 x float]* @arr1 to <16 x float>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]],
> <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32
> 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x float>
> [[TMP2]], <16 x float> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float>
> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x
> float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float>
> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x
> float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
> +; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float>
> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; SKX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x
> float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
> +; SKX-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]],
> i32 0
> +; SKX: ret float [[TMP52]]
> ;
> %2 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 0), align 16
> %3 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 1), align 4
> @@ -1336,295 +770,64 @@ define float @maxf32(float) {
> ; CHECK-NEXT: ret float [[TMP95]]
> ;
> ; AVX-LABEL: @maxf32(
> -; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; AVX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; AVX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; AVX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; AVX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; AVX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; AVX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; AVX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; AVX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; AVX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; AVX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; AVX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; AVX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; AVX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; AVX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; AVX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; AVX-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
> -; AVX-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
> -; AVX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]],
> float [[TMP48]]
> -; AVX-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
> -; AVX-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
> -; AVX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]],
> float [[TMP51]]
> -; AVX-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
> -; AVX-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
> -; AVX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]],
> float [[TMP54]]
> -; AVX-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
> -; AVX-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
> -; AVX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]],
> float [[TMP57]]
> -; AVX-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
> -; AVX-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
> -; AVX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]],
> float [[TMP60]]
> -; AVX-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
> -; AVX-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
> -; AVX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]],
> float [[TMP63]]
> -; AVX-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
> -; AVX-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
> -; AVX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]],
> float [[TMP66]]
> -; AVX-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
> -; AVX-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
> -; AVX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]],
> float [[TMP69]]
> -; AVX-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
> -; AVX-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
> -; AVX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]],
> float [[TMP72]]
> -; AVX-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
> -; AVX-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
> -; AVX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]],
> float [[TMP75]]
> -; AVX-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
> -; AVX-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
> -; AVX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]],
> float [[TMP78]]
> -; AVX-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
> -; AVX-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
> -; AVX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]],
> float [[TMP81]]
> -; AVX-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
> -; AVX-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
> -; AVX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]],
> float [[TMP84]]
> -; AVX-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
> -; AVX-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
> -; AVX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]],
> float [[TMP87]]
> -; AVX-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
> -; AVX-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
> -; AVX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]],
> float [[TMP90]]
> -; AVX-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
> -; AVX-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
> -; AVX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]],
> float [[TMP93]]
> -; AVX-NEXT: ret float [[TMP95]]
> +; AVX-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast
> ([32 x float]* @arr1 to <32 x float>*), align 16
> +; AVX: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]],
> <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32
> 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30,
> i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x float>
> [[TMP2]], <32 x float> [[RDX_SHUF]]
> +; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float>
> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11,
> i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x
> float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
> +; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float>
> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x
> float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
> +; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float>
> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x
> float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
> +; AVX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; AVX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x
> float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
> +; AVX-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[BIN_RDX8]],
> i32 0
> +; AVX: ret float [[TMP101]]
> ;
> ; AVX2-LABEL: @maxf32(
> -; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; AVX2-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; AVX2-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]],
> float [[TMP9]]
> -; AVX2-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; AVX2-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; AVX2-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; AVX2-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; AVX2-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; AVX2-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; AVX2-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; AVX2-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; AVX2-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; AVX2-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; AVX2-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; AVX2-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; AVX2-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; AVX2-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; AVX2-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; AVX2-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; AVX2-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; AVX2-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
> -; AVX2-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
> -; AVX2-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]],
> float [[TMP48]]
> -; AVX2-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
> -; AVX2-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
> -; AVX2-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]],
> float [[TMP51]]
> -; AVX2-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
> -; AVX2-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
> -; AVX2-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]],
> float [[TMP54]]
> -; AVX2-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
> -; AVX2-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
> -; AVX2-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]],
> float [[TMP57]]
> -; AVX2-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
> -; AVX2-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
> -; AVX2-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]],
> float [[TMP60]]
> -; AVX2-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
> -; AVX2-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
> -; AVX2-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]],
> float [[TMP63]]
> -; AVX2-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
> -; AVX2-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
> -; AVX2-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]],
> float [[TMP66]]
> -; AVX2-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
> -; AVX2-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
> -; AVX2-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]],
> float [[TMP69]]
> -; AVX2-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
> -; AVX2-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
> -; AVX2-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]],
> float [[TMP72]]
> -; AVX2-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
> -; AVX2-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
> -; AVX2-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]],
> float [[TMP75]]
> -; AVX2-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
> -; AVX2-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
> -; AVX2-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]],
> float [[TMP78]]
> -; AVX2-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
> -; AVX2-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
> -; AVX2-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]],
> float [[TMP81]]
> -; AVX2-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
> -; AVX2-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
> -; AVX2-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]],
> float [[TMP84]]
> -; AVX2-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
> -; AVX2-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
> -; AVX2-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]],
> float [[TMP87]]
> -; AVX2-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
> -; AVX2-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
> -; AVX2-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]],
> float [[TMP90]]
> -; AVX2-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
> -; AVX2-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
> -; AVX2-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]],
> float [[TMP93]]
> -; AVX2-NEXT: ret float [[TMP95]]
> +; AVX2-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast
> ([32 x float]* @arr1 to <32 x float>*), align 16
> +; AVX2: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]],
> <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32
> 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30,
> i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; AVX2-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x
> float> [[TMP2]], <32 x float> [[RDX_SHUF]]
> +; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float>
> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11,
> i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x
> float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
> +; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float>
> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x
> float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
> +; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float>
> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x
> float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
> +; AVX2-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; AVX2-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; AVX2-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x
> float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
> +; AVX2-NEXT: [[TMP101:%.*]] = extractelement <32 x float>
> [[BIN_RDX8]], i32 0
> +; AVX2: ret float [[TMP101]]
> ;
> ; SKX-LABEL: @maxf32(
> -; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
> -; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
> -; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]
> -; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float
> [[TMP3]]
> -; SKX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
> -; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]
> -; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float
> [[TMP6]]
> -; SKX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4
> -; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]
> -; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float
> [[TMP9]]
> -; SKX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16
> -; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]
> -; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]],
> float [[TMP12]]
> -; SKX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
> -; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
> -; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]],
> float [[TMP15]]
> -; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
> -; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
> -; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]],
> float [[TMP18]]
> -; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
> -; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
> -; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]],
> float [[TMP21]]
> -; SKX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
> -; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
> -; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]],
> float [[TMP24]]
> -; SKX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
> -; SKX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
> -; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]],
> float [[TMP27]]
> -; SKX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
> -; SKX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
> -; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]],
> float [[TMP30]]
> -; SKX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
> -; SKX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
> -; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]],
> float [[TMP33]]
> -; SKX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
> -; SKX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
> -; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]],
> float [[TMP36]]
> -; SKX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
> -; SKX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
> -; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]],
> float [[TMP39]]
> -; SKX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
> -; SKX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
> -; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]],
> float [[TMP42]]
> -; SKX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
> -; SKX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
> -; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]],
> float [[TMP45]]
> -; SKX-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
> -; SKX-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
> -; SKX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]],
> float [[TMP48]]
> -; SKX-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
> -; SKX-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
> -; SKX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]],
> float [[TMP51]]
> -; SKX-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
> -; SKX-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
> -; SKX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]],
> float [[TMP54]]
> -; SKX-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
> -; SKX-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
> -; SKX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]],
> float [[TMP57]]
> -; SKX-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
> -; SKX-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
> -; SKX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]],
> float [[TMP60]]
> -; SKX-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
> -; SKX-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
> -; SKX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]],
> float [[TMP63]]
> -; SKX-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
> -; SKX-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
> -; SKX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]],
> float [[TMP66]]
> -; SKX-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
> -; SKX-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
> -; SKX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]],
> float [[TMP69]]
> -; SKX-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
> -; SKX-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
> -; SKX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]],
> float [[TMP72]]
> -; SKX-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
> -; SKX-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
> -; SKX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]],
> float [[TMP75]]
> -; SKX-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
> -; SKX-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
> -; SKX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]],
> float [[TMP78]]
> -; SKX-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
> -; SKX-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
> -; SKX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]],
> float [[TMP81]]
> -; SKX-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
> -; SKX-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
> -; SKX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]],
> float [[TMP84]]
> -; SKX-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
> -; SKX-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
> -; SKX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]],
> float [[TMP87]]
> -; SKX-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
> -; SKX-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
> -; SKX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]],
> float [[TMP90]]
> -; SKX-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds
> ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
> -; SKX-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
> -; SKX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]],
> float [[TMP93]]
> -; SKX-NEXT: ret float [[TMP95]]
> +; SKX-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast
> ([32 x float]* @arr1 to <32 x float>*), align 16
> +; SKX: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]],
> <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32
> 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30,
> i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]],
> [[RDX_SHUF]]
> +; SKX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x float>
> [[TMP2]], <32 x float> [[RDX_SHUF]]
> +; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float>
> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11,
> i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]],
> [[RDX_SHUF1]]
> +; SKX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x
> float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
> +; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float>
> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]],
> [[RDX_SHUF3]]
> +; SKX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x
> float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
> +; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float>
> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]],
> [[RDX_SHUF5]]
> +; SKX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x
> float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
> +; SKX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float>
> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32
> undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef>
> +; SKX-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]],
> [[RDX_SHUF7]]
> +; SKX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x
> float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
> +; SKX-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[BIN_RDX8]],
> i32 0
> +; SKX: ret float [[TMP101]]
> ;
> %2 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 0), align 16
> %3 = load float, float* getelementptr inbounds ([32 x float], [32 x
> float]* @arr1, i64 0, i64 1), align 4
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
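For reference, the new AVX/AVX2/SKX CHECK lines above all encode the same
log2 shuffle-halving reduction that SLP now emits for min/max: at each step
the upper half of the vector is moved down with a shufflevector, compared
against the lower half with fcmp fast ogt, and the larger lanes are kept
with a select, until the maximum sits in lane 0. A minimal standalone
sketch of that pattern on an 8-wide vector (the function @max8 and its
value names are hypothetical, reduced from the 16/32-wide cases in the
patch; valid IR that can be fed to opt/llc as-is):

  define float @max8(<8 x float> %v) {
    ; step 1: compare lanes 0..3 against lanes 4..7, keep the larger lanes
    %rdx.shuf = shufflevector <8 x float> %v, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
    %cmp = fcmp fast ogt <8 x float> %v, %rdx.shuf
    %bin.rdx = select <8 x i1> %cmp, <8 x float> %v, <8 x float> %rdx.shuf
    ; step 2: active width is now 4; fold lanes 2..3 into lanes 0..1
    %rdx.shuf1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
    %cmp1 = fcmp fast ogt <8 x float> %bin.rdx, %rdx.shuf1
    %bin.rdx2 = select <8 x i1> %cmp1, <8 x float> %bin.rdx, <8 x float> %rdx.shuf1
    ; step 3: fold lane 1 into lane 0
    %rdx.shuf3 = shufflevector <8 x float> %bin.rdx2, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
    %cmp2 = fcmp fast ogt <8 x float> %bin.rdx2, %rdx.shuf3
    %bin.rdx4 = select <8 x i1> %cmp2, <8 x float> %bin.rdx2, <8 x float> %rdx.shuf3
    ; lane 0 now holds the maximum of all eight elements
    %max = extractelement <8 x float> %bin.rdx4, i32 0
    ret float %max
  }

log2(n) shuffle/cmp/select triples replace the n-1 scalar fcmp+select
chains deleted above, which is where the cost-model win comes from.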