<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">This appears to be breaking various LLVM test-suite builds. Could you fix or revert? Example:</div><div class=""><br class=""></div><div class=""><a href="http://lab.llvm.org:8080/green/job/lnt-ctmark-aarch64-O3-flto/181/" class="">http://lab.llvm.org:8080/green/job/lnt-ctmark-aarch64-O3-flto/181/</a></div><div class=""><br class=""></div><div class="">(You didn't get an e-mail because we have/had a configuration problem on greendragon that kept blame mails from being sent for some jobs.)</div><div class=""><br class=""></div><div class="">- Matthias</div><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Nov 7, 2017, at 1:25 PM, Dinar Temirbulatov via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Author: dinar<br class="">Date: Tue Nov 7 13:25:34 2017<br class="">New Revision: 317618<br class=""><br class="">URL: <a href="http://llvm.org/viewvc/llvm-project?rev=317618&view=rev" class="">http://llvm.org/viewvc/llvm-project?rev=317618&view=rev</a><br class="">Log:<br class="">[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.<br class=""><br class=""> Patch tries to improve vectorization of the following code:<br class=""><br class=""> void add1(int * __restrict dst, const int * __restrict src) {<br class=""> *dst++ = *src++;<br class=""> *dst++ = *src++ + 1;<br class=""> *dst++ = *src++ + 2;<br class=""> *dst++ = *src++ + 3;<br class=""> }<br class=""> Allows to vectorize even if the very first operation is not a binary add, but just a load.<br class=""><br class=""> Fixed PR34619 and other issues related to previous commit.<br class=""><br class=""> 
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev<br class=""><br class=""> Reviewed By: ABataev, RKSimon<br class=""><br class=""> Subscribers: llvm-commits, RKSimon<br class=""><br class=""> Differential Revision: <a href="https://reviews.llvm.org/D28907" class="">https://reviews.llvm.org/D28907</a><br class=""><br class="">Added:<br class=""> llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll<br class="">Modified:<br class=""> llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br class=""> llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll<br class=""><br class="">Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=317618&r1=317617&r2=317618&view=diff" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=317618&r1=317617&r2=317618&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)<br class="">+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Tue Nov 7 13:25:34 2017<br class="">@@ -333,7 +333,7 @@ static unsigned getAltOpcode(unsigned Op<br class=""> case Instruction::Sub:<br class=""> return Instruction::Add;<br class=""> default:<br class="">- return 0;<br class="">+ return Op;<br class=""> }<br class=""> }<br class=""><br class="">@@ -346,6 +346,20 @@ static bool sameOpcodeOrAlt(unsigned Opc<br class=""> return Opcode == CheckedOpcode || AltOpcode == CheckedOpcode;<br class=""> }<br class=""><br class="">+/// Checks if the \p Opcode can be considered as an operand of a (possibly)<br class="">+/// binary operation \p I.<br class="">+/// \returns The code of the binary operation of instruction \p I if the<br class="">+/// instruction with \p Opcode can be considered as an operand of \p I 
with the<br class="">+/// default value.<br class="">+static unsigned tryToRepresentAsInstArg(unsigned Opcode, Instruction *I) {<br class="">+ assert(!sameOpcodeOrAlt(Opcode, getAltOpcode(Opcode), I->getOpcode())<br class="">+ && "Invalid Opcode");<br class="">+ if (Opcode != Instruction::PHI && isa<BinaryOperator>(I) &&<br class="">+ (I->getType()->isIntegerTy() || cast<FPMathOperator>(I)->isFast()))<br class="">+ return I->getOpcode();<br class="">+ return 0;<br class="">+}<br class="">+<br class=""> /// Chooses the correct key for scheduling data. If \p Op has the same (or<br class=""> /// alternate) opcode as \p OpValue, the key is \p Op. Otherwise the key is \p<br class=""> /// OpValue.<br class="">@@ -367,7 +381,12 @@ namespace {<br class=""> struct RawInstructionsData {<br class=""> /// Main Opcode of the instructions going to be vectorized.<br class=""> unsigned Opcode = 0;<br class="">-<br class="">+ /// Position of the first instruction with the \a Opcode.<br class="">+ unsigned OpcodePos = 0;<br class="">+ /// Need an additional analysis (if at least one of the instruction is not<br class="">+ /// same instruction kind as an instruction at OpcodePos position in the<br class="">+ /// list).<br class="">+ bool NeedAnalysis = false;<br class=""> /// The list of instructions have some instructions with alternate opcodes.<br class=""> bool HasAltOpcodes = false;<br class=""> };<br class="">@@ -382,16 +401,38 @@ static RawInstructionsData getMainOpcode<br class=""> return {};<br class=""> RawInstructionsData Res;<br class=""> unsigned Opcode = I0->getOpcode();<br class="">+ unsigned AltOpcode = getAltOpcode(Opcode);<br class="">+ unsigned NewOpcodePos = 0;<br class=""> // Walk through the list of the vectorized instructions<br class=""> // in order to check its structure described by RawInstructionsData.<br class=""> for (unsigned Cnt = 0, E = VL.size(); Cnt != E; ++Cnt) {<br class=""> auto *I = dyn_cast<Instruction>(VL[Cnt]);<br class=""> if (!I)<br class=""> 
return {};<br class="">- if (Opcode != I->getOpcode())<br class="">- Res.HasAltOpcodes = true;<br class="">+ if (sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode())) {<br class="">+ if (Opcode != I->getOpcode()) {<br class="">+ Res.HasAltOpcodes = true;<br class="">+ if (Res.NeedAnalysis && isOdd(NewOpcodePos))<br class="">+ std::swap(Opcode, AltOpcode);<br class="">+ }<br class="">+ continue;<br class="">+ }<br class="">+ if (unsigned NewOpcode = tryToRepresentAsInstArg(Opcode, I)) {<br class="">+ if (!Instruction::isBinaryOp(Opcode) ||<br class="">+ !Instruction::isCommutative(Opcode)) {<br class="">+ NewOpcodePos = Cnt;<br class="">+ Opcode = NewOpcode;<br class="">+ AltOpcode = getAltOpcode(Opcode);<br class="">+ Res.NeedAnalysis = true;<br class="">+ }<br class="">+ } else if (tryToRepresentAsInstArg(I->getOpcode(),<br class="">+ cast<Instruction>(VL[NewOpcodePos])))<br class="">+ Res.NeedAnalysis = true;<br class="">+ else<br class="">+ return {};<br class=""> }<br class=""> Res.Opcode = Opcode;<br class="">+ Res.OpcodePos = NewOpcodePos;<br class=""> return Res;<br class=""> }<br class=""><br class="">@@ -421,16 +462,20 @@ struct InstructionsState {<br class=""> static InstructionsState getSameOpcode(ArrayRef<Value *> VL) {<br class=""> auto Res = getMainOpcode(VL);<br class=""> unsigned Opcode = Res.Opcode;<br class="">- if (!Res.HasAltOpcodes)<br class="">- return InstructionsState(VL[0], Opcode, false);<br class="">- auto *OpInst = cast<Instruction>(VL[0]);<br class="">+ if (!Res.NeedAnalysis && !Res.HasAltOpcodes)<br class="">+ return InstructionsState(VL[Res.OpcodePos], Opcode, false);<br class="">+ auto *OpInst = cast<Instruction>(VL[Res.OpcodePos]);<br class=""> unsigned AltOpcode = getAltOpcode(Opcode);<br class=""> // Examine each element in the list instructions VL to determine<br class=""> // if some operations there could be considered as an alternative<br class="">- // (for example as subtraction relates to addition operation).<br class="">+ // 
(for example as subtraction relates to addition operation) or <br class="">+ // operation could be an operand of a (possibly) binary operation.<br class=""> for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {<br class=""> auto *I = cast<Instruction>(VL[Cnt]);<br class=""> unsigned InstOpcode = I->getOpcode();<br class="">+ if (Res.NeedAnalysis && !sameOpcodeOrAlt(Opcode, AltOpcode, InstOpcode))<br class="">+ if (tryToRepresentAsInstArg(InstOpcode, OpInst))<br class="">+ InstOpcode = (Res.HasAltOpcodes && isOdd(Cnt)) ? AltOpcode : Opcode;<br class=""> if ((Res.HasAltOpcodes &&<br class=""> InstOpcode != (isOdd(Cnt) ? AltOpcode : Opcode)) ||<br class=""> (!Res.HasAltOpcodes && InstOpcode != Opcode)) {<br class="">@@ -583,6 +628,7 @@ public:<br class=""> void deleteTree() {<br class=""> VectorizableTree.clear();<br class=""> ScalarToTreeEntry.clear();<br class="">+ ExtraScalarToTreeEntry.clear();<br class=""> MustGather.clear();<br class=""> ExternalUses.clear();<br class=""> NumLoadsWantToKeepOrder = 0;<br class="">@@ -722,22 +768,40 @@ private:<br class=""> /// The TreeEntry index containing the user of this entry. 
We can actually<br class=""> /// have multiple users so the data structure is not truly a tree.<br class=""> SmallVector<int, 1> UserTreeIndices;<br class="">+<br class="">+ /// Info about instruction in this tree entry.<br class="">+ InstructionsState State;<br class=""> };<br class=""><br class=""> /// Create a new VectorizableTree entry.<br class=""> TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,<br class="">- int &UserTreeIdx) {<br class="">+ int &UserTreeIdx, const InstructionsState &S) {<br class="">+ assert((!Vectorized || S.Opcode != 0) &&<br class="">+ "Vectorized TreeEntry without opcode");<br class=""> VectorizableTree.emplace_back(VectorizableTree);<br class=""> int idx = VectorizableTree.size() - 1;<br class=""> TreeEntry *Last = &VectorizableTree[idx];<br class=""> Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());<br class=""> Last->NeedToGather = !Vectorized;<br class=""> if (Vectorized) {<br class="">+ Last->State = S;<br class="">+ unsigned AltOpcode = getAltOpcode(S.Opcode);<br class=""> for (int i = 0, e = VL.size(); i != e; ++i) {<br class="">- assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");<br class="">- ScalarToTreeEntry[VL[i]] = idx;<br class="">+ unsigned RealOpcode =<br class="">+ (S.IsAltShuffle && isOdd(i)) ? AltOpcode : S.Opcode;<br class="">+ Value *Key = (cast<Instruction>(VL[i])->getOpcode() == RealOpcode)<br class="">+ ? 
VL[i]<br class="">+ : S.OpValue;<br class="">+ assert(!getTreeEntry(VL[i], Key) && "Scalar already in tree!");<br class="">+ if (VL[i] == Key)<br class="">+ ScalarToTreeEntry[Key] = idx;<br class="">+ else<br class="">+ ExtraScalarToTreeEntry[VL[i]][Key] = idx;<br class=""> }<br class=""> } else {<br class="">+ Last->State.Opcode = 0;<br class="">+ Last->State.OpValue = VL[0];<br class="">+ Last->State.IsAltShuffle = false;<br class=""> MustGather.insert(VL.begin(), VL.end());<br class=""> }<br class=""><br class="">@@ -765,8 +829,24 @@ private:<br class=""> return nullptr;<br class=""> }<br class=""><br class="">+ TreeEntry *getTreeEntry(Value *V, Value *OpValue) {<br class="">+ if (V == OpValue)<br class="">+ return getTreeEntry(V);<br class="">+ auto I = ExtraScalarToTreeEntry.find(V);<br class="">+ if (I != ExtraScalarToTreeEntry.end()) {<br class="">+ auto &STT = I->second;<br class="">+ auto STTI = STT.find(OpValue);<br class="">+ if (STTI != STT.end())<br class="">+ return &VectorizableTree[STTI->second];<br class="">+ }<br class="">+ return nullptr;<br class="">+ }<br class="">+<br class=""> /// Maps a specific scalar to its tree entry.<br class="">- SmallDenseMap<Value*, int> ScalarToTreeEntry;<br class="">+ SmallDenseMap<Value *, int> ScalarToTreeEntry;<br class="">+<br class="">+ /// Maps a specific scalar to its tree entry(s) with leading scalar.<br class="">+ SmallDenseMap<Value *, SmallDenseMap<Value *, int>> ExtraScalarToTreeEntry;<br class=""><br class=""> /// A list of scalars that we found that we need to keep as scalars.<br class=""> ValueSet MustGather;<br class="">@@ -1338,9 +1418,15 @@ void BoUpSLP::buildTree(ArrayRef<Value *<br class=""> continue;<br class=""><br class=""> // For each lane:<br class="">+ const unsigned Opcode = Entry->State.Opcode;<br class="">+ const unsigned AltOpcode = getAltOpcode(Opcode);<br class=""> for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {<br class=""> Value *Scalar = 
Entry->Scalars[Lane];<br class=""><br class="">+ if (!sameOpcodeOrAlt(Opcode, AltOpcode,<br class="">+ cast<Instruction>(Scalar)->getOpcode()))<br class="">+ continue;<br class="">+<br class=""> // Check if the scalar is externally used as an extra arg.<br class=""> auto ExtI = ExternallyUsedValues.find(Scalar);<br class=""> if (ExtI != ExternallyUsedValues.end()) {<br class="">@@ -1383,6 +1469,38 @@ void BoUpSLP::buildTree(ArrayRef<Value *<br class=""> }<br class=""> }<br class=""><br class="">+static Value *getDefaultConstantForOpcode(unsigned Opcode, Type *Ty) {<br class="">+ switch(Opcode) {<br class="">+ case Instruction::Add:<br class="">+ case Instruction::Sub:<br class="">+ case Instruction::Or:<br class="">+ case Instruction::Xor:<br class="">+ return ConstantInt::getNullValue(Ty);<br class="">+ case Instruction::Mul:<br class="">+ case Instruction::UDiv:<br class="">+ case Instruction::SDiv:<br class="">+ case Instruction::URem:<br class="">+ case Instruction::SRem:<br class="">+ return ConstantInt::get(Ty, /*V=*/1);<br class="">+ case Instruction::FAdd:<br class="">+ case Instruction::FSub:<br class="">+ return ConstantFP::get(Ty, /*V=*/0.0);<br class="">+ case Instruction::FMul:<br class="">+ case Instruction::FDiv:<br class="">+ case Instruction::FRem:<br class="">+ return ConstantFP::get(Ty, /*V=*/1.0);<br class="">+ case Instruction::And:<br class="">+ return ConstantInt::getAllOnesValue(Ty);<br class="">+ case Instruction::Shl:<br class="">+ case Instruction::LShr:<br class="">+ case Instruction::AShr:<br class="">+ return ConstantInt::getNullValue(Type::getInt32Ty(Ty->getContext()));<br class="">+ default:<br class="">+ break;<br class="">+ }<br class="">+ llvm_unreachable("unknown binop for default constant value");<br class="">+}<br class="">+<br class=""> void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,<br class=""> int UserTreeIdx) {<br class=""> assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");<br class="">@@ 
-1390,31 +1508,46 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> InstructionsState S = getSameOpcode(VL);<br class=""> if (Depth == RecursionMaxDepth) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class=""> // Don't handle vectors.<br class=""> if (S.OpValue->getType()->isVectorTy()) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class=""> if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))<br class=""> if (SI->getValueOperand()->getType()->isVectorTy()) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class=""> // If all of the operands are identical or constant we have a simple solution.<br class=""> if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !S.Opcode) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. 
\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class="">+ // Avoid any vectors that are wider than two elements and<br class="">+ // with real operations less than or equal to half of vector<br class="">+ // to others members are operands to that operations.<br class="">+ unsigned AltOpcode = getAltOpcode(S.Opcode);<br class="">+ unsigned SameOrAlt = 0;<br class="">+ if (VL.size() > 2) {<br class="">+ for (Value *V : VL) {<br class="">+ auto *Instr = cast<Instruction>(V);<br class="">+ if (sameOpcodeOrAlt(S.Opcode, AltOpcode, Instr->getOpcode()))<br class="">+ SameOrAlt++;<br class="">+ }<br class="">+ if (SameOrAlt <= (VL.size() / 2))<br class="">+ return;<br class="">+ }<br class="">+<br class=""> // We now know that this is a vector of instructions of the same type from<br class=""> // the same block.<br class=""><br class="">@@ -1423,7 +1556,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (EphValues.count(VL[i])) {<br class=""> DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<<br class=""> ") is ephemeral.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1434,7 +1567,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");<br class=""> if (E->Scalars[i] != VL[i]) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1453,7 +1586,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (getTreeEntry(I)) {<br class=""> DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<<br class=""> ") is already in tree.\n");<br class="">- 
newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1463,7 +1596,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> for (unsigned i = 0, e = VL.size(); i != e; ++i) {<br class=""> if (MustGather.count(VL[i])) {<br class=""> DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1477,7 +1610,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> // Don't go into unreachable blocks. They may contain instructions with<br class=""> // dependency cycles which confuse the final scheduling.<br class=""> DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class="">@@ -1486,7 +1619,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> for (unsigned j = i + 1; j < e; ++j)<br class=""> if (VL[i] == VL[j]) {<br class=""> DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""><br class="">@@ -1501,7 +1634,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> assert((!BS.getScheduleData(VL0) ||<br class=""> !BS.getScheduleData(VL0)->isPartOfBundle()) &&<br class=""> "tryScheduleBundle should cancelScheduling on failure");<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");<br class="">@@ -1520,12 +1653,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (Term) {<br class=""> 
DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class=""><br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");<br class=""><br class=""> for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {<br class="">@@ -1547,7 +1680,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> } else {<br class=""> BS.cancelScheduling(VL, VL0);<br class=""> }<br class="">- newTreeEntry(VL, Reuse, UserTreeIdx);<br class="">+ newTreeEntry(VL, Reuse, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> case Instruction::Load: {<br class="">@@ -1562,7 +1695,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (DL->getTypeSizeInBits(ScalarTy) !=<br class=""> DL->getTypeAllocSizeInBits(ScalarTy)) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");<br class=""> return;<br class=""> }<br class="">@@ -1573,7 +1706,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> LoadInst *L = cast<LoadInst>(VL[i]);<br class=""> if (!L->isSimple()) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");<br class=""> return;<br class=""> }<br class="">@@ -1595,7 +1728,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""><br class=""> if (Consecutive) {<br class=""> ++NumLoadsWantToKeepOrder;<br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, 
UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of loads.\n");<br class=""> return;<br class=""> }<br class="">@@ -1610,7 +1743,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> }<br class=""><br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""><br class=""> if (ReverseConsecutive) {<br class=""> ++NumLoadsWantToChangeOrder;<br class="">@@ -1637,12 +1770,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> Type *Ty = cast<Instruction>(VL[i])->getOperand(0)->getType();<br class=""> if (Ty != SrcTy || !isValidElementType(Ty)) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");<br class=""> return;<br class=""> }<br class=""> }<br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of casts.\n");<br class=""><br class=""> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class="">@@ -1665,13 +1798,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (Cmp->getPredicate() != P0 ||<br class=""> Cmp->getOperand(0)->getType() != ComparedTy) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");<br class=""> return;<br class=""> }<br class=""> }<br class=""><br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of compares.\n");<br class=""><br class=""> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class="">@@ -1703,7 +1836,7 @@ void 
BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> case Instruction::And:<br class=""> case Instruction::Or:<br class=""> case Instruction::Xor:<br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of bin op.\n");<br class=""><br class=""> // Sort operands of the instructions so that each side is more likely to<br class="">@@ -1719,10 +1852,21 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class=""> ValueList Operands;<br class=""> // Prepare the operand vector.<br class="">- for (Value *j : VL)<br class="">- Operands.push_back(cast<Instruction>(j)->getOperand(i));<br class="">-<br class="">- buildTree_rec(Operands, Depth + 1, UserTreeIdx);<br class="">+ for (Value *VecOp : VL) {<br class="">+ auto *I = cast<Instruction>(VecOp);<br class="">+ if (I->getOpcode() == S.Opcode) {<br class="">+ Operands.push_back(I->getOperand(i));<br class="">+ continue;<br class="">+ }<br class="">+ assert(Instruction::isBinaryOp(S.Opcode) &&<br class="">+ "Expected a binary operation.");<br class="">+ Value *Operand = isOdd(i)<br class="">+ ? 
getDefaultConstantForOpcode(S.Opcode, I->getType())<br class="">+ : VecOp;<br class="">+ Operands.push_back(Operand);<br class="">+ }<br class="">+ if (allSameType(Operands))<br class="">+ buildTree_rec(Operands, Depth + 1, UserTreeIdx);<br class=""> }<br class=""> return;<br class=""><br class="">@@ -1732,7 +1876,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (cast<Instruction>(VL[j])->getNumOperands() != 2) {<br class=""> DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1745,7 +1889,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> if (Ty0 != CurTy) {<br class=""> DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class="">@@ -1757,12 +1901,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> DEBUG(<br class=""> dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> return;<br class=""> }<br class=""> }<br class=""><br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");<br class=""> for (unsigned i = 0, e = 2; i < e; ++i) {<br class=""> ValueList Operands;<br class="">@@ -1779,12 +1923,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)<br class=""> if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {<br class=""> BS.cancelScheduling(VL, 
VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Non-consecutive store.\n");<br class=""> return;<br class=""> }<br class=""><br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a vector of stores.\n");<br class=""><br class=""> ValueList Operands;<br class="">@@ -1802,7 +1946,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);<br class=""> if (!isTriviallyVectorizable(ID)) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");<br class=""> return;<br class=""> }<br class="">@@ -1816,7 +1960,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> getVectorIntrinsicIDForCall(CI2, TLI) != ID ||<br class=""> !CI->hasIdenticalOperandBundleSchema(*CI2)) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" << *VL[i]<br class=""> << "\n");<br class=""> return;<br class="">@@ -1827,7 +1971,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> Value *A1J = CI2->getArgOperand(1);<br class=""> if (A1I != A1J) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI<br class=""> << " argument "<< A1I<<"!=" << A1J<br class=""> << "\n");<br class="">@@ -1840,14 +1984,14 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> CI->op_begin() + CI->getBundleOperandsEndIndex(),<br class=""> CI2->op_begin() + 
CI2->getBundleOperandsStartIndex())) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="<br class=""> << *VL[i] << '\n');<br class=""> return;<br class=""> }<br class=""> }<br class=""><br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {<br class=""> ValueList Operands;<br class=""> // Prepare the operand vector.<br class="">@@ -1864,11 +2008,11 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> // then do not vectorize this instruction.<br class=""> if (!S.IsAltShuffle) {<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");<br class=""> return;<br class=""> }<br class="">- newTreeEntry(VL, true, UserTreeIdx);<br class="">+ newTreeEntry(VL, true, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");<br class=""><br class=""> // Reorder operands if reordering would enable vectorization.<br class="">@@ -1883,8 +2027,19 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class=""> ValueList Operands;<br class=""> // Prepare the operand vector.<br class="">- for (Value *j : VL)<br class="">- Operands.push_back(cast<Instruction>(j)->getOperand(i));<br class="">+ for (Value *VecOp : VL) {<br class="">+ auto *I = cast<Instruction>(VecOp);<br class="">+ if (sameOpcodeOrAlt(S.Opcode, AltOpcode, I->getOpcode())) {<br class="">+ Operands.push_back(I->getOperand(i));<br class="">+ continue;<br class="">+ }<br class="">+ assert(Instruction::isBinaryOp(S.Opcode) &&<br class="">+ "Expected a 
binary operation.");<br class="">+ Value *Operand = isOdd(i)<br class="">+ ? getDefaultConstantForOpcode(S.Opcode, I->getType())<br class="">+ : VecOp;<br class="">+ Operands.push_back(Operand);<br class="">+ }<br class=""><br class=""> buildTree_rec(Operands, Depth + 1, UserTreeIdx);<br class=""> }<br class="">@@ -1892,7 +2047,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br class=""><br class=""> default:<br class=""> BS.cancelScheduling(VL, VL0);<br class="">- newTreeEntry(VL, false, UserTreeIdx);<br class="">+ newTreeEntry(VL, false, UserTreeIdx, S);<br class=""> DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");<br class=""> return;<br class=""> }<br class="">@@ -2013,18 +2168,17 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br class=""> }<br class=""> return getGatherCost(E->Scalars);<br class=""> }<br class="">- InstructionsState S = getSameOpcode(VL);<br class="">- assert(S.Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");<br class="">- Instruction *VL0 = cast<Instruction>(S.OpValue);<br class="">- unsigned ShuffleOrOp = S.IsAltShuffle ?<br class="">- (unsigned) Instruction::ShuffleVector : S.Opcode;<br class="">+ assert(E->State.Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");<br class="">+ auto *VL0 = cast<Instruction>(E->State.OpValue);<br class="">+ unsigned ShuffleOrOp = E->State.IsAltShuffle ?<br class="">+ (unsigned) Instruction::ShuffleVector : E->State.Opcode;<br class=""> switch (ShuffleOrOp) {<br class=""> case Instruction::PHI:<br class=""> return 0;<br class=""><br class=""> case Instruction::ExtractValue:<br class=""> case Instruction::ExtractElement:<br class="">- if (canReuseExtract(VL, S.OpValue)) {<br class="">+ if (canReuseExtract(VL, E->State.OpValue)) {<br class=""> int DeadCost = 0;<br class=""> for (unsigned i = 0, e = VL.size(); i < e; ++i) {<br class=""> Instruction *E = cast<Instruction>(VL[i]);<br class="">@@ -2068,8 +2222,8 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br class=""> // Calculate the 
cost of this instruction.<br class=""> VectorType *MaskTy = VectorType::get(Builder.getInt1Ty(), VL.size());<br class=""> int ScalarCost = VecTy->getNumElements() *<br class="">- TTI->getCmpSelInstrCost(S.Opcode, ScalarTy, Builder.getInt1Ty(), VL0);<br class="">- int VecCost = TTI->getCmpSelInstrCost(S.Opcode, VecTy, MaskTy, VL0);<br class="">+ TTI->getCmpSelInstrCost(ShuffleOrOp, ScalarTy, Builder.getInt1Ty(), VL0);<br class="">+ int VecCost = TTI->getCmpSelInstrCost(ShuffleOrOp, VecTy, MaskTy, VL0);<br class=""> return VecCost - ScalarCost;<br class=""> }<br class=""> case Instruction::Add:<br class="">@@ -2095,7 +2249,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br class=""> TargetTransformInfo::OperandValueKind Op1VK =<br class=""> TargetTransformInfo::OK_AnyValue;<br class=""> TargetTransformInfo::OperandValueKind Op2VK =<br class="">- TargetTransformInfo::OK_UniformConstantValue;<br class="">+ TargetTransformInfo::OK_AnyValue;<br class=""> TargetTransformInfo::OperandValueProperties Op1VP =<br class=""> TargetTransformInfo::OP_None;<br class=""> TargetTransformInfo::OperandValueProperties Op2VP =<br class="">@@ -2106,34 +2260,33 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br class=""> // If instead not all operands are constants, then set the operand kind<br class=""> // to OK_AnyValue. 
If all operands are constants but not the same,<br class=""> // then set the operand kind to OK_NonUniformConstantValue.<br class="">- ConstantInt *CInt = nullptr;<br class="">- for (unsigned i = 0; i < VL.size(); ++i) {<br class="">- const Instruction *I = cast<Instruction>(VL[i]);<br class="">- if (!isa<ConstantInt>(I->getOperand(1))) {<br class="">- Op2VK = TargetTransformInfo::OK_AnyValue;<br class="">- break;<br class="">- }<br class="">- if (i == 0) {<br class="">- CInt = cast<ConstantInt>(I->getOperand(1));<br class="">- continue;<br class="">+ if (auto *CInt = dyn_cast<ConstantInt>(VL0->getOperand(1))) {<br class="">+ Op2VK = TargetTransformInfo::OK_UniformConstantValue;<br class="">+ const unsigned Opcode = E->State.Opcode;<br class="">+ for (auto *V : VL) {<br class="">+ auto *I = cast<Instruction>(V);<br class="">+ if (I == VL0 || Opcode != I->getOpcode())<br class="">+ continue;<br class="">+ if (!isa<ConstantInt>(I->getOperand(1))) {<br class="">+ Op2VK = TargetTransformInfo::OK_AnyValue;<br class="">+ break;<br class="">+ }<br class="">+ if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&<br class="">+ CInt != cast<ConstantInt>(I->getOperand(1)))<br class="">+ Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;<br class=""> }<br class="">+ // FIXME: Currently cost of model modification for division by power of<br class="">+ // 2 is handled for X86 and AArch64. Add support for other targets.<br class=""> if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&<br class="">- CInt != cast<ConstantInt>(I->getOperand(1)))<br class="">- Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;<br class="">+ CInt->getValue().isPowerOf2())<br class="">+ Op2VP = TargetTransformInfo::OP_PowerOf2;<br class=""> }<br class="">- // FIXME: Currently cost of model modification for division by power of<br class="">- // 2 is handled for X86 and AArch64. 
Add support for other targets.<br class="">- if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&<br class="">- CInt->getValue().isPowerOf2())<br class="">- Op2VP = TargetTransformInfo::OP_PowerOf2;<br class=""><br class="">- SmallVector<const Value *, 4> Operands(VL0->operand_values());<br class="">- int ScalarCost =<br class="">- VecTy->getNumElements() *<br class="">- TTI->getArithmeticInstrCost(S.Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,<br class="">- Op2VP, Operands);<br class="">- int VecCost = TTI->getArithmeticInstrCost(S.Opcode, VecTy, Op1VK, Op2VK,<br class="">- Op1VP, Op2VP, Operands);<br class="">+ int ScalarCost = VecTy->getNumElements() *<br class="">+ TTI->getArithmeticInstrCost(E->State.Opcode, ScalarTy,<br class="">+ Op1VK, Op2VK, Op1VP, Op2VP);<br class="">+ int VecCost = TTI->getArithmeticInstrCost(E->State.Opcode, VecTy, Op1VK,<br class="">+ Op2VK, Op1VP, Op2VP);<br class=""> return VecCost - ScalarCost;<br class=""> }<br class=""> case Instruction::GetElementPtr: {<br class="">@@ -2199,23 +2352,18 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br class=""> TargetTransformInfo::OK_AnyValue;<br class=""> TargetTransformInfo::OperandValueKind Op2VK =<br class=""> TargetTransformInfo::OK_AnyValue;<br class="">- int ScalarCost = 0;<br class="">- int VecCost = 0;<br class="">- for (Value *i : VL) {<br class="">- Instruction *I = cast<Instruction>(i);<br class="">- if (!I)<br class="">- break;<br class="">- ScalarCost +=<br class="">- TTI->getArithmeticInstrCost(I->getOpcode(), ScalarTy, Op1VK, Op2VK);<br class="">- }<br class="">+ unsigned AltOpcode = getAltOpcode(E->State.Opcode);<br class="">+ int ScalarCost =<br class="">+ TTI->getArithmeticInstrCost(E->State.Opcode, ScalarTy, Op1VK, Op2VK) *<br class="">+ VL.size() / 2;<br class="">+ ScalarCost +=<br class="">+ TTI->getArithmeticInstrCost(AltOpcode, ScalarTy, Op1VK, Op2VK) *<br class="">+ VL.size() / 2;<br class=""> // VecCost is equal to sum of the cost of creating 2 vectors<br 
class=""> // and the cost of creating shuffle.<br class="">- Instruction *I0 = cast<Instruction>(VL[0]);<br class="">- VecCost =<br class="">- TTI->getArithmeticInstrCost(I0->getOpcode(), VecTy, Op1VK, Op2VK);<br class="">- Instruction *I1 = cast<Instruction>(VL[1]);<br class="">- VecCost +=<br class="">- TTI->getArithmeticInstrCost(I1->getOpcode(), VecTy, Op1VK, Op2VK);<br class="">+ int VecCost =<br class="">+ TTI->getArithmeticInstrCost(E->State.Opcode, VecTy, Op1VK, Op2VK);<br class="">+ VecCost += TTI->getArithmeticInstrCost(AltOpcode, VecTy, Op1VK, Op2VK);<br class=""> VecCost +=<br class=""> TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);<br class=""> return VecCost - ScalarCost;<br class="">@@ -2281,7 +2429,7 @@ int BoUpSLP::getSpillCost() {<br class=""> Instruction *PrevInst = nullptr;<br class=""><br class=""> for (const auto &N : VectorizableTree) {<br class="">- Instruction *Inst = dyn_cast<Instruction>(N.Scalars[0]);<br class="">+ Instruction *Inst = dyn_cast<Instruction>(N.State.OpValue);<br class=""> if (!Inst)<br class=""> continue;<br class=""><br class="">@@ -2341,7 +2489,7 @@ int BoUpSLP::getTreeCost() {<br class=""> for (TreeEntry &TE : VectorizableTree) {<br class=""> int C = getEntryCost(&TE);<br class=""> DEBUG(dbgs() << "SLP: Adding cost " << C << " for bundle that starts with "<br class="">- << *TE.Scalars[0] << ".\n");<br class="">+ << *TE.State.OpValue << ".\n");<br class=""> Cost += C;<br class=""> }<br class=""><br class="">@@ -2362,7 +2510,7 @@ int BoUpSLP::getTreeCost() {<br class=""> // extend the extracted value back to the original type. 
Here, we account<br class=""> // for the extract and the added cost of the sign extend if needed.<br class=""> auto *VecTy = VectorType::get(EU.Scalar->getType(), BundleWidth);<br class="">- auto *ScalarRoot = VectorizableTree[0].Scalars[0];<br class="">+ auto *ScalarRoot = VectorizableTree[0].State.OpValue;<br class=""> if (MinBWs.count(ScalarRoot)) {<br class=""> auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);<br class=""> auto Extend =<br class="">@@ -2425,13 +2573,15 @@ void BoUpSLP::reorderAltShuffleOperands(<br class=""> SmallVectorImpl<Value *> &Right) {<br class=""> // Push left and right operands of binary operation into Left and Right<br class=""> unsigned AltOpcode = getAltOpcode(Opcode);<br class="">- (void)AltOpcode;<br class=""> for (Value *V : VL) {<br class=""> auto *I = cast<Instruction>(V);<br class="">- assert(sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode()) &&<br class="">- "Incorrect instruction in vector");<br class="">- Left.push_back(I->getOperand(0));<br class="">- Right.push_back(I->getOperand(1));<br class="">+ if (sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode())) {<br class="">+ Left.push_back(I->getOperand(0));<br class="">+ Right.push_back(I->getOperand(1));<br class="">+ } else {<br class="">+ Left.push_back(I);<br class="">+ Right.push_back(getDefaultConstantForOpcode(Opcode, I->getType()));<br class="">+ }<br class=""> }<br class=""><br class=""> // Reorder if we have a commutative operation and consecutive access<br class="">@@ -2480,8 +2630,13 @@ static bool shouldReorderOperands(<br class=""> int i, unsigned Opcode, Instruction &I, ArrayRef<Value *> Left,<br class=""> ArrayRef<Value *> Right, bool AllSameOpcodeLeft, bool AllSameOpcodeRight,<br class=""> bool SplatLeft, bool SplatRight, Value *&VLeft, Value *&VRight) {<br class="">- VLeft = I.getOperand(0);<br class="">- VRight = I.getOperand(1);<br class="">+ if (I.getOpcode() == Opcode) {<br class="">+ VLeft = I.getOperand(0);<br class="">+ VRight 
= I.getOperand(1);<br class="">+ } else {<br class="">+ VLeft = &I;<br class="">+ VRight = getDefaultConstantForOpcode(Opcode, I.getType());<br class="">+ }<br class=""> // If we have "SplatRight", try to see if commuting is needed to preserve it.<br class=""> if (SplatRight) {<br class=""> if (VRight == Right[i - 1])<br class="">@@ -2545,8 +2700,15 @@ void BoUpSLP::reorderInputsAccordingToOp<br class=""> // Peel the first iteration out of the loop since there's nothing<br class=""> // interesting to do anyway and it simplifies the checks in the loop.<br class=""> auto *I = cast<Instruction>(VL[0]);<br class="">- Value *VLeft = I->getOperand(0);<br class="">- Value *VRight = I->getOperand(1);<br class="">+ Value *VLeft;<br class="">+ Value *VRight;<br class="">+ if (I->getOpcode() == Opcode) {<br class="">+ VLeft = I->getOperand(0);<br class="">+ VRight = I->getOperand(1);<br class="">+ } else {<br class="">+ VLeft = I;<br class="">+ VRight = getDefaultConstantForOpcode(Opcode, I->getType());<br class="">+ }<br class=""> if (!isa<Instruction>(VRight) && isa<Instruction>(VLeft))<br class=""> // Favor having instruction to the right. 
FIXME: why?<br class=""> std::swap(VLeft, VRight);<br class="">@@ -2751,12 +2913,11 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> IRBuilder<>::InsertPointGuard Guard(Builder);<br class=""><br class=""> if (E->VectorizedValue) {<br class="">- DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");<br class="">+ DEBUG(dbgs() << "SLP: Diamond merged for " << *E->State.OpValue << ".\n");<br class=""> return E->VectorizedValue;<br class=""> }<br class=""><br class="">- InstructionsState S = getSameOpcode(E->Scalars);<br class="">- Instruction *VL0 = cast<Instruction>(E->Scalars[0]);<br class="">+ Instruction *VL0 = cast<Instruction>(E->State.OpValue);<br class=""> Type *ScalarTy = VL0->getType();<br class=""> if (StoreInst *SI = dyn_cast<StoreInst>(VL0))<br class=""> ScalarTy = SI->getValueOperand()->getType();<br class="">@@ -2769,8 +2930,8 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> return V;<br class=""> }<br class=""><br class="">- unsigned ShuffleOrOp = S.IsAltShuffle ?<br class="">- (unsigned) Instruction::ShuffleVector : S.Opcode;<br class="">+ unsigned ShuffleOrOp = E->State.IsAltShuffle ?<br class="">+ (unsigned) Instruction::ShuffleVector : E->State.Opcode;<br class=""> switch (ShuffleOrOp) {<br class=""> case Instruction::PHI: {<br class=""> PHINode *PH = dyn_cast<PHINode>(VL0);<br class="">@@ -2880,7 +3041,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""><br class=""> CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();<br class=""> Value *V;<br class="">- if (S.Opcode == Instruction::FCmp)<br class="">+ if (E->State.Opcode == Instruction::FCmp)<br class=""> V = Builder.CreateFCmp(P0, L, R);<br class=""> else<br class=""> V = Builder.CreateICmp(P0, L, R);<br class="">@@ -2932,13 +3093,19 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> case Instruction::Xor: {<br class=""> ValueList LHSVL, RHSVL;<br class=""> if (isa<BinaryOperator>(VL0) && VL0->isCommutative())<br class="">- 
reorderInputsAccordingToOpcode(S.Opcode, E->Scalars, LHSVL,<br class="">+ reorderInputsAccordingToOpcode(E->State.Opcode, E->Scalars, LHSVL,<br class=""> RHSVL);<br class=""> else<br class=""> for (Value *V : E->Scalars) {<br class=""> auto *I = cast<Instruction>(V);<br class="">- LHSVL.push_back(I->getOperand(0));<br class="">- RHSVL.push_back(I->getOperand(1));<br class="">+ if (I->getOpcode() == E->State.Opcode) {<br class="">+ LHSVL.push_back(I->getOperand(0));<br class="">+ RHSVL.push_back(I->getOperand(1));<br class="">+ } else {<br class="">+ LHSVL.push_back(V);<br class="">+ RHSVL.push_back(<br class="">+ getDefaultConstantForOpcode(E->State.Opcode, I->getType()));<br class="">+ }<br class=""> }<br class=""><br class=""> setInsertPointAfterBundle(E->Scalars, VL0);<br class="">@@ -2950,7 +3117,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> return V;<br class=""><br class=""> Value *V = Builder.CreateBinOp(<br class="">- static_cast<Instruction::BinaryOps>(S.Opcode), LHS, RHS);<br class="">+ static_cast<Instruction::BinaryOps>(E->State.Opcode), LHS, RHS);<br class=""> E->VectorizedValue = V;<br class=""> propagateIRFlags(E->VectorizedValue, E->Scalars, VL0);<br class=""> ++NumVectorInstructions;<br class="">@@ -3100,9 +3267,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> }<br class=""> case Instruction::ShuffleVector: {<br class=""> ValueList LHSVL, RHSVL;<br class="">- assert(Instruction::isBinaryOp(S.Opcode) &&<br class="">+ assert(Instruction::isBinaryOp(E->State.Opcode) &&<br class=""> "Invalid Shuffle Vector Operand");<br class="">- reorderAltShuffleOperands(S.Opcode, E->Scalars, LHSVL, RHSVL);<br class="">+ reorderAltShuffleOperands(E->State.Opcode, E->Scalars, LHSVL, RHSVL);<br class=""> setInsertPointAfterBundle(E->Scalars, VL0);<br class=""><br class=""> Value *LHS = vectorizeTree(LHSVL);<br class="">@@ -3113,9 +3280,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""><br class=""> // Create a vector of LHS op1 RHS<br 
class=""> Value *V0 = Builder.CreateBinOp(<br class="">- static_cast<Instruction::BinaryOps>(S.Opcode), LHS, RHS);<br class="">+ static_cast<Instruction::BinaryOps>(E->State.Opcode), LHS, RHS);<br class=""><br class="">- unsigned AltOpcode = getAltOpcode(S.Opcode);<br class="">+ unsigned AltOpcode = getAltOpcode(E->State.Opcode);<br class=""> // Create a vector of LHS op2 RHS<br class=""> Value *V1 = Builder.CreateBinOp(<br class=""> static_cast<Instruction::BinaryOps>(AltOpcode), LHS, RHS);<br class="">@@ -3137,8 +3304,13 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br class=""> }<br class=""><br class=""> Value *ShuffleMask = ConstantVector::get(Mask);<br class="">- propagateIRFlags(V0, EvenScalars);<br class="">- propagateIRFlags(V1, OddScalars);<br class="">+ InstructionsState S = getSameOpcode(EvenScalars);<br class="">+ assert(!S.IsAltShuffle && "Unexpected alternate opcode");<br class="">+ propagateIRFlags(V0, EvenScalars, S.OpValue);<br class="">+<br class="">+ S = getSameOpcode(OddScalars);<br class="">+ assert(!S.IsAltShuffle && "Unexpected alternate opcode");<br class="">+ propagateIRFlags(V1, OddScalars, S.OpValue);<br class=""><br class=""> Value *V = Builder.CreateShuffleVector(V0, V1, ShuffleMask);<br class=""> E->VectorizedValue = V;<br class="">@@ -3172,7 +3344,7 @@ BoUpSLP::vectorizeTree(ExtraValueToDebug<br class=""> // If the vectorized tree can be rewritten in a smaller type, we truncate the<br class=""> // vectorized root. InstCombine will then rewrite the entire expression. 
We<br class=""> // sign extend the extracted values below.<br class="">- auto *ScalarRoot = VectorizableTree[0].Scalars[0];<br class="">+ auto *ScalarRoot = VectorizableTree[0].State.OpValue;<br class=""> if (MinBWs.count(ScalarRoot)) {<br class=""> if (auto *I = dyn_cast<Instruction>(VectorRoot))<br class=""> Builder.SetInsertPoint(&*++BasicBlock::iterator(I));<br class="">@@ -3283,9 +3455,15 @@ BoUpSLP::vectorizeTree(ExtraValueToDebug<br class=""> assert(Entry->VectorizedValue && "Can't find vectorizable value");<br class=""><br class=""> // For each lane:<br class="">+ const unsigned Opcode = Entry->State.Opcode;<br class="">+ const unsigned AltOpcode = getAltOpcode(Opcode);<br class=""> for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {<br class=""> Value *Scalar = Entry->Scalars[Lane];<br class=""><br class="">+ if (!sameOpcodeOrAlt(Opcode, AltOpcode,<br class="">+ cast<Instruction>(Scalar)->getOpcode()))<br class="">+ continue;<br class="">+<br class=""> Type *Ty = Scalar->getType();<br class=""> if (!Ty->isVoidTy()) {<br class=""> #ifndef NDEBUG<br class="">@@ -3417,7 +3595,7 @@ bool BoUpSLP::BlockScheduling::trySchedu<br class=""> }<br class=""><br class=""> for (Value *V : VL) {<br class="">- ScheduleData *BundleMember = getScheduleData(V);<br class="">+ ScheduleData *BundleMember = getScheduleData(V, isOneOf(OpValue, V));<br class=""> assert(BundleMember &&<br class=""> "no ScheduleData for bundle member (maybe not in same basic block)");<br class=""> if (BundleMember->IsScheduled) {<br class="">@@ -3490,7 +3668,7 @@ void BoUpSLP::BlockScheduling::cancelSch<br class=""> if (isa<PHINode>(OpValue))<br class=""> return;<br class=""><br class="">- ScheduleData *Bundle = getScheduleData(OpValue);<br class="">+ ScheduleData *Bundle = getScheduleData(OpValue)->FirstInBundle;<br class=""> DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");<br class=""> assert(!Bundle->IsScheduled &&<br class=""> "Can't cancel bundle which is 
already scheduled");<br class="">@@ -3793,7 +3971,7 @@ void BoUpSLP::scheduleBlock(BlockSchedul<br class=""> I = I->getNextNode()) {<br class=""> BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {<br class=""> assert(SD->isPartOfBundle() ==<br class="">- (getTreeEntry(SD->Inst) != nullptr) &&<br class="">+ (getTreeEntry(SD->Inst, SD->OpValue) != nullptr) &&<br class=""> "scheduler and vectorizer bundle mismatch");<br class=""> SD->FirstInBundle->SchedulingPriority = Idx++;<br class=""> if (SD->isSchedulingEntity()) {<br class="">@@ -3816,12 +3994,13 @@ void BoUpSLP::scheduleBlock(BlockSchedul<br class=""> ScheduleData *BundleMember = picked;<br class=""> while (BundleMember) {<br class=""> Instruction *pickedInst = BundleMember->Inst;<br class="">- if (LastScheduledInst->getNextNode() != pickedInst) {<br class="">- BS->BB->getInstList().remove(pickedInst);<br class="">- BS->BB->getInstList().insert(LastScheduledInst->getIterator(),<br class="">- pickedInst);<br class="">+ if (pickedInst == BundleMember->OpValue) {<br class="">+ if (LastScheduledInst->getNextNode() != pickedInst) {<br class="">+ BS->BB->getInstList().remove(pickedInst);<br class="">+ BS->BB->getInstList().insert(LastScheduledInst->getIterator(), pickedInst);<br class="">+ }<br class="">+ LastScheduledInst = pickedInst;<br class=""> }<br class="">- LastScheduledInst = pickedInst;<br class=""> BundleMember = BundleMember->NextInBundle;<br class=""> }<br class=""><br class=""><br class="">Added: llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll?rev=317618&view=auto" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll?rev=317618&view=auto</a><br class="">==============================================================================<br class="">--- 
llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll (added)<br class="">+++ llvm/trunk/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll Tue Nov 7 13:25:34 2017<br class="">@@ -0,0 +1,52 @@<br class="">+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py<br class="">+; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s<br class="">+<br class="">+@bar = external global [4 x [4 x i32]], align 4<br class="">+@dct_luma = external global [4 x [4 x i32]], align 4<br class="">+<br class="">+define void @foo() local_unnamed_addr {<br class="">+; CHECK-LABEL: @foo(<br class="">+; CHECK-NEXT: entry:<br class="">+; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef<br class="">+; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4<br class="">+; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0<br class="">+; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2), align 4<br class="">+; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 3), align 4<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> undef, i32 [[TMP0]], i32 0<br class="">+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[ADD277]], i32 1<br class="">+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> 
[[TMP4]], i32 [[TMP1]], i32 2<br class="">+; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP2]], i32 3<br class="">+; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> undef, [[TMP6]]<br class="">+; CHECK-NEXT: [[TMP8:%.*]] = ashr <4 x i32> [[TMP7]], <i32 6, i32 6, i32 6, i32 6><br class="">+; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3<br class="">+; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4<br class="">+; CHECK-NEXT: unreachable<br class="">+;<br class="">+entry:<br class="">+ %add277 = add nsw i32 undef, undef<br class="">+ store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4<br class="">+ %0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4<br class="">+ %sub355 = add nsw i32 undef, %0<br class="">+ %shr.i = ashr i32 %sub355, 6<br class="">+ %arrayidx372 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0<br class="">+ store i32 %shr.i, i32* %arrayidx372, align 4<br class="">+ %sub355.1 = add nsw i32 undef, %add277<br class="">+ %shr.i.1 = ashr i32 %sub355.1, 6<br class="">+ %arrayidx372.1 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1<br class="">+ store i32 %shr.i.1, i32* %arrayidx372.1, align 4<br class="">+ %1 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2), align 4<br class="">+ %sub355.2 = add nsw i32 undef, %1<br class="">+ %shr.i.2 = ashr i32 %sub355.2, 6<br class="">+ %arrayidx372.2 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2<br class="">+ store i32 %shr.i.2, i32* %arrayidx372.2, align 4<br class="">+ %2 = load i32, i32* getelementptr inbounds ([4 
x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 3), align 4<br class="">+ %sub355.3 = add nsw i32 undef, %2<br class="">+ %shr.i.3 = ashr i32 %sub355.3, 6<br class="">+ %arrayidx372.3 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3<br class="">+ store i32 %shr.i.3, i32* %arrayidx372.3, align 4<br class="">+ unreachable<br class="">+}<br class=""><br class="">Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll?rev=317618&r1=317617&r2=317618&view=diff" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll?rev=317618&r1=317617&r2=317618&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll (original)<br class="">+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll Tue Nov 7 13:25:34 2017<br class="">@@ -43,22 +43,16 @@ define void @add1(i32* noalias %dst, i32<br class=""> ; CHECK-LABEL: @add1(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP1]], 1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2<br class="">-; CHECK-NEXT: store i32 
[[ADD3]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[TMP2]], 2<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[ADD6]], i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[TMP3]], 3<br class="">-; CHECK-NEXT: store i32 [[ADD9]], i32* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> <i32 0, i32 1, i32 2, i32 3>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -86,22 +80,16 @@ define void @sub0(i32* noalias %dst, i32<br class=""> ; CHECK-LABEL: @sub0(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], 
i64 2<br class="">-; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SUB8:%.*]] = add nsw i32 [[TMP3]], -3<br class="">-; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> <i32 -1, i32 0, i32 -2, i32 -3>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -205,22 +193,18 @@ define void @addsub0(i32* noalias %dst,<br class=""> ; CHECK-LABEL: @addsub0(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class=""> ; CHECK-NEXT: 
[[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2<br class="">-; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3<br class="">-; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[TMP1]], <i32 -1, i32 0, i32 -2, i32 -3><br class="">+; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> [[TMP1]], <i32 -1, i32 0, i32 -2, i32 -3><br class="">+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7><br class="">+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -248,22 +232,18 @@ define void @addsub1(i32* noalias %dst,<br class=""> ; CHECK-LABEL: @addsub1(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* 
[[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i32 [[TMP1]], -1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2<br class="">-; CHECK-NEXT: store i32 [[SUB1]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3<br class="">-; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 0, i32 -3><br class="">+; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 0, i32 -3><br class="">+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7><br class="">+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -291,22 +271,16 @@ define void @mul(i32* noalias %dst, i32*<br class=""> ; CHECK-LABEL: @mul(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: 
[[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 257<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[MUL]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[MUL3:%.*]] = mul nsw i32 [[TMP1]], -3<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2<br class="">-; CHECK-NEXT: store i32 [[MUL3]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[MUL9:%.*]] = mul nsw i32 [[TMP3]], -9<br class="">-; CHECK-NEXT: store i32 [[MUL9]], i32* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = mul nsw <4 x i32> <i32 257, i32 -3, i32 1, i32 -9>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -334,22 +308,16 @@ define void @shl0(i32* noalias %dst, i32<br class=""> ; CHECK-LABEL: @shl0(<br 
class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[TMP1]], 1<br class=""> ; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2<br class="">-; CHECK-NEXT: store i32 [[SHL]], i32* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[SHL5:%.*]] = shl i32 [[TMP2]], 2<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3<br class="">-; CHECK-NEXT: store i32 [[SHL5]], i32* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SHL8:%.*]] = shl i32 [[TMP3]], 3<br class="">-; CHECK-NEXT: store i32 [[SHL8]], i32* [[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = shl <4 x i32> [[TMP1]], <i32 0, i32 1, i32 2, i32 3><br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*<br class="">+; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -453,22 +421,16 @@ define void @add1f(float* noalias %dst,<br class=""> 
; CHECK-LABEL: @add1f(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[ADD3:%.*]] = fadd fast float [[TMP1]], 1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[ADD3]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], 2.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], 3.000000e+00<br class="">-; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: 
store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -496,22 +458,16 @@ define void @sub0f(float* noalias %dst,<br class=""> ; CHECK-LABEL: @sub0f(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], -2.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], -3.000000e+00<br class="">-; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = 
fadd fast <4 x float> <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -615,22 +571,18 @@ define void @addsub0f(float* noalias %ds<br class=""> ; CHECK-LABEL: @addsub0f(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[SUB5:%.*]] = fadd fast float [[TMP2]], -2.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[SUB5]], float* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00<br class="">-; CHECK-NEXT: store float [[SUB8]], float* 
[[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[TMP1]], <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00><br class="">+; CHECK-NEXT: [[TMP3:%.*]] = fsub fast <4 x float> [[TMP1]], <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00><br class="">+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7><br class="">+; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -658,22 +610,18 @@ define void @addsub1f(float* noalias %ds<br class=""> ; CHECK-LABEL: @addsub1f(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[SUB1:%.*]] = fsub fast float [[TMP1]], -1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[SUB1]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds 
float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR3]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00<br class="">-; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00, float 0.000000e+00, float -3.000000e+00><br class="">+; CHECK-NEXT: [[TMP3:%.*]] = fsub fast <4 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00, float 0.000000e+00, float -3.000000e+00><br class="">+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7><br class="">+; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -701,22 +649,16 @@ define void @mulf(float* noalias %dst, f<br class=""> ; CHECK-LABEL: @mulf(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[SUB:%.*]] = fmul fast float [[TMP0]], 2.570000e+02<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4<br 
class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class="">-; CHECK-NEXT: [[SUB3:%.*]] = fmul fast float [[TMP1]], -3.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[SUB3]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00<br class="">-; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> <float 2.570000e+02, float -3.000000e+00, float 1.000000e+00, float -9.000000e+00>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">@@ -825,22 +767,16 @@ define void @sub0fn(float* noalias %dst,<br class=""> ; CHECK-LABEL: @sub0fn(<br class=""> ; CHECK-NEXT: entry:<br class=""> ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1<br class="">-; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4<br class="">-; CHECK-NEXT: [[ADD:%.*]] 
= fadd fast float [[TMP0]], -1.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1<br class="">-; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2<br class="">-; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2<br class="">-; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4<br class=""> ; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3<br class="">-; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4<br class="">-; CHECK-NEXT: [[ADD6:%.*]] = fadd float [[TMP2]], -2.000000e+00<br class=""> ; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3<br class="">-; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4<br class="">-; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4<br class="">-; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP3]], -3.000000e+00<br class="">-; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4<br class="">+; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*<br class="">+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4<br class="">+; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>, [[TMP1]]<br class="">+; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*<br class="">+; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4<br class=""> ; CHECK-NEXT: ret void<br class=""> ;<br class=""> entry:<br class="">
</div></div></blockquote></div><br class=""></body></html>