<div dir="ltr">I'm seeing miscompiles with AVX512 after this change (wrong floating point results). Any ideas? I'll try getting a small test case.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 2, 2019 at 10:20 PM Vasileios Porpodas via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: vporpo<br>

Date: Tue Jul  2 13:20:28 2019<br>

New Revision: 364964<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=364964&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=364964&view=rev</a><br>

Log:<br>

[SLP] Recommit: Look-ahead operand reordering heuristic.<br>

<br>

Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).<br>
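As a rough illustration of the tie-breaking idea in the summary, here is a minimal standalone sketch. It is not the code from this patch: `Node`, `Kind`, `shallowScore`, `lookAheadScore`, and `exampleScores` are hypothetical names on a toy IR, and external-use costs are ignored. It reuses the G1..G4 example from the patch's doc comment, where all four nodes are additions and only a look one level past the root can tell them apart.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node; this is NOT an LLVM class.
enum class Kind { Load, Add };
struct Node {
  Kind kind;
  std::string array;             // Load only: name of the array read.
  int index;                     // Load only: element index.
  std::vector<const Node *> ops; // Add only: operands.
};

// Preference order borrowed from the patch; the values are illustrative.
constexpr int ScoreConsecutiveLoads = 3;
constexpr int ScoreSameOpcode = 2;
constexpr int ScoreFail = 0;

// Score of placing a and b in consecutive lanes, looking only at a and b.
int shallowScore(const Node &a, const Node &b) {
  if (a.kind == Kind::Load && b.kind == Kind::Load)
    return (a.array == b.array && b.index == a.index + 1)
               ? ScoreConsecutiveLoads
               : ScoreFail;
  return a.kind == b.kind ? ScoreSameOpcode : ScoreFail;
}

// Recursively add the best operand pairings, down to maxLevel.
int lookAheadScore(const Node &a, const Node &b, int level, int maxLevel) {
  assert(level >= 1 && level <= maxLevel && "bad level");
  int score = shallowScore(a, b);
  if (level == maxLevel || score == ScoreFail || a.kind == Kind::Load)
    return score;
  std::vector<bool> used(b.ops.size(), false);
  for (const Node *opA : a.ops) {
    int best = ScoreFail;
    int bestIdx = -1;
    // Additions are commutative, so try every unused operand of b.
    for (int j = 0; j < static_cast<int>(b.ops.size()); ++j) {
      if (used[j])
        continue;
      int s = lookAheadScore(*opA, *b.ops[j], level + 1, maxLevel);
      if (s > best) {
        best = s;
        bestIdx = j;
      }
    }
    if (bestIdx >= 0) {
      used[bestIdx] = true;
      score += best;
    }
  }
  return score;
}

// Builds G1 = A[0]+B[0], G2 = A[1]+B[1], G3 = C[0]+D[0], G4 = B[1]+A[1]
// and returns the scores of (G1,G2), (G1,G3), (G1,G4) at depth 2.
std::vector<int> exampleScores() {
  Node a0{Kind::Load, "A", 0, {}}, b0{Kind::Load, "B", 0, {}};
  Node a1{Kind::Load, "A", 1, {}}, b1{Kind::Load, "B", 1, {}};
  Node c0{Kind::Load, "C", 0, {}}, d0{Kind::Load, "D", 0, {}};
  Node g1{Kind::Add, "", 0, {&a0, &b0}}, g2{Kind::Add, "", 0, {&a1, &b1}};
  Node g3{Kind::Add, "", 0, {&c0, &d0}}, g4{Kind::Add, "", 0, {&b1, &a1}};
  return {lookAheadScore(g1, g2, 1, 2), lookAheadScore(g1, g3, 1, 2),
          lookAheadScore(g1, g4, 1, 2)};
}
```

With a maximum depth of 1, only the root opcodes are compared, so all three pairs tie. At depth 2 the consecutive loads under G2 and G4 raise their scores above G3, regardless of operand order.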

<br>

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk<br>

<br>

Reviewed By: RKSimon, dtemirbulatov<br>

<br>

Subscribers: hiraditya, phosek, rnk, rcorcs, llvm-commits<br>

<br>

Tags: #llvm<br>

<br>

Differential Revision: <a href="https://reviews.llvm.org/D60897" rel="noreferrer" target="_blank">https://reviews.llvm.org/D60897</a><br>

<br>

Modified:<br>

    llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br>

    llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll<br>

<br>

Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=364964&r1=364963&r2=364964&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=364964&r1=364963&r2=364964&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)<br>

+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Tue Jul  2 13:20:28 2019<br>

@@ -147,6 +147,20 @@ static cl::opt<unsigned> MinTreeSize(<br>

     "slp-min-tree-size", cl::init(3), cl::Hidden,<br>

     cl::desc("Only vectorize small trees if they are fully vectorizable"));<br>

<br>

+// The maximum depth that the look-ahead score heuristic will explore.<br>

+// The higher this value, the higher the compilation time overhead.<br>

+static cl::opt<int> LookAheadMaxDepth(<br>

+    "slp-max-look-ahead-depth", cl::init(2), cl::Hidden,<br>

+    cl::desc("The maximum look-ahead depth for operand reordering scores"));<br>

+<br>

+// The Look-ahead heuristic goes through the users of the bundle to calculate<br>

+// the users cost in getExternalUsesCost(). To avoid compilation time increase<br>

+// we limit the number of users visited to this value.<br>

+static cl::opt<unsigned> LookAheadUsersBudget(<br>

+    "slp-look-ahead-users-budget", cl::init(2), cl::Hidden,<br>

+    cl::desc("The maximum number of users to visit while visiting the "<br>

+             "predecessors. This prevents compilation time increase."));<br>

+<br>

 static cl::opt<bool><br>

     ViewSLPTree("view-slp-tree", cl::Hidden,<br>

                 cl::desc("Display the SLP trees with Graphviz"));<br>

@@ -708,6 +722,7 @@ public:<br>

<br>

     const DataLayout &DL;<br>

     ScalarEvolution &SE;<br>

+    const BoUpSLP &R;<br>

<br>

     /// \returns the operand data at \p OpIdx and \p Lane.<br>

     OperandData &getData(unsigned OpIdx, unsigned Lane) {<br>

@@ -733,6 +748,215 @@ public:<br>

       std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);<br>

     }<br>

<br>

+    // The hard-coded scores listed here are not very important. When computing<br>

+    // the scores of matching one sub-tree with another, we are basically<br>

+    // counting the number of values that are matching. So even if all scores<br>

+    // are set to 1, we would still get a decent matching result.<br>

+    // However, sometimes we have to break ties. For example we may have to<br>

+    // choose between matching loads vs matching opcodes. This is what these<br>

+    // scores are helping us with: they provide the order of preference.<br>

+<br>

+    /// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).<br>

+    static const int ScoreConsecutiveLoads = 3;<br>

+    /// Constants.<br>

+    static const int ScoreConstants = 2;<br>

+    /// Instructions with the same opcode.<br>

+    static const int ScoreSameOpcode = 2;<br>

+    /// Instructions with alt opcodes (e.g, add + sub).<br>

+    static const int ScoreAltOpcodes = 1;<br>

+    /// Identical instructions (a.k.a. splat or broadcast).<br>

+    static const int ScoreSplat = 1;<br>

+    /// Matching with an undef is preferable to failing.<br>

+    static const int ScoreUndef = 1;<br>

+    /// Score for failing to find a decent match.<br>

+    static const int ScoreFail = 0;<br>

+    /// User external to the vectorized code.<br>

+    static const int ExternalUseCost = 1;<br>

+    /// The user is internal but in a different lane.<br>

+    static const int UserInDiffLaneCost = ExternalUseCost;<br>

+<br>

+    /// \returns the score of placing \p V1 and \p V2 in consecutive lanes.<br>

+    static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,<br>

+                               ScalarEvolution &SE) {<br>

+      auto *LI1 = dyn_cast<LoadInst>(V1);<br>

+      auto *LI2 = dyn_cast<LoadInst>(V2);<br>

+      if (LI1 && LI2)<br>

+        return isConsecutiveAccess(LI1, LI2, DL, SE)<br>

+                   ? VLOperands::ScoreConsecutiveLoads<br>

+                   : VLOperands::ScoreFail;<br>

+<br>

+      auto *C1 = dyn_cast<Constant>(V1);<br>

+      auto *C2 = dyn_cast<Constant>(V2);<br>

+      if (C1 && C2)<br>

+        return VLOperands::ScoreConstants;<br>

+<br>

+      auto *I1 = dyn_cast<Instruction>(V1);<br>

+      auto *I2 = dyn_cast<Instruction>(V2);<br>

+      if (I1 && I2) {<br>

+        if (I1 == I2)<br>

+          return VLOperands::ScoreSplat;<br>

+        InstructionsState S = getSameOpcode({I1, I2});<br>

+        // Note: Only consider instructions with <= 2 operands to avoid<br>

+        // complexity explosion.<br>

+        if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)<br>

+          return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes<br>

+                                  : VLOperands::ScoreSameOpcode;<br>

+      }<br>

+<br>

+      if (isa<UndefValue>(V2))<br>

+        return VLOperands::ScoreUndef;<br>

+<br>

+      return VLOperands::ScoreFail;<br>

+    }<br>

+<br>

+    /// Holds the values and their lane that are taking part in the look-ahead<br>

+    /// score calculation. This is used in the external uses cost calculation.<br>

+    SmallDenseMap<Value *, int> InLookAheadValues;<br>

+<br>

+    /// \Returns the additional cost due to uses of \p LHS and \p RHS that are<br>

+    /// either external to the vectorized code, or require shuffling.<br>

+    int getExternalUsesCost(const std::pair<Value *, int> &LHS,<br>

+                            const std::pair<Value *, int> &RHS) {<br>

+      int Cost = 0;<br>

+      SmallVector<std::pair<Value *, int>, 2> Values = {LHS, RHS};<br>

+      for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {<br>

+        Value *V = Values[Idx].first;<br>

+        // Calculate the absolute lane, using the minimum relative lane of LHS<br>

+        // and RHS as base and Idx as the offset.<br>

+        int Ln = std::min(LHS.second, RHS.second) + Idx;<br>

+        assert(Ln >= 0 && "Bad lane calculation");<br>

+        unsigned UsersBudget = LookAheadUsersBudget;<br>

+        for (User *U : V->users()) {<br>

+          if (const TreeEntry *UserTE = R.getTreeEntry(U)) {<br>

+            // The user is in the VectorizableTree. Check if we need to insert.<br>

+            auto It = llvm::find(UserTE->Scalars, U);<br>

+            assert(It != UserTE->Scalars.end() && "U is in UserTE");<br>

+            int UserLn = std::distance(UserTE->Scalars.begin(), It);<br>

+            assert(UserLn >= 0 && "Bad lane");<br>

+            if (UserLn != Ln)<br>

+              Cost += UserInDiffLaneCost;<br>

+          } else {<br>

+            // Check if the user is in the look-ahead code.<br>

+            auto It2 = InLookAheadValues.find(U);<br>

+            if (It2 != InLookAheadValues.end()) {<br>

+              // The user is in the look-ahead code. Check the lane.<br>

+              if (It2->second != Ln)<br>

+                Cost += UserInDiffLaneCost;<br>

+            } else {<br>

+              // The user is neither in SLP tree nor in the look-ahead code.<br>

+              Cost += ExternalUseCost;<br>

+            }<br>

+          }<br>

+          // Limit the number of visited uses to cap compilation time.<br>

+          if (--UsersBudget == 0)<br>

+            break;<br>

+        }<br>

+      }<br>

+      return Cost;<br>

+    }<br>

+<br>

+    /// Go through the operands of \p LHS and \p RHS recursively until \p<br>

+    /// MaxLevel, and return the cumulative score. For example:<br>

+    /// \verbatim<br>

+    ///  A[0]  B[0]  A[1]  B[1]  C[0] D[0]  B[1] A[1]<br>

+    ///     \ /         \ /         \ /        \ /<br>

+    ///      +           +           +          +<br>

+    ///     G1          G2          G3         G4<br>

+    /// \endverbatim<br>

+    /// The getScoreAtLevelRec(G1, G2) function will try to match the nodes at<br>

+    /// each level recursively, accumulating the score. It starts from matching<br>

+    /// the additions at level 0, then moves on to the loads (level 1). The<br>

+    /// score of G1 and G2 is higher than G1 and G3, because {A[0],A[1]} and<br>

+    /// {B[0],B[1]} match with VLOperands::ScoreConsecutiveLoads, while<br>

+    /// {A[0],C[0]} has a score of VLOperands::ScoreFail.<br>

+    /// Please note that the order of the operands does not matter, as we<br>

+    /// evaluate the score of all profitable combinations of operands. In<br>

+    /// other words the score of G1 and G4 is the same as G1 and G2. This<br>

+    /// heuristic is based on ideas described in:<br>

+    ///   Look-ahead SLP: Auto-vectorization in the presence of commutative<br>

+    ///   operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha,<br>

+    ///   Luís F. W. Góes<br>

+    int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,<br>

+                           const std::pair<Value *, int> &RHS, int CurrLevel,<br>

+                           int MaxLevel) {<br>

+<br>

+      Value *V1 = LHS.first;<br>

+      Value *V2 = RHS.first;<br>

+      // Get the shallow score of V1 and V2.<br>

+      int ShallowScoreAtThisLevel =<br>

+          std::max((int)ScoreFail, getShallowScore(V1, V2, DL, SE) -<br>

+                                       getExternalUsesCost(LHS, RHS));<br>

+      int Lane1 = LHS.second;<br>

+      int Lane2 = RHS.second;<br>

+<br>

+      // If reached MaxLevel,<br>

+      //  or if V1 and V2 are not instructions,<br>

+      //  or if they are SPLAT,<br>

+      //  or if they are not consecutive, early return the current cost.<br>

+      auto *I1 = dyn_cast<Instruction>(V1);<br>

+      auto *I2 = dyn_cast<Instruction>(V2);<br>

+      if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||<br>

+          ShallowScoreAtThisLevel == VLOperands::ScoreFail ||<br>

+          (isa<LoadInst>(I1) && isa<LoadInst>(I2) && ShallowScoreAtThisLevel))<br>

+        return ShallowScoreAtThisLevel;<br>

+      assert(I1 && I2 && "Should have early exited.");<br>

+<br>

+      // Keep track of in-tree values for determining the external-use cost.<br>

+      InLookAheadValues[V1] = Lane1;<br>

+      InLookAheadValues[V2] = Lane2;<br>

+<br>

+      // Contains the I2 operand indexes that got matched with I1 operands.<br>

+      SmallSet<unsigned, 4> Op2Used;<br>

+<br>

+      // Recursion towards the operands of I1 and I2. We are trying all possible<br>

+      // operand pairs, and keeping track of the best score.<br>

+      for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();<br>

+           OpIdx1 != NumOperands1; ++OpIdx1) {<br>

+        // Try to pair op1I with the best operand of I2.<br>

+        int MaxTmpScore = 0;<br>

+        unsigned MaxOpIdx2 = 0;<br>

+        bool FoundBest = false;<br>

+        // If I2 is commutative try all combinations.<br>

+        unsigned FromIdx = isCommutative(I2) ? 0 : OpIdx1;<br>

+        unsigned ToIdx = isCommutative(I2)<br>

+                             ? I2->getNumOperands()<br>

+                             : std::min(I2->getNumOperands(), OpIdx1 + 1);<br>

+        assert(FromIdx <= ToIdx && "Bad index");<br>

+        for (unsigned OpIdx2 = FromIdx; OpIdx2 != ToIdx; ++OpIdx2) {<br>

+          // Skip operands already paired with OpIdx1.<br>

+          if (Op2Used.count(OpIdx2))<br>

+            continue;<br>

+          // Recursively calculate the cost at each level<br>

+          int TmpScore = getScoreAtLevelRec({I1->getOperand(OpIdx1), Lane1},<br>

+                                            {I2->getOperand(OpIdx2), Lane2},<br>

+                                            CurrLevel + 1, MaxLevel);<br>

+          // Look for the best score.<br>

+          if (TmpScore > VLOperands::ScoreFail && TmpScore > MaxTmpScore) {<br>

+            MaxTmpScore = TmpScore;<br>

+            MaxOpIdx2 = OpIdx2;<br>

+            FoundBest = true;<br>

+          }<br>

+        }<br>

+        if (FoundBest) {<br>

+          // Pair {OpIdx1, MaxOpIdx2} was found to be best. Never revisit it.<br>

+          Op2Used.insert(MaxOpIdx2);<br>

+          ShallowScoreAtThisLevel += MaxTmpScore;<br>

+        }<br>

+      }<br>

+      return ShallowScoreAtThisLevel;<br>

+    }<br>

+<br>

+    /// \Returns the look-ahead score, which tells us how much the sub-trees<br>

+    /// rooted at \p LHS and \p RHS match, the more they match the higher the<br>

+    /// score. This helps break ties in an informed way when we cannot decide on<br>

+    /// the order of the operands by just considering the immediate<br>

+    /// predecessors.<br>

+    int getLookAheadScore(const std::pair<Value *, int> &LHS,<br>

+                          const std::pair<Value *, int> &RHS) {<br>

+      InLookAheadValues.clear();<br>

+      return getScoreAtLevelRec(LHS, RHS, 1, LookAheadMaxDepth);<br>

+    }<br>

+<br>

     // Search all operands in Ops[*][Lane] for the one that matches best<br>

     // Ops[OpIdx][LastLane] and return its operand index.<br>

     // If no good match can be found, return None.<br>

@@ -750,9 +974,6 @@ public:<br>

       // The linearized opcode of the operand at OpIdx, Lane.<br>

       bool OpIdxAPO = getData(OpIdx, Lane).APO;<br>

<br>

-      const unsigned BestScore = 2;<br>

-      const unsigned GoodScore = 1;<br>

-<br>

       // The best operand index and its score.<br>

       // Sometimes we have more than one option (e.g., Opcode and Undefs), so we<br>

       // are using the score to differentiate between the two.<br>

@@ -781,41 +1002,19 @@ public:<br>

         // Look for an operand that matches the current mode.<br>

         switch (RMode) {<br>

         case ReorderingMode::Load:<br>

-          if (isa<LoadInst>(Op)) {<br>

-            // Figure out which is left and right, so that we can check for<br>

-            // consecutive loads<br>

-            bool LeftToRight = Lane > LastLane;<br>

-            Value *OpLeft = (LeftToRight) ? OpLastLane : Op;<br>

-            Value *OpRight = (LeftToRight) ? Op : OpLastLane;<br>

-            if (isConsecutiveAccess(cast<LoadInst>(OpLeft),<br>

-                                    cast<LoadInst>(OpRight), DL, SE))<br>

-              BestOp.Idx = Idx;<br>

-          }<br>

-          break;<br>

-        case ReorderingMode::Opcode:<br>

-          // We accept both Instructions and Undefs, but with different scores.<br>

-          if ((isa<Instruction>(Op) && isa<Instruction>(OpLastLane) &&<br>

-               cast<Instruction>(Op)->getOpcode() ==<br>

-                   cast<Instruction>(OpLastLane)->getOpcode()) ||<br>

-              (isa<UndefValue>(OpLastLane) && isa<Instruction>(Op)) ||<br>

-              isa<UndefValue>(Op)) {<br>

-            // An instruction has a higher score than an undef.<br>

-            unsigned Score = (isa<UndefValue>(Op)) ? GoodScore : BestScore;<br>

-            if (Score > BestOp.Score) {<br>

-              BestOp.Idx = Idx;<br>

-              BestOp.Score = Score;<br>

-            }<br>

-          }<br>

-          break;<br>

         case ReorderingMode::Constant:<br>

-          if (isa<Constant>(Op)) {<br>

-            unsigned Score = (isa<UndefValue>(Op)) ? GoodScore : BestScore;<br>

-            if (Score > BestOp.Score) {<br>

-              BestOp.Idx = Idx;<br>

-              BestOp.Score = Score;<br>

-            }<br>

+        case ReorderingMode::Opcode: {<br>

+          bool LeftToRight = Lane > LastLane;<br>

+          Value *OpLeft = (LeftToRight) ? OpLastLane : Op;<br>

+          Value *OpRight = (LeftToRight) ? Op : OpLastLane;<br>

+          unsigned Score =<br>

+              getLookAheadScore({OpLeft, LastLane}, {OpRight, Lane});<br>

+          if (Score > BestOp.Score) {<br>

+            BestOp.Idx = Idx;<br>

+            BestOp.Score = Score;<br>

           }<br>

           break;<br>

+        }<br>

         case ReorderingMode::Splat:<br>

           if (Op == OpLastLane)<br>

             BestOp.Idx = Idx;<br>

@@ -946,8 +1145,8 @@ public:<br>

   public:<br>

     /// Initialize with all the operands of the instruction vector \p RootVL.<br>

     VLOperands(ArrayRef<Value *> RootVL, const DataLayout &DL,<br>

-               ScalarEvolution &SE)<br>

-        : DL(DL), SE(SE) {<br>

+               ScalarEvolution &SE, const BoUpSLP &R)<br>

+        : DL(DL), SE(SE), R(R) {<br>

       // Append all the operands of RootVL.<br>

       appendOperandsOfVL(RootVL);<br>

     }<br>

@@ -1169,7 +1368,8 @@ private:<br>

                                              SmallVectorImpl<Value *> &Left,<br>

                                              SmallVectorImpl<Value *> &Right,<br>

                                              const DataLayout &DL,<br>

-                                             ScalarEvolution &SE);<br>

+                                             ScalarEvolution &SE,<br>

+                                             const BoUpSLP &R);<br>

   struct TreeEntry {<br>

     using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;<br>

     TreeEntry(VecTreeTy &Container) : Container(Container) {}<br>

@@ -2371,7 +2571,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br>

         // Commutative predicate - collect + sort operands of the instructions<br>

         // so that each side is more likely to have the same opcode.<br>

         assert(P0 == SwapP0 && "Commutative Predicate mismatch");<br>

-        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE);<br>

+        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);<br>

       } else {<br>

         // Collect operands - commute if it uses the swapped predicate.<br>

         for (Value *V : VL) {<br>

@@ -2416,7 +2616,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br>

       // have the same opcode.<br>

       if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {<br>

         ValueList Left, Right;<br>

-        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE);<br>

+        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);<br>

         buildTree_rec(Left, Depth + 1, {TE, 0});<br>

         buildTree_rec(Right, Depth + 1, {TE, 1});<br>

         return;<br>

@@ -2585,7 +2785,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br>

       // Reorder operands if reordering would enable vectorization.<br>

       if (isa<BinaryOperator>(VL0)) {<br>

         ValueList Left, Right;<br>

-        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE);<br>

+        reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);<br>

         buildTree_rec(Left, Depth + 1, {TE, 0});<br>

         buildTree_rec(Right, Depth + 1, {TE, 1});<br>

         return;<br>

@@ -3302,13 +3502,15 @@ int BoUpSLP::getGatherCost(ArrayRef<Valu<br>

<br>

 // Perform operand reordering on the instructions in VL and return the reordered<br>

 // operands in Left and Right.<br>

-void BoUpSLP::reorderInputsAccordingToOpcode(<br>

-    ArrayRef<Value *> VL, SmallVectorImpl<Value *> &Left,<br>

-    SmallVectorImpl<Value *> &Right, const DataLayout &DL,<br>

-    ScalarEvolution &SE) {<br>

+void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,<br>

+                                             SmallVectorImpl<Value *> &Left,<br>

+                                             SmallVectorImpl<Value *> &Right,<br>

+                                             const DataLayout &DL,<br>

+                                             ScalarEvolution &SE,<br>

+                                             const BoUpSLP &R) {<br>

   if (VL.empty())<br>

     return;<br>

-  VLOperands Ops(VL, DL, SE);<br>

+  VLOperands Ops(VL, DL, SE, R);<br>

   // Reorder the operands in place.<br>

   Ops.reorder();<br>

   Left = Ops.getVL(0);<br>

<br>

Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll?rev=364964&r1=364963&r2=364964&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll?rev=364964&r1=364963&r2=364964&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll (original)<br>

+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/lookahead.ll Tue Jul  2 13:20:28 2019<br>

@@ -27,22 +27,19 @@ define void @lookahead_basic(double* %ar<br>

 ; CHECK-NEXT:    [[IDX5:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 5<br>

 ; CHECK-NEXT:    [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6<br>

 ; CHECK-NEXT:    [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7<br>

-; CHECK-NEXT:    [[A_0:%.*]] = load double, double* [[IDX0]], align 8<br>

-; CHECK-NEXT:    [[A_1:%.*]] = load double, double* [[IDX1]], align 8<br>

-; CHECK-NEXT:    [[B_0:%.*]] = load double, double* [[IDX2]], align 8<br>

-; CHECK-NEXT:    [[B_1:%.*]] = load double, double* [[IDX3]], align 8<br>

-; CHECK-NEXT:    [[C_0:%.*]] = load double, double* [[IDX4]], align 8<br>

-; CHECK-NEXT:    [[C_1:%.*]] = load double, double* [[IDX5]], align 8<br>

-; CHECK-NEXT:    [[D_0:%.*]] = load double, double* [[IDX6]], align 8<br>

-; CHECK-NEXT:    [[D_1:%.*]] = load double, double* [[IDX7]], align 8<br>

-; CHECK-NEXT:    [[SUBAB_0:%.*]] = fsub fast double [[A_0]], [[B_0]]<br>

-; CHECK-NEXT:    [[SUBCD_0:%.*]] = fsub fast double [[C_0]], [[D_0]]<br>

-; CHECK-NEXT:    [[SUBAB_1:%.*]] = fsub fast double [[A_1]], [[B_1]]<br>

-; CHECK-NEXT:    [[SUBCD_1:%.*]] = fsub fast double [[C_1]], [[D_1]]<br>

-; CHECK-NEXT:    [[ADDABCD_0:%.*]] = fadd fast double [[SUBAB_0]], [[SUBCD_0]]<br>

-; CHECK-NEXT:    [[ADDCDAB_1:%.*]] = fadd fast double [[SUBCD_1]], [[SUBAB_1]]<br>

-; CHECK-NEXT:    store double [[ADDABCD_0]], double* [[IDX0]], align 8<br>

-; CHECK-NEXT:    store double [[ADDCDAB_1]], double* [[IDX1]], align 8<br>

+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8<br>

+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8<br>

+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8<br>

+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8<br>

+; CHECK-NEXT:    [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]<br>

+; CHECK-NEXT:    [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]<br>

+; CHECK-NEXT:    [[TMP10:%.*]] = fadd fast <2 x double> [[TMP8]], [[TMP9]]<br>

+; CHECK-NEXT:    [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*<br>

+; CHECK-NEXT:    store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8<br>

 ; CHECK-NEXT:    ret void<br>

 ;<br>

 entry:<br>

@@ -164,22 +161,23 @@ define void @lookahead_alt2(double* %arr<br>

 ; CHECK-NEXT:    [[IDX5:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 5<br>

 ; CHECK-NEXT:    [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6<br>

 ; CHECK-NEXT:    [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7<br>

-; CHECK-NEXT:    [[A_0:%.*]] = load double, double* [[IDX0]], align 8<br>

-; CHECK-NEXT:    [[A_1:%.*]] = load double, double* [[IDX1]], align 8<br>

-; CHECK-NEXT:    [[B_0:%.*]] = load double, double* [[IDX2]], align 8<br>

-; CHECK-NEXT:    [[B_1:%.*]] = load double, double* [[IDX3]], align 8<br>

-; CHECK-NEXT:    [[C_0:%.*]] = load double, double* [[IDX4]], align 8<br>

-; CHECK-NEXT:    [[C_1:%.*]] = load double, double* [[IDX5]], align 8<br>

-; CHECK-NEXT:    [[D_0:%.*]] = load double, double* [[IDX6]], align 8<br>

-; CHECK-NEXT:    [[D_1:%.*]] = load double, double* [[IDX7]], align 8<br>

-; CHECK-NEXT:    [[ADDAB_0:%.*]] = fadd fast double [[A_0]], [[B_0]]<br>

-; CHECK-NEXT:    [[SUBCD_0:%.*]] = fsub fast double [[C_0]], [[D_0]]<br>

-; CHECK-NEXT:    [[ADDCD_1:%.*]] = fadd fast double [[C_1]], [[D_1]]<br>

-; CHECK-NEXT:    [[SUBAB_1:%.*]] = fsub fast double [[A_1]], [[B_1]]<br>

-; CHECK-NEXT:    [[ADDABCD_0:%.*]] = fadd fast double [[ADDAB_0]], [[SUBCD_0]]<br>

-; CHECK-NEXT:    [[ADDCDAB_1:%.*]] = fadd fast double [[ADDCD_1]], [[SUBAB_1]]<br>

-; CHECK-NEXT:    store double [[ADDABCD_0]], double* [[IDX0]], align 8<br>

-; CHECK-NEXT:    store double [[ADDCDAB_1]], double* [[IDX1]], align 8<br>

+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8<br>

+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8<br>

+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8<br>

+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8<br>

+; CHECK-NEXT:    [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]<br>

+; CHECK-NEXT:    [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]<br>

+; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3><br>

+; CHECK-NEXT:    [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]<br>

+; CHECK-NEXT:    [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]<br>

+; CHECK-NEXT:    [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3><br>

+; CHECK-NEXT:    [[TMP14:%.*]] = fadd fast <2 x double> [[TMP13]], [[TMP10]]<br>

+; CHECK-NEXT:    [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*<br>

+; CHECK-NEXT:    store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8<br>

 ; CHECK-NEXT:    ret void<br>

 ;<br>

 entry:<br>

@@ -239,6 +237,97 @@ define void @lookahead_external_uses(dou<br>

 ; CHECK-NEXT:    [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2<br>

 ; CHECK-NEXT:    [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2<br>

 ; CHECK-NEXT:    [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1<br>

+; CHECK-NEXT:    [[A0:%.*]] = load double, double* [[IDXA0]], align 8<br>

+; CHECK-NEXT:    [[C0:%.*]] = load double, double* [[IDXC0]], align 8<br>

+; CHECK-NEXT:    [[D0:%.*]] = load double, double* [[IDXD0]], align 8<br>

+; CHECK-NEXT:    [[A1:%.*]] = load double, double* [[IDXA1]], align 8<br>

+; CHECK-NEXT:    [[B2:%.*]] = load double, double* [[IDXB2]], align 8<br>

+; CHECK-NEXT:    [[A2:%.*]] = load double, double* [[IDXA2]], align 8<br>

+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8<br>

+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x double> undef, double [[C0]], i32 0<br>

+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[A1]], i32 1<br>

+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <2 x double> undef, double [[D0]], i32 0<br>

+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[B2]], i32 1<br>

+; CHECK-NEXT:    [[TMP6:%.*]] = fsub fast <2 x double> [[TMP3]], [[TMP5]]<br>

+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <2 x double> undef, double [[A0]], i32 0<br>

+; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A2]], i32 1<br>

+; CHECK-NEXT:    [[TMP9:%.*]] = fsub fast <2 x double> [[TMP8]], [[TMP1]]<br>

+; CHECK-NEXT:    [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP6]]<br>

+; CHECK-NEXT:    [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1<br>

+; CHECK-NEXT:    [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*<br>

+; CHECK-NEXT:    store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8<br>

+; CHECK-NEXT:    store double [[A1]], double* [[EXT1:%.*]], align 8<br>

+; CHECK-NEXT:    ret void<br>

+;<br>

+entry:<br>

+  %IdxA0 = getelementptr inbounds double, double* %A, i64 0<br>

+  %IdxB0 = getelementptr inbounds double, double* %B, i64 0<br>

+  %IdxC0 = getelementptr inbounds double, double* %C, i64 0<br>

+  %IdxD0 = getelementptr inbounds double, double* %D, i64 0<br>

+<br>

+  %IdxA1 = getelementptr inbounds double, double* %A, i64 1<br>

+  %IdxB2 = getelementptr inbounds double, double* %B, i64 2<br>

+  %IdxA2 = getelementptr inbounds double, double* %A, i64 2<br>

+  %IdxB1 = getelementptr inbounds double, double* %B, i64 1<br>

+<br>

+  %A0 = load double, double *%IdxA0, align 8<br>

+  %B0 = load double, double *%IdxB0, align 8<br>

+  %C0 = load double, double *%IdxC0, align 8<br>

+  %D0 = load double, double *%IdxD0, align 8<br>

+<br>

+  %A1 = load double, double *%IdxA1, align 8<br>

+  %B2 = load double, double *%IdxB2, align 8<br>

+  %A2 = load double, double *%IdxA2, align 8<br>

+  %B1 = load double, double *%IdxB1, align 8<br>

+<br>

+  %subA0B0 = fsub fast double %A0, %B0<br>

+  %subC0D0 = fsub fast double %C0, %D0<br>

+<br>

+  %subA1B2 = fsub fast double %A1, %B2<br>

+  %subA2B1 = fsub fast double %A2, %B1<br>

+<br>

+  %add0 = fadd fast double %subA0B0, %subC0D0<br>

+  %add1 = fadd fast double %subA1B2, %subA2B1<br>

+<br>

+  %IdxS0 = getelementptr inbounds double, double* %S, i64 0<br>

+  %IdxS1 = getelementptr inbounds double, double* %S, i64 1<br>

+<br>

+  store double %add0, double *%IdxS0, align 8<br>

+  store double %add1, double *%IdxS1, align 8<br>

+<br>

+  ; External use<br>

+  store double %A1, double *%Ext1, align 8<br>

+  ret void<br>

+}<br>

+<br>

+; A[0] B[0] C[0] D[0]  A[1] B[2] A[2] B[1]<br>

+;     \  /   \  /       /  \  /   \  / \<br>

+;       -     -    U1,U2,U3  -     -  U4,U5<br>

+;        \   /                \   /<br>

+;          +                    +<br>

+;          |                    |<br>

+;         S[0]                 S[1]<br>

+;<br>

+;<br>

+; If we limit the users budget for the look-ahead heuristic to 2, then the<br>

+; look-ahead heuristic has no way of choosing B[1] (with 2 external users)<br>

+; over A[1] (with 3 external users).<br>

+; The result is that the operands of the Add are not reordered and the loads<br>

+; from A get vectorized instead of the loads from B.<br>

+;<br>

+define void @lookahead_limit_users_budget(double* %A, double *%B, double *%C, double *%D, double *%S, double *%Ext1, double *%Ext2, double *%Ext3, double *%Ext4, double *%Ext5) {<br>

+; CHECK-LABEL: @lookahead_limit_users_budget(<br>

+; CHECK-NEXT:  entry:<br>

+; CHECK-NEXT:    [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1<br>

+; CHECK-NEXT:    [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2<br>

+; CHECK-NEXT:    [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2<br>

+; CHECK-NEXT:    [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1<br>

 ; CHECK-NEXT:    [[B0:%.*]] = load double, double* [[IDXB0]], align 8<br>

 ; CHECK-NEXT:    [[C0:%.*]] = load double, double* [[IDXC0]], align 8<br>

 ; CHECK-NEXT:    [[D0:%.*]] = load double, double* [[IDXD0]], align 8<br>

@@ -262,6 +351,10 @@ define void @lookahead_external_uses(dou<br>

 ; CHECK-NEXT:    store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8<br>

 ; CHECK-NEXT:    [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1<br>

 ; CHECK-NEXT:    store double [[TMP12]], double* [[EXT1:%.*]], align 8<br>

+; CHECK-NEXT:    store double [[TMP12]], double* [[EXT2:%.*]], align 8<br>

+; CHECK-NEXT:    store double [[TMP12]], double* [[EXT3:%.*]], align 8<br>

+; CHECK-NEXT:    store double [[B1]], double* [[EXT4:%.*]], align 8<br>

+; CHECK-NEXT:    store double [[B1]], double* [[EXT5:%.*]], align 8<br>

 ; CHECK-NEXT:    ret void<br>

 ;<br>

 entry:<br>

@@ -300,7 +393,56 @@ entry:<br>

   store double %add0, double *%IdxS0, align 8<br>

   store double %add1, double *%IdxS1, align 8<br>

<br>

-  ; External use<br>

+  ; External uses of A1<br>

   store double %A1, double *%Ext1, align 8<br>

+  store double %A1, double *%Ext2, align 8<br>

+  store double %A1, double *%Ext3, align 8<br>

+<br>

+  ; External uses of B1<br>

+  store double %B1, double *%Ext4, align 8<br>

+  store double %B1, double *%Ext5, align 8<br>

+<br>

+  ret void<br>

+}<br>

+<br>

+; This checks that the lookahead code does not crash when instructions with the same opcodes have different numbers of operands (in this case the calls).<br>

+<br>

+%Class = type { i8 }<br>

+declare double @_ZN1i2ayEv(%Class*)<br>

+declare double @_ZN1i2axEv()<br>

+<br>

+define void @lookahead_crash(double* %A, double *%S, %Class *%Arg0) {<br>

+; CHECK-LABEL: @lookahead_crash(<br>

+; CHECK-NEXT:    [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1<br>

+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*<br>

+; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8<br>

+; CHECK-NEXT:    [[C0:%.*]] = call double @_ZN1i2ayEv(%Class* [[ARG0:%.*]])<br>

+; CHECK-NEXT:    [[C1:%.*]] = call double @_ZN1i2axEv()<br>

+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <2 x double> undef, double [[C0]], i32 0<br>

+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[C1]], i32 1<br>

+; CHECK-NEXT:    [[TMP5:%.*]] = fadd fast <2 x double> [[TMP2]], [[TMP4]]<br>

+; CHECK-NEXT:    [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0<br>

+; CHECK-NEXT:    [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1<br>

+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*<br>

+; CHECK-NEXT:    store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8<br>

+; CHECK-NEXT:    ret void<br>

+;<br>

+  %IdxA0 = getelementptr inbounds double, double* %A, i64 0<br>

+  %IdxA1 = getelementptr inbounds double, double* %A, i64 1<br>

+<br>

+  %A0 = load double, double *%IdxA0, align 8<br>

+  %A1 = load double, double *%IdxA1, align 8<br>

+<br>

+  %C0 = call double @_ZN1i2ayEv(%Class *%Arg0)<br>

+  %C1 = call double @_ZN1i2axEv()<br>

+<br>

+  %add0 = fadd fast double %A0, %C0<br>

+  %add1 = fadd fast double %A1, %C1<br>

+<br>

+  %IdxS0 = getelementptr inbounds double, double* %S, i64 0<br>

+  %IdxS1 = getelementptr inbounds double, double* %S, i64 1<br>

+  store double %add0, double *%IdxS0, align 8<br>

+  store double %add1, double *%IdxS1, align 8<br>

   ret void<br>

 }<br>

<br>

<br>


</blockquote></div>