<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 12, 2016, at 10:11 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" class="">chisophugis@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class="Apple-interchange-newline"><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Thu, May 12, 2016 at 6:42 PM, Michael Zolotukhin via llvm-commits<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Author: mzolotukhin<br class="">Date: Thu May 12 20:42:39 2016<br class="">New Revision: 269388<br class=""><br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project?rev=269388&view=rev" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project?rev=269388&view=rev</a><br class="">Log:<br class="">[Unroll] Implement a 
conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...<br class=""><br class="">Summary:<br class="">...loop after the last iteration.<br class=""><br class="">This is really hard to do correctly. The core problem is that we need to<br class="">model liveness through the induction PHIs from iteration to iteration in<br class="">order to get the correct results, and we need to correctly de-duplicate<br class="">the common subgraphs of instructions feeding some subset of the<br class="">induction PHIs. All of this can be driven either from a side effect at<br class="">some iteration or from the loop values used after the loop finishes.<br class=""><br class="">This patch implements this by storing the forward-propagating analysis<br class="">of each instruction in a cache to recall whether it was free and whether<br class="">it has become live and thus counted toward the total unroll cost. Then,<br class="">at each sink for a value in the loop, we recursively walk back through<br class="">every value that feeds the sink, including looping back through the<br class="">iterations as needed, until we have marked the entire input graph as<br class="">live. Because we cache this, we never visit instructions more than twice<br class="">-- once when we analyze them and put them into the cache, and once when<br class="">we count their cost towards the unrolled loop. Also, because the cache<br class="">is only two bits and because we are dealing with relatively small<br class="">iteration counts, we can store all of this very densely in memory to<br class="">keep this from becoming an excessively slow analysis.<br class=""><br class="">The code here is still pretty gross. 
I would appreciate suggestions<br class="">about better ways to factor or split this up; I've stared too long at<br class="">the algorithmic side to really have a good sense of what the design<br class="">should probably look like.<br class=""><br class="">Also, it might seem like we should do all of this bottom-up, but I think<br class="">that is a red herring. Specifically, the simplification power is *much*<br class="">greater working top-down. We can forward propagate very effectively,<br class="">even across strange and interesting recurrences around the backedge.<br class="">Because we use data to propagate, this doesn't cause a state space<br class="">explosion. Doing this level of constant folding, etc., would be very<br class="">expensive to do bottom-up because it wouldn't be until the last moment<br class="">that you could collapse everything. The current solution is essentially<br class="">a top-down simplification with a bottom-up cost accounting which seems<br class="">to get the best of both worlds. It makes the simplification incremental<br class="">and powerful while leaving everything dead until we *know* it is needed.<br class=""><br class="">Finally, a core property of this approach is its *monotonicity*. At all<br class="">times, the current UnrolledCost is a conservatively low estimate. This<br class="">ensures that we will never early-exit from the analysis due to exceeding<br class="">a threshold when, if we had continued, the cost would have gone back<br class="">below the threshold. 
These kinds of bugs can cause random changes to<br class="">behavior that are incredibly hard to track down.<br class=""><br class="">We could use a similar (but much simpler) technique within the inliner<br class="">as well to avoid considering speculated code in the inline cost.<br class=""><br class="">Reviewers: chandlerc<br class=""><br class="">Subscribers: sanjoy, mzolotukhin, llvm-commits<br class=""><br class="">Differential Revision:<span class="Apple-converted-space"> </span><a href="http://reviews.llvm.org/D11758" rel="noreferrer" target="_blank" class="">http://reviews.llvm.org/D11758</a><br class=""><br class="">Added:<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll<br class="">Modified:<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp<br class=""><br class="">Modified: llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h?rev=269388&r1=269387&r2=269388&view=diff</a><br 
class="">==============================================================================<br class="">--- llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h (original)<br class="">+++ llvm/trunk/include/llvm/Analysis/LoopUnrollAnalyzer.h Thu May 12 20:42:39 2016<br class="">@@ -89,6 +89,7 @@ private:<br class="">   bool visitLoad(LoadInst &I);<br class="">   bool visitCastInst(CastInst &I);<br class="">   bool visitCmpInst(CmpInst &I);<br class="">+  bool visitPHINode(PHINode &PN);<br class=""> };<br class=""> }<br class=""> #endif<br class=""><br class="">Modified: llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp?rev=269388&r1=269387&r2=269388&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp (original)<br class="">+++ llvm/trunk/lib/Analysis/LoopUnrollAnalyzer.cpp Thu May 12 20:42:39 2016<br class="">@@ -189,3 +189,13 @@ bool UnrolledInstAnalyzer::visitCmpInst(<br class=""><br class="">   return Base::visitCmpInst(I);<br class=""> }<br class="">+<br class="">+bool UnrolledInstAnalyzer::visitPHINode(PHINode &PN) {<br class="">+  // Run base visitor first. 
This way we can gather some useful for later<br class="">+  // analysis information.<br class="">+  if (Base::visitPHINode(PN))<br class="">+    return true;<br class="">+<br class="">+  // The loop induction PHI nodes are definitionally free.<br class="">+  return PN.getParent() == L->getHeader();<br class="">+}<br class=""><br class="">Modified: llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp?rev=269388&r1=269387&r2=269388&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp (original)<br class="">+++ llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp Thu May 12 20:42:39 2016<br class="">@@ -185,6 +185,40 @@ static TargetTransformInfo::UnrollingPre<br class=""> }<br class=""><br class=""> namespace {<br class="">+/// A struct to densely store the state of an instruction after unrolling at<br class="">+/// each iteration.<br class="">+///<br class="">+/// This is designed to work like a tuple of <Instruction *, int> for the<br class="">+/// purposes of hashing and lookup, but to be able to associate two boolean<br class="">+/// states with each key.<br class="">+struct UnrolledInstState {<br class="">+  Instruction *I;<br class="">+  int Iteration : 30;<br class="">+  unsigned IsFree : 1;<br class="">+  unsigned IsCounted : 1;<br class="">+};<br class="">+<br class="">+/// Hashing and equality testing for a set of the instruction states.<br class="">+struct UnrolledInstStateKeyInfo {<br class="">+  typedef DenseMapInfo<Instruction *> PtrInfo;<br class="">+  typedef DenseMapInfo<std::pair<Instruction *, int>> 
PairInfo;<br class="">+  static inline UnrolledInstState getEmptyKey() {<br class="">+    return {PtrInfo::getEmptyKey(), 0, 0, 0};<br class="">+  }<br class="">+  static inline UnrolledInstState getTombstoneKey() {<br class="">+    return {PtrInfo::getTombstoneKey(), 0, 0, 0};<br class="">+  }<br class="">+  static inline unsigned getHashValue(const UnrolledInstState &S) {<br class="">+    return PairInfo::getHashValue({S.I, S.Iteration});<br class="">+  }<br class="">+  static inline bool isEqual(const UnrolledInstState &LHS,<br class="">+                             const UnrolledInstState &RHS) {<br class="">+    return PairInfo::isEqual({LHS.I, LHS.Iteration}, {RHS.I, RHS.Iteration});<br class="">+  }<br class="">+};<br class="">+}<br class="">+<br class="">+namespace {<br class=""> struct EstimatedUnrollCost {<br class="">   /// \brief The estimated cost after unrolling.<br class="">   int UnrolledCost;<br class="">@@ -218,18 +252,25 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">   assert(UnrollMaxIterationsCountToAnalyze < (INT_MAX / 2) &&<br class="">         <span class="Apple-converted-space"> </span>"The unroll iterations max is too large!");<br class=""><br class="">+  // Only analyze inner loops. 
We can't properly estimate cost of nested loops<br class="">+  // and we won't visit inner loops again anyway.<br class="">+  if (!L->empty())<br class="">+    return None;<br class="">+<br class="">   // Don't simulate loops with a big or unknown tripcount<br class="">   if (!UnrollMaxIterationsCountToAnalyze || !TripCount ||<br class="">       TripCount > UnrollMaxIterationsCountToAnalyze)<br class="">     return None;<br class=""><br class="">   SmallSetVector<BasicBlock *, 16> BBWorklist;<br class="">+  SmallSetVector<std::pair<BasicBlock *, BasicBlock *>, 4> ExitWorklist;<br class="">   DenseMap<Value *, Constant *> SimplifiedValues;<br class="">   SmallVector<std::pair<Value *, Constant *>, 4> SimplifiedInputValues;<br class=""><br class="">   // The estimated cost of the unrolled form of the loop. We try to estimate<br class="">   // this by simplifying as much as we can while computing the estimate.<br class="">   int UnrolledCost = 0;<br class="">+<br class="">   // We also track the estimated dynamic (that is, actually executed) cost in<br class="">   // the rolled form. This helps identify cases when the savings from unrolling<br class="">   // aren't just exposing dead control flows, but actual reduced dynamic<br class="">@@ -237,6 +278,97 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">   // unrolling.<br class="">   int RolledDynamicCost = 0;<br class=""><br class="">+  // We track the simplification of each instruction in each iteration. We use<br class="">+  // this to recursively merge costs into the unrolled cost on-demand so that<br class="">+  // we don't count the cost of any dead code. 
This is essentially a map from<br class="">+  // <instruction, int> to <bool, bool>, but stored as a densely packed struct.<br class="">+  DenseSet<UnrolledInstState, UnrolledInstStateKeyInfo> InstCostMap;<br class="">+<br class="">+  // A small worklist used to accumulate cost of instructions from each<br class="">+  // observable and reached root in the loop.<br class="">+  SmallVector<Instruction *, 16> CostWorklist;<br class="">+<br class="">+  // PHI-used worklist used between iterations while accumulating cost.<br class="">+  SmallVector<Instruction *, 4> PHIUsedList;<br class="">+<br class="">+  // Helper function to accumulate cost for instructions in the loop.<br class="">+  auto AddCostRecursively = [&](Instruction &RootI, int Iteration) {<br class="">+    assert(Iteration >= 0 && "Cannot have a negative iteration!");<br class="">+    assert(CostWorklist.empty() && "Must start with an empty cost list");<br class="">+    assert(PHIUsedList.empty() && "Must start with an empty phi used list");<br class="">+    CostWorklist.push_back(&RootI);<br class="">+    for (;; --Iteration) {<br class="">+      do {<br class="">+        Instruction *I = CostWorklist.pop_back_val();<br class="">+<br class="">+        // InstCostMap only uses I and Iteration as a key, the other two values<br class="">+        // don't matter here.<br class="">+        auto CostIter = InstCostMap.find({I, Iteration, 0, 0});<br class="">+        if (CostIter == InstCostMap.end())<br class="">+          // If an input to a PHI node comes from a dead path through the loop<br class="">+          // we may have no cost data for it here. 
What that actually means is<br class="">+          // that it is free.<br class="">+          continue;<br class="">+        auto &Cost = *CostIter;<br class="">+        if (Cost.IsCounted)<br class="">+          // Already counted this instruction.<br class="">+          continue;<br class="">+<br class="">+        // Mark that we are counting the cost of this instruction now.<br class="">+        Cost.IsCounted = true;<br class="">+<br class="">+        // If this is a PHI node in the loop header, just add it to the PHI set.<br class="">+        if (auto *PhiI = dyn_cast<PHINode>(I))<br class="">+          if (PhiI->getParent() == L->getHeader()) {<br class="">+            assert(Cost.IsFree && "Loop PHIs shouldn't be evaluated as they "<br class="">+                                  "inherently simplify during unrolling.");<br class=""></blockquote><div class=""><br class=""></div><div class="">This assertion seems to be failing:</div><div class=""><a href="http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/5255/steps/test/logs/stdio" class="">http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/5255/steps/test/logs/stdio</a><br class=""></div></div></div></blockquote>I reverted it for now (r269395). 
I’ll try to look into the issue tomorrow.</div><div><br class=""></div><div>Thanks,</div><div>Michael<br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class=""><br class=""></div><div class="">-- Sean Silva</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">+            if (Iteration == 0)<br class="">+              continue;<br class="">+<br class="">+            // Push the incoming value from the backedge into the PHI used list<br class="">+            // if it is an in-loop instruction. We'll use this to populate the<br class="">+            // cost worklist for the next iteration (as we count backwards).<br class="">+            if (auto *OpI = dyn_cast<Instruction>(<br class="">+                    PhiI->getIncomingValueForBlock(L->getLoopLatch())))<br class="">+              if (L->contains(OpI))<br class="">+                PHIUsedList.push_back(OpI);<br class="">+            continue;<br class="">+          }<br class="">+<br class="">+        // First accumulate the cost of this instruction.<br class="">+        if (!Cost.IsFree) {<br class="">+          UnrolledCost += TTI.getUserCost(I);<br class="">+          DEBUG(dbgs() << "Adding cost of instruction (iteration " << Iteration<br class="">+                       << "): ");<br class="">+          DEBUG(I->dump());<br class="">+        }<br class="">+<br class="">+        // We must count the cost of every operand which is not free,<br class="">+        // recursively. 
If we reach a loop PHI node, simply add it to the set<br class="">+        // to be considered on the next iteration (backwards!).<br class="">+        for (Value *Op : I->operands()) {<br class="">+          // Check whether this operand is free due to being a constant or<br class="">+          // outside the loop.<br class="">+          auto *OpI = dyn_cast<Instruction>(Op);<br class="">+          if (!OpI || !L->contains(OpI))<br class="">+            continue;<br class="">+<br class="">+          // Otherwise accumulate its cost.<br class="">+          CostWorklist.push_back(OpI);<br class="">+        }<br class="">+      } while (!CostWorklist.empty());<br class="">+<br class="">+      if (PHIUsedList.empty())<br class="">+        // We've exhausted the search.<br class="">+        break;<br class="">+<br class="">+      assert(Iteration > 0 &&<br class="">+             "Cannot track PHI-used values past the first iteration!");<br class="">+      CostWorklist.append(PHIUsedList.begin(), PHIUsedList.end());<br class="">+      PHIUsedList.clear();<br class="">+    }<br class="">+  };<br class="">+<br class="">   // Ensure that we don't violate the loop structure invariants relied on by<br class="">   // this analysis.<br class="">   assert(L->isLoopSimplifyForm() && "Must put loop into normal form first.");<br class="">@@ -291,22 +423,32 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">       // it.  
We don't change the actual IR, just count optimization<br class="">       // opportunities.<br class="">       for (Instruction &I : *BB) {<br class="">-        int InstCost = TTI.getUserCost(&I);<br class="">+        // Track this instruction's expected baseline cost when executing the<br class="">+        // rolled loop form.<br class="">+        RolledDynamicCost += TTI.getUserCost(&I);<br class=""><br class="">         // Visit the instruction to analyze its loop cost after unrolling,<br class="">-        // and if the visitor returns false, include this instruction in the<br class="">-        // unrolled cost.<br class="">-        if (!Analyzer.visit(I))<br class="">-          UnrolledCost += InstCost;<br class="">-        else {<br class="">-          DEBUG(dbgs() << "  " << I<br class="">-                       << " would be simplified if loop is unrolled.\n");<br class="">-          (void)0;<br class="">-        }<br class="">+        // and if the visitor returns true, mark the instruction as free after<br class="">+        // unrolling and continue.<br class="">+        bool IsFree = Analyzer.visit(I);<br class="">+        bool Inserted = InstCostMap.insert({&I, (int)Iteration, IsFree,<br class="">+                                            /*IsCounted*/ false})<br class="">+                            .second;<br class="">+        (void)Inserted;<br class="">+        assert(Inserted && "Cannot have a state for an unvisited instruction!");<br class=""><br class="">-        // Also track this instructions expected cost when executing the rolled<br class="">-        // loop form.<br class="">-        RolledDynamicCost += InstCost;<br class="">+        if (IsFree)<br class="">+          continue;<br class="">+<br class="">+        // If the instruction might have a side-effect recursively account for<br class="">+        // the cost of it and all the instructions leading up to it.<br class="">+        if (I.mayHaveSideEffects())<br class="">+          
AddCostRecursively(I, Iteration);<br class="">+<br class="">+        // Can't properly model a cost of a call.<br class="">+        // FIXME: With a proper cost model we should be able to do it.<br class="">+        if(isa<CallInst>(&I))<br class="">+          return None;<br class=""><br class="">         // If unrolled body turns out to be too big, bail out.<br class="">         if (UnrolledCost > MaxUnrolledLoopSize) {<br class="">@@ -335,6 +477,8 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">                   cast<ConstantInt>(SimpleCond)->isZero() ? 1 : 0);<br class="">             if (L->contains(Succ))<br class="">               BBWorklist.insert(Succ);<br class="">+            else<br class="">+              ExitWorklist.insert({BB, Succ});<br class="">             continue;<br class="">           }<br class="">         }<br class="">@@ -350,6 +494,8 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">                       <span class="Apple-converted-space"> </span>.getCaseSuccessor();<br class="">           if (L->contains(Succ))<br class="">             BBWorklist.insert(Succ);<br class="">+          else<br class="">+            ExitWorklist.insert({BB, Succ});<br class="">           continue;<br class="">         }<br class="">       }<br class="">@@ -358,6 +504,8 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">       for (BasicBlock *Succ : successors(BB))<br class="">         if (L->contains(Succ))<br class="">           BBWorklist.insert(Succ);<br class="">+        else<br class="">+          ExitWorklist.insert({BB, Succ});<br class="">     }<br class=""><br class="">     // If we found no optimization opportunities on the first iteration, we<br class="">@@ -368,6 +516,23 @@ analyzeLoopUnrollCost(const Loop *L, uns<br class="">       return None;<br class="">     }<br class="">   }<br class="">+<br class="">+  while (!ExitWorklist.empty()) {<br class="">+    BasicBlock *ExitingBB, *ExitBB;<br class="">+    
std::tie(ExitingBB, ExitBB) = ExitWorklist.pop_back_val();<br class="">+<br class="">+    for (Instruction &I : *ExitBB) {<br class="">+      auto *PN = dyn_cast<PHINode>(&I);<br class="">+      if (!PN)<br class="">+        break;<br class="">+<br class="">+      Value *Op = PN->getIncomingValueForBlock(ExitingBB);<br class="">+      if (auto *OpI = dyn_cast<Instruction>(Op))<br class="">+        if (L->contains(OpI))<br class="">+          AddCostRecursively(*OpI, TripCount - 1);<br class="">+    }<br class="">+  }<br class="">+<br class="">   DEBUG(dbgs() << "Analysis finished:\n"<br class="">               <span class="Apple-converted-space"> </span><< "UnrolledCost: " << UnrolledCost << ", "<br class="">               <span class="Apple-converted-space"> </span><< "RolledDynamicCost: " << RolledDynamicCost << "\n");<br class=""><br class="">Modified: llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll?rev=269388&r1=269387&r2=269388&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll (original)<br class="">+++ llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll Thu May 12 20:42:39 2016<br class="">@@ -1,4 +1,4 @@<br class="">-; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10  -unroll-percent-dynamic-cost-saved-threshold=50 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s<br class="">+; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10  
-unroll-percent-dynamic-cost-saved-threshold=70 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s<br class=""> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"<br class=""><br class=""> @unknown_global = internal unnamed_addr global [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16<br class=""><br class="">Added: llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll?rev=269388&view=auto" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll?rev=269388&view=auto</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll (added)<br class="">+++ llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll Thu May 12 20:42:39 2016<br class="">@@ -0,0 +1,38 @@<br class="">+; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=60 | FileCheck %s<br class="">+target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"<br class="">+<br class="">+@known_constant = internal unnamed_addr constant [10 x i32] [i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0], align 16<br class="">+<br class="">+; If a load becomes a constant after loop unrolling, we sometimes can simplify<br class="">+; CFG. 
This test verifies that we handle such cases.<br class="">+; After one operand in an instruction is constant-folded and the<br class="">+; instruction is simplified, the other operand might become dead.<br class="">+; In this test we have::<br class="">+; for i in 1..10:<br class="">+;   r += A[i] * B[i]<br class="">+; A[i] is 0 almost at every iteration, so there is no need in loading B[i] at<br class="">+; all.<br class="">+<br class="">+<br class="">+; CHECK-LABEL: @unroll_dce<br class="">+; CHECK-NOT:   br i1 %exitcond, label %for.end, label %for.body<br class="">+define i32 @unroll_dce(i32* noalias nocapture readonly %b) {<br class="">+entry:<br class="">+  br label %for.body<br class="">+<br class="">+for.body:                                         ; preds = %for.body, %entry<br class="">+  %iv.0 = phi i64 [ 0, %entry ], [ %iv.1, %for.body ]<br class="">+  %r.0 = phi i32 [ 0, %entry ], [ %r.1, %for.body ]<br class="">+  %arrayidx1 = getelementptr inbounds [10 x i32], [10 x i32]* @known_constant, i64 0, i64 %iv.0<br class="">+  %x1 = load i32, i32* %arrayidx1, align 4<br class="">+  %arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv.0<br class="">+  %x2 = load i32, i32* %arrayidx2, align 4<br class="">+  %mul = mul i32 %x1, %x2<br class="">+  %r.1 = add i32 %mul, %r.0<br class="">+  %iv.1 = add nuw nsw i64 %iv.0, 1<br class="">+  %exitcond = icmp eq i64 %iv.1, 10<br class="">+  br i1 %exitcond, label %for.end, label %for.body<br class="">+<br class="">+for.end:                                          ; preds = %for.body<br class="">+  ret i32 %r.1<br class="">+}<br class=""><br class="">Modified: llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" 
class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll?rev=269388&r1=269387&r2=269388&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll (original)<br class="">+++ llvm/trunk/test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll Thu May 12 20:42:39 2016<br class="">@@ -1,4 +1,4 @@<br class="">-; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=40 | FileCheck %s<br class="">+; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=60 | FileCheck %s<br class=""> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"<br class=""><br class=""> ; When examining gep-instructions we shouldn't consider them simplified if the<br class=""><br class="">Modified: llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp?rev=269388&r1=269387&r2=269388&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp?rev=269388&r1=269387&r2=269388&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp (original)<br class="">+++ llvm/trunk/unittests/Analysis/UnrollAnalyzer.cpp Thu May 12 20:42:39 2016<br class="">@@ -134,6 +134,7 @@ TEST(UnrollAnalyzerTest, OuterLoopSimpli<br class="">       "  br label %outer.loop\n"<br class="">       "outer.loop:\n"<br class="">       "  %iv.outer = phi i64 [ 
0, %entry ], [ %iv.outer.next, %outer.loop.latch ]\n"<br class="">+      "  %iv.outer.next = add nuw nsw i64 %iv.outer, 1\n"<br class="">       "  br label %inner.loop\n"<br class="">       "inner.loop:\n"<br class="">       "  %iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]\n"<br class="">@@ -141,7 +142,6 @@ TEST(UnrollAnalyzerTest, OuterLoopSimpli<br class="">       "  %exitcond.inner = icmp eq i64 %iv.inner.next, 1000\n"<br class="">       "  br i1 %exitcond.inner, label %outer.loop.latch, label %inner.loop\n"<br class="">       "outer.loop.latch:\n"<br class="">-      "  %iv.outer.next = add nuw nsw i64 %iv.outer, 1\n"<br class="">       "  %exitcond.outer = icmp eq i64 %iv.outer.next, 40\n"<br class="">       "  br i1 %exitcond.outer, label %exit, label %outer.loop\n"<br class="">       "exit:\n"<br class="">@@ -163,11 +163,15 @@ TEST(UnrollAnalyzerTest, OuterLoopSimpli<br class="">   BasicBlock *InnerBody = &*FI++;<br class=""><br class="">   BasicBlock::iterator BBI = Header->begin();<br class="">-  Instruction *Y1 = &*BBI++;<br class="">+  BBI++;<br class="">+  Instruction *Y1 = &*BBI;<br class="">   BBI = InnerBody->begin();<br class="">-  Instruction *Y2 = &*BBI++;<br class="">+  BBI++;<br class="">+  Instruction *Y2 = &*BBI;<br class="">   // Check that we can simplify IV of the outer loop, but can't simplify the IV<br class="">   // of the inner loop if we only know the iteration number of the outer loop.<br class="">+  //<br class="">+  //  Y1 is %iv.outer.next, Y2 is %iv.inner.next<br class="">   auto I1 = SimplifiedValuesVector[0].find(Y1);<br class="">   EXPECT_TRUE(I1 != SimplifiedValuesVector[0].end());<br class="">   auto I2 = SimplifiedValuesVector[0].find(Y2);<br class=""><br class=""><br class="">_______________________________________________<br class="">llvm-commits mailing list<br class=""><a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a><br class=""><a 
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></blockquote></div></div></blockquote></div><br class=""></body></html>