[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 8 21:29:33 PST 2017
Danny,
I feel like there's something missing here. In particular, I'm not sure
I agree with your description of the problem. See inline comments
below. Can you help me understand why this is needed?
On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
> Author: dannyb
> Date: Wed Feb 8 09:22:52 2017
> New Revision: 294463
>
> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
> Log:
> LVI: Add a per-value worklist limit to LazyValueInfo.
>
> Summary:
> LVI is now depth first, which is the optimal iteration strategy in
> terms of work per call. However, the way the results get cached means
> it can still degrade to N^2 behavior or worse right now. The
> overdefined cache is per-block, because LVI wants to be able to give
> different results for the same name in different blocks (i.e., to
> solve the problem PredicateInfo solves). This means that even if we
> discover a value is overdefined after going very deep, we don't cache
> that information, so we end up trying to rediscover it again and
> again. The same is true for values along the way.
This doesn't parse for me. We have the OverdefinedCache and do cache
the result of a (Value, BB) pair being overdefined. Other than the
fact that the cache is split - which admittedly is confusing - where's
the problem here?
> In practice, overdefined
> anywhere should mean overdefined everywhere (this is how, for example,
> SCCP works).
>
> Until we get around to reworking the overdefined cache, we need to
> limit the worklist size we process. Note that permanently reverting
> the DFS exploration strategy seems like the wrong move (reverting
> temporarily would be fine if we really want to). BFS is clearly the
> wrong approach; it just gets luckier on some testcases. It's also
> very hard to design an effective throttle for BFS. For DFS, the
> throttle is directly related to the depth of the CFG, so really deep
> CFGs will get cut off and smaller ones will not. As the CFG
> simplifies, you get better results. For BFS, the limit is related to
> the fan-out times the average block size, which is harder to reason
> about or make good choices for.
I agree that a threshold for DFS makes much more sense than for BFS. I
think having the threshold is a good idea, though I'd still like to
understand the test case. I suspect we should be able to use a much
higher threshold value.
>
> Bug being filed about the overdefined cache, but it will require major
> surgery to fix it (plumbing predicateinfo through CVP or LVI).
>
> Note: I did not make this number configurable because I'm not sure
> anyone really needs to tweak this knob. We run CVP 3 times. On the
> testcases I have, the slow runs happen in the middle, where CVP is
> doing cleanup work that other passes are effective at. Over the
> course of 3 runs, we don't seem to have any real loss of performance.
I have a very different pass pipeline than you do. Please don't forget
that when tuning. Over-fitting the current O2 pipeline is a path we
don't want to pursue.
Did you explore other values for the cutoff? In particular, if we used
10,000 would your test case scale badly?
>
> I haven't gotten a minimized testcase yet, but just imagine in your
> head a testcase where, going *up* the CFG, you have branches, one of
> which leads 50000 blocks deep, and the other to something where the
> answer is overdefined immediately. BFS would discover the overdefined
> result faster than DFS, but would do more work to do so. In practice,
> the right answer is "once DFS discovers overdefined for a value, stop
> trying to get more info about that value" (and so DFS would normally
> cache the overdefined result for every value it passed through in
> those 50k blocks and never do that work again. But it doesn't,
> because of the naming problem).
>
> Reviewers: chandlerc, djasper
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D29715
>
> Modified:
> llvm/trunk/lib/Analysis/LazyValueInfo.cpp
>
> Modified: llvm/trunk/lib/Analysis/LazyValueInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LazyValueInfo.cpp?rev=294463&r1=294462&r2=294463&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/LazyValueInfo.cpp (original)
> +++ llvm/trunk/lib/Analysis/LazyValueInfo.cpp Wed Feb 8 09:22:52 2017
> @@ -39,6 +39,10 @@ using namespace PatternMatch;
>
> #define DEBUG_TYPE "lazy-value-info"
>
> +// This is the number of worklist items we will process to try to discover an
> +// answer for a given value.
> +static const unsigned MaxProcessedPerValue = 500;
> +
> char LazyValueInfoWrapperPass::ID = 0;
> INITIALIZE_PASS_BEGIN(LazyValueInfoWrapperPass, "lazy-value-info",
> "Lazy Value Information Analysis", false, true)
> @@ -563,7 +567,7 @@ namespace {
> /// This stack holds the state of the value solver during a query.
> /// It basically emulates the callstack of the naive
> /// recursive value lookup process.
> - std::stack<std::pair<BasicBlock*, Value*> > BlockValueStack;
> + SmallVector<std::pair<BasicBlock*, Value*>, 8> BlockValueStack;
>
> /// Keeps track of which block-value pairs are in BlockValueStack.
> DenseSet<std::pair<BasicBlock*, Value*> > BlockValueSet;
> @@ -576,7 +580,7 @@ namespace {
>
> DEBUG(dbgs() << "PUSH: " << *BV.second << " in " << BV.first->getName()
> << "\n");
> - BlockValueStack.push(BV);
> + BlockValueStack.push_back(BV);
> return true;
> }
>
> @@ -646,24 +650,50 @@ namespace {
> } // end anonymous namespace
>
> void LazyValueInfoImpl::solve() {
> + SmallVector<std::pair<BasicBlock *, Value *>, 8> StartingStack(
> + BlockValueStack.begin(), BlockValueStack.end());
> +
> + unsigned processedCount = 0;
> while (!BlockValueStack.empty()) {
> - std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
> + processedCount++;
> + // Abort if we have to process too many values to get a result for this one.
> + // Because of the design of the overdefined cache currently being per-block
> + // to avoid naming-related issues (IE it wants to try to give different
> + // results for the same name in different blocks), overdefined results don't
> + // get cached globally, which in turn means we will often try to rediscover
> + // the same overdefined result again and again. Once something like
> + // PredicateInfo is used in LVI or CVP, we should be able to make the
> + // overdefined cache global, and remove this throttle.
> + if (processedCount > MaxProcessedPerValue) {
> + DEBUG(dbgs() << "Giving up on stack because we are getting too deep\n");
> + // Fill in the original values
> + while (!StartingStack.empty()) {
> + std::pair<BasicBlock *, Value *> &e = StartingStack.back();
> + TheCache.insertResult(e.second, e.first,
> + LVILatticeVal::getOverdefined());
> + StartingStack.pop_back();
> + }
> + BlockValueSet.clear();
> + BlockValueStack.clear();
> + return;
> + }
> + std::pair<BasicBlock *, Value *> &e = BlockValueStack.back();
> assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");
>
> if (solveBlockValue(e.second, e.first)) {
> // The work item was completely processed.
> - assert(BlockValueStack.top() == e && "Nothing should have been pushed!");
> + assert(BlockValueStack.back() == e && "Nothing should have been pushed!");
> assert(TheCache.hasCachedValueInfo(e.second, e.first) &&
> "Result should be in cache!");
>
> DEBUG(dbgs() << "POP " << *e.second << " in " << e.first->getName()
> << " = " << TheCache.getCachedValueInfo(e.second, e.first) << "\n");
>
> - BlockValueStack.pop();
> + BlockValueStack.pop_back();
> BlockValueSet.erase(e);
> } else {
> // More work needs to be done before revisiting.
> - assert(BlockValueStack.top() != e && "Stack should have been pushed!");
> + assert(BlockValueStack.back() != e && "Stack should have been pushed!");
> }
> }
> }
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits