[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 10 09:53:47 PST 2017


ping

On 02/08/2017 09:29 PM, Philip Reames via llvm-commits wrote:
> Danny,
>
> I feel like there's something missing here.  In particular, I'm not 
> sure I agree with your description of the problem.  See inline 
> comments below.  Can you help me understand why this is needed?
>
> On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
>> Author: dannyb
>> Date: Wed Feb  8 09:22:52 2017
>> New Revision: 294463
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
>> Log:
>> LVI: Add a per-value worklist limit to LazyValueInfo.
>>
>> Summary:
>> LVI is now depth first, which is optimal as an iteration strategy in
>> terms of work per call.  However, the way the results get cached means
>> it can still go very badly, N^2 or worse, right now.  The overdefined
>> cache is per-block, because LVI wants to try to get different results
>> for the same name in different blocks (i.e., solve the problem
>> PredicateInfo solves).  This means that even if we discover a value is
>> overdefined after going very deep, we don't cache this information,
>> so we end up trying to rediscover it again and again.  The
>> same is true for values along the way.
> This doesn't parse for me.  We have the OverdefinedCache, and we do 
> cache the result of a (Value, BB) pair being overdefined.  Other than 
> the fact that the cache is split - which, admittedly, is confusing - 
> where's the problem here?
>> In practice, overdefined
>> anywhere should mean overdefined everywhere (this is how, for example,
>> SCCP works).
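
[To make the "overdefined anywhere should mean overdefined everywhere" point concrete, here is a minimal sketch in the spirit of SCCP's lattice.  The enum, struct, and `merge` function below are illustrative inventions, not LVI's actual `LVILatticeVal` API: overdefined is an absorbing element, so once it is reached no later merge can refine the value again.]

```cpp
#include <cassert>

// Hypothetical three-point lattice: undefined < constant < overdefined.
// Names are illustrative only, not LVI's real lattice type.
enum class LatticeState { Undefined, Constant, Overdefined };

struct LatticeVal {
  LatticeState State = LatticeState::Undefined;
  int ConstVal = 0; // Only meaningful when State == Constant.
};

// Merge two facts about the same value, as SCCP does at CFG joins.
// Overdefined is absorbing: if either side is overdefined, the result
// is overdefined, and nothing can lower it back down the lattice.
LatticeVal merge(const LatticeVal &A, const LatticeVal &B) {
  if (A.State == LatticeState::Overdefined ||
      B.State == LatticeState::Overdefined)
    return {LatticeState::Overdefined, 0};
  if (A.State == LatticeState::Undefined) return B;
  if (B.State == LatticeState::Undefined) return A;
  // Both constants: equal constants stay constant; otherwise the value
  // is not any single constant, hence overdefined.
  if (A.ConstVal == B.ConstVal) return A;
  return {LatticeState::Overdefined, 0};
}
```

[Because the element is absorbing, a cached "overdefined" answer is safe to reuse in any block, which is what makes a global overdefined cache plausible.]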
>>
>> Until we get around to reworking the overdefined cache, we need to
>> limit the worklist size we process.  Note that permanently reverting
>> the DFS exploration strategy seems wrong (reverting it temporarily
>> seems fine if we really want).  BFS is clearly the wrong approach; it
>> just gets luckier on some testcases.  It's also very hard to design
>> an effective throttle for BFS.  For DFS, the throttle is directly related
>> to the depth of the CFG, so really deep CFGs will get cut off and smaller
>> ones will not.  As the CFG simplifies, you get better results.
>> In BFS, the limit is related to the fan-out times the average block size,
>> which is harder to reason about or make good choices for.
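
[A toy model of that trade-off.  `dfsWithThrottle`, `makeChain`, and the integer-indexed predecessor graph are hypothetical stand-ins for LVI's worklist and the CFG, not real LLVM code; the point is only that under DFS the number of worklist items processed tracks chain depth, so a fixed item count cuts off deep CFGs and leaves shallow ones alone.]

```cpp
#include <cassert>
#include <vector>

// Each node's predecessor list; an illustrative stand-in for CFG edges.
using Graph = std::vector<std::vector<int>>;

// Depth-first walk from `start` up the predecessor chains, giving up
// after `maxProcessed` worklist items (in the spirit of the patch's
// MaxProcessedPerValue).  Returns nodes processed, or -1 on cutoff.
int dfsWithThrottle(const Graph &preds, int start, unsigned maxProcessed) {
  std::vector<int> stack{start};
  std::vector<bool> visited(preds.size(), false);
  unsigned processed = 0;
  while (!stack.empty()) {
    if (++processed > maxProcessed)
      return -1; // Give up; conservatively treat the query as overdefined.
    int n = stack.back();
    stack.pop_back();
    if (visited[n]) continue;
    visited[n] = true;
    for (int p : preds[n])
      stack.push_back(p);
  }
  return static_cast<int>(processed);
}

// Build a straight-line chain b0 <- b1 <- ... <- b(n-1).
Graph makeChain(int n) {
  Graph g(n);
  for (int i = 1; i < n; ++i)
    g[i].push_back(i - 1);
  return g;
}
```

[Starting the walk deep in a long chain trips the throttle, while the same query started a few hundred blocks up completes normally; a BFS throttle has no such direct relationship to depth.]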
> I agree that a threshold for DFS makes much more sense than for BFS.  
> I think having the threshold is a good idea, though I'd still like to 
> understand the test case.  I suspect we should be able to use a much 
> higher threshold value.
>>
>> A bug is being filed about the overdefined cache, but fixing it will
>> require major surgery (plumbing PredicateInfo through CVP or LVI).
>>
>> Note: I did not make this number configurable because I'm not sure
>> anyone really needs to tweak this knob.  We run CVP 3 times.  On the
>> testcases I have, the slow runs happen in the middle, where CVP is
>> doing cleanup work that other passes are effective at.  Over the course
>> of the 3 runs, we don't seem to have any real loss of performance.
> I have a very different pass pipeline than you do.  Please don't 
> forget that when tuning.  Over-fitting the current O2 pipeline is a 
> path we don't want to pursue.
>
> Did you explore other values for the cutoff?  In particular, if we 
> used 10,000 would your test case scale badly?
>>
>> I haven't gotten a minimized testcase yet, but just imagine in your
>> head a testcase where, going *up* the CFG, you have branches, one of
>> which leads 50000 blocks deep, and the other to something where the
>> answer is overdefined immediately.  BFS would discover the overdefined
>> result faster than DFS, but do more work to do so.  In practice, the
>> right answer is "once DFS discovers overdefined for a value, stop trying
>> to get more info about that value" (and so DFS would normally cache the
>> overdefined result for every value it passed through in those 50k
>> blocks, and never do that work again.  But it doesn't, because of the
>> naming problem).
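
[A sketch of the "cache it globally along the path" idea described above.  `OverdefinedCache` and its counters are invented for illustration and are not LVI's actual `TheCache`; blocks and values are plain ints.  The expensive walk is paid once, every (block, value) pair on the path is recorded, and repeat queries from anywhere on the 50k-deep chain become pure cache hits.]

```cpp
#include <cassert>
#include <set>
#include <utility>

// Illustrative *global* overdefined cache keyed by (block, value),
// unlike the current per-block overdefined cache the email discusses.
struct OverdefinedCache {
  std::set<std::pair<int, int>> Overdefined;
  unsigned BlocksWalked = 0; // Total work spent on fresh solves.

  // Query (block, value); `solveDepth` models how many blocks a fresh
  // DFS would walk before discovering "overdefined" (assumed answer).
  bool isOverdefined(int block, int value, unsigned solveDepth) {
    if (Overdefined.count({block, value}))
      return true; // Cache hit: the deep chain is never re-walked.
    // Cache miss: pay for the walk once, then record every block on
    // the path so the result is never rediscovered.
    BlocksWalked += solveDepth;
    for (unsigned d = 0; d < solveDepth; ++d)
      Overdefined.insert({block + static_cast<int>(d), value});
    return true;
  }
};
```

[Without the global cache, each query re-pays `solveDepth`, which is the rediscovery cost the commit message complains about; the per-value throttle is a stopgap until this naming problem is solved.]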
>>
>> Reviewers: chandlerc, djasper
>>
>> Subscribers: llvm-commits
>>
>> Differential Revision: https://reviews.llvm.org/D29715
>>
>> Modified:
>>      llvm/trunk/lib/Analysis/LazyValueInfo.cpp
>>
>> Modified: llvm/trunk/lib/Analysis/LazyValueInfo.cpp
>> URL: 
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LazyValueInfo.cpp?rev=294463&r1=294462&r2=294463&view=diff
>> ============================================================================== 
>>
>> --- llvm/trunk/lib/Analysis/LazyValueInfo.cpp (original)
>> +++ llvm/trunk/lib/Analysis/LazyValueInfo.cpp Wed Feb  8 09:22:52 2017
>> @@ -39,6 +39,10 @@ using namespace PatternMatch;
>>     #define DEBUG_TYPE "lazy-value-info"
>> +// This is the number of worklist items we will process to try to discover an
>> +// answer for a given value.
>> +static const unsigned MaxProcessedPerValue = 500;
>> +
>>   char LazyValueInfoWrapperPass::ID = 0;
>>   INITIALIZE_PASS_BEGIN(LazyValueInfoWrapperPass, "lazy-value-info",
>>                   "Lazy Value Information Analysis", false, true)
>> @@ -563,7 +567,7 @@ namespace {
>>       /// This stack holds the state of the value solver during a query.
>>       /// It basically emulates the callstack of the naive
>>       /// recursive value lookup process.
>> -    std::stack<std::pair<BasicBlock*, Value*> > BlockValueStack;
>> +    SmallVector<std::pair<BasicBlock*, Value*>, 8> BlockValueStack;
>>       /// Keeps track of which block-value pairs are in BlockValueStack.
>>       DenseSet<std::pair<BasicBlock*, Value*> > BlockValueSet;
>> @@ -576,7 +580,7 @@ namespace {
>>       DEBUG(dbgs() << "PUSH: " << *BV.second << " in " << BV.first->getName()
>>                  << "\n");
>> -      BlockValueStack.push(BV);
>> +      BlockValueStack.push_back(BV);
>>         return true;
>>       }
>> @@ -646,24 +650,50 @@ namespace {
>>   } // end anonymous namespace
>>
>>   void LazyValueInfoImpl::solve() {
>> +  SmallVector<std::pair<BasicBlock *, Value *>, 8> StartingStack(
>> +      BlockValueStack.begin(), BlockValueStack.end());
>> +
>> +  unsigned processedCount = 0;
>>     while (!BlockValueStack.empty()) {
>> -    std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
>> +    processedCount++;
>> +    // Abort if we have to process too many values to get a result for this one.
>> +    // Because of the design of the overdefined cache currently being per-block
>> +    // to avoid naming-related issues (IE it wants to try to give different
>> +    // results for the same name in different blocks), overdefined results don't
>> +    // get cached globally, which in turn means we will often try to rediscover
>> +    // the same overdefined result again and again.  Once something like
>> +    // PredicateInfo is used in LVI or CVP, we should be able to make the
>> +    // overdefined cache global, and remove this throttle.
>> +    if (processedCount > MaxProcessedPerValue) {
>> +      DEBUG(dbgs() << "Giving up on stack because we are getting too deep\n");
>> +      // Fill in the original values
>> +      while (!StartingStack.empty()) {
>> +        std::pair<BasicBlock *, Value *> &e = StartingStack.back();
>> +        TheCache.insertResult(e.second, e.first,
>> +                              LVILatticeVal::getOverdefined());
>> +        StartingStack.pop_back();
>> +      }
>> +      BlockValueSet.clear();
>> +      BlockValueStack.clear();
>> +      return;
>> +    }
>> +    std::pair<BasicBlock *, Value *> &e = BlockValueStack.back();
>>       assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");
>>
>>       if (solveBlockValue(e.second, e.first)) {
>>         // The work item was completely processed.
>> -      assert(BlockValueStack.top() == e && "Nothing should have been pushed!");
>> +      assert(BlockValueStack.back() == e && "Nothing should have been pushed!");
>>         assert(TheCache.hasCachedValueInfo(e.second, e.first) &&
>>                "Result should be in cache!");
>>
>>         DEBUG(dbgs() << "POP " << *e.second << " in " << e.first->getName()
>>                    << " = " << TheCache.getCachedValueInfo(e.second, e.first) << "\n");
>>
>> -      BlockValueStack.pop();
>> +      BlockValueStack.pop_back();
>>         BlockValueSet.erase(e);
>>       } else {
>>         // More work needs to be done before revisiting.
>> -      assert(BlockValueStack.top() != e && "Stack should have been pushed!");
>> +      assert(BlockValueStack.back() != e && "Stack should have been pushed!");
>>       }
>>     }
>>   }
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
