[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 8 21:29:33 PST 2017
Danny,
I feel like there's something missing here. In particular, I'm not sure
I agree with your description of the problem. See inline comments
below. Can you help me understand why this is needed?
On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
> Author: dannyb
> Date: Wed Feb 8 09:22:52 2017
> New Revision: 294463
>
> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
> Log:
> LVI: Add a per-value worklist limit to LazyValueInfo.
>
> Summary:
> LVI is now depth first, which is the optimal iteration strategy in
> terms of work per call. However, the way the results get cached means
> it can still degrade to N^2 behavior or worse right now. The
> overdefined cache is per-block, because LVI wants to be able to give
> different results for the same name in different blocks (i.e., to
> solve the problem PredicateInfo solves). This means that even if we
> discover a value is overdefined after going very deep, we don't cache
> that information, so we end up trying to rediscover it again and
> again. The same is true for values along the way.
This doesn't parse for me. We have the OverdefinedCache and do cache
the result of a (Value, BB) pair being overdefined. Other than the
fact that the cache is split - which admittedly is confusing - where's
the problem here?
> In practice, overdefined
> anywhere should mean overdefined everywhere (this is how, for example,
> SCCP works).
>
> Until we get around to reworking the overdefined cache, we need to
> limit the worklist size we process. Note that permanently reverting
> the DFS exploration strategy seems like the wrong move (reverting
> temporarily would be fine if we really want to). BFS is clearly the
> wrong approach; it just gets luckier on some testcases. It's also
> very hard to design an effective throttle for BFS. For DFS, the
> throttle is directly related to the depth of the CFG, so really deep
> CFGs will get cut off and smaller ones will not. As the CFG
> simplifies, you get better results. For BFS, the limit is related to
> the fan-out times the average block size, which is harder to reason
> about or make good choices for.
I agree that a threshold for DFS makes much more sense than for BFS. I
think having the threshold is a good idea, though I'd still like to
understand the test case. I suspect we should be able to use a much
higher threshold value.
>
> Bug being filed about the overdefined cache, but it will require major
> surgery to fix it (plumbing predicateinfo through CVP or LVI).
>
> Note: I did not make this number configurable because I'm not sure
> anyone really needs to tweak this knob. We run CVP 3 times. On the
> testcases I have, the slow runs happen in the middle, where CVP is
> doing cleanup work that other passes are effective at. Over the
> course of 3 runs, we don't seem to have any real loss of performance.
I have a very different pass pipeline than you do. Please don't forget
that when tuning. Over-fitting the current O2 pipeline is a path we
don't want to pursue.
Did you explore other values for the cutoff? In particular, if we used
10,000 would your test case scale badly?
>
> I haven't gotten a minimized testcase yet, but just imagine in your
> head a testcase where, going *up* the CFG, you have branches, one of
> which leads 50000 blocks deep, and the other to something where the
> answer is overdefined immediately. BFS would discover the overdefined
> result faster than DFS, but would do more work to do so. In practice,
> the right answer is "once DFS discovers overdefined for a value, stop
> trying to get more info about that value" (and so DFS would normally
> cache the overdefined result for every value it passed through in
> those 50k blocks and never do that work again. But it doesn't,
> because of the naming problem).
>
> Reviewers: chandlerc, djasper
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D29715
>
> Modified:
> llvm/trunk/lib/Analysis/LazyValueInfo.cpp
>
> Modified: llvm/trunk/lib/Analysis/LazyValueInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LazyValueInfo.cpp?rev=294463&r1=294462&r2=294463&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/LazyValueInfo.cpp (original)
> +++ llvm/trunk/lib/Analysis/LazyValueInfo.cpp Wed Feb 8 09:22:52 2017
> @@ -39,6 +39,10 @@ using namespace PatternMatch;
>
> #define DEBUG_TYPE "lazy-value-info"
>
> +// This is the number of worklist items we will process to try to discover an
> +// answer for a given value.
> +static const unsigned MaxProcessedPerValue = 500;
> +
> char LazyValueInfoWrapperPass::ID = 0;
> INITIALIZE_PASS_BEGIN(LazyValueInfoWrapperPass, "lazy-value-info",
> "Lazy Value Information Analysis", false, true)
> @@ -563,7 +567,7 @@ namespace {
> /// This stack holds the state of the value solver during a query.
> /// It basically emulates the callstack of the naive
> /// recursive value lookup process.
> - std::stack<std::pair<BasicBlock*, Value*> > BlockValueStack;
> + SmallVector<std::pair<BasicBlock*, Value*>, 8> BlockValueStack;
>
> /// Keeps track of which block-value pairs are in BlockValueStack.
> DenseSet<std::pair<BasicBlock*, Value*> > BlockValueSet;
> @@ -576,7 +580,7 @@ namespace {
>
> DEBUG(dbgs() << "PUSH: " << *BV.second << " in " << BV.first->getName()
> << "\n");
> - BlockValueStack.push(BV);
> + BlockValueStack.push_back(BV);
> return true;
> }
>
> @@ -646,24 +650,50 @@ namespace {
> } // end anonymous namespace
>
> void LazyValueInfoImpl::solve() {
> + SmallVector<std::pair<BasicBlock *, Value *>, 8> StartingStack(
> + BlockValueStack.begin(), BlockValueStack.end());
> +
> + unsigned processedCount = 0;
> while (!BlockValueStack.empty()) {
> - std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
> + processedCount++;
> + // Abort if we have to process too many values to get a result for this one.
> + // Because of the design of the overdefined cache currently being per-block
> + // to avoid naming-related issues (IE it wants to try to give different
> + // results for the same name in different blocks), overdefined results don't
> + // get cached globally, which in turn means we will often try to rediscover
> + // the same overdefined result again and again. Once something like
> + // PredicateInfo is used in LVI or CVP, we should be able to make the
> + // overdefined cache global, and remove this throttle.
> + if (processedCount > MaxProcessedPerValue) {
> + DEBUG(dbgs() << "Giving up on stack because we are getting too deep\n");
> + // Fill in the original values
> + while (!StartingStack.empty()) {
> + std::pair<BasicBlock *, Value *> &e = StartingStack.back();
> + TheCache.insertResult(e.second, e.first,
> + LVILatticeVal::getOverdefined());
> + StartingStack.pop_back();
> + }
> + BlockValueSet.clear();
> + BlockValueStack.clear();
> + return;
> + }
> + std::pair<BasicBlock *, Value *> &e = BlockValueStack.back();
> assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");
>
> if (solveBlockValue(e.second, e.first)) {
> // The work item was completely processed.
> - assert(BlockValueStack.top() == e && "Nothing should have been pushed!");
> + assert(BlockValueStack.back() == e && "Nothing should have been pushed!");
> assert(TheCache.hasCachedValueInfo(e.second, e.first) &&
> "Result should be in cache!");
>
> DEBUG(dbgs() << "POP " << *e.second << " in " << e.first->getName()
> << " = " << TheCache.getCachedValueInfo(e.second, e.first) << "\n");
>
> - BlockValueStack.pop();
> + BlockValueStack.pop_back();
> BlockValueSet.erase(e);
> } else {
> // More work needs to be done before revisiting.
> - assert(BlockValueStack.top() != e && "Stack should have been pushed!");
> + assert(BlockValueStack.back() != e && "Stack should have been pushed!");
> }
> }
> }
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits