[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 15 16:05:39 PST 2017


(And note, you may be able to hack around this by making the overdefined
cache an interval map from <dfs pair, value>, and only caching for split
points/whatever, but that seemed like more work than starting to move it
towards PredicateInfo.)
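
To make that concrete, the shape I have in mind is roughly the sketch below.
This is hypothetical code for illustration only (the names, the std
containers, and the use of dominator-tree DFS in/out numbers are mine, not
anything in the tree), and it leaves out the hard part, which is only
caching for split points so an answer is not reused across a branch that
could refine the value.

  // Hypothetical: key overdefined results by the DFS interval of the block
  // where they were proven, rather than by each individual block, so that a
  // later query in a dominated block can reuse the answer.
  #include <map>
  #include <vector>

  struct DFSInterval {
    unsigned In;  // dominator-tree DFS-in number of the proving block
    unsigned Out; // dominator-tree DFS-out number of the proving block
  };

  struct IntervalOverdefinedCache {
    // Keyed by the value (this would be an llvm::Value * in the real thing).
    std::map<const void *, std::vector<DFSInterval>> Intervals;

    void markOverdefined(const void *V, unsigned In, unsigned Out) {
      Intervals[V].push_back({In, Out});
    }

    // A block is dominated by the proving block iff its own DFS-in number
    // falls inside the recorded [In, Out] interval.
    bool isOverdefinedAt(const void *V, unsigned BlockIn) const {
      auto It = Intervals.find(V);
      if (It == Intervals.end())
        return false;
      for (const DFSInterval &I : It->second)
        if (I.In <= BlockIn && BlockIn <= I.Out)
          return true;
      return false;
    }
  };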

On Wed, Feb 15, 2017 at 3:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> Sorry, I normally only scan llvm-commits mail occasionally (gotta redo my
> filtering), so I didn't notice this until Joerg pointed it out to me.
>
> On Wed, Feb 8, 2017 at 9:29 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> Danny,
>>
>> I feel like there's something missing here.  In particular, I'm not sure
>> I agree with your description of the problem.  See inline comments below.
>> Can you help me understand why this is needed?
>
> Sure.
> Happy to also provide a testcase if it helps.
>
>
>>
>>
>> On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
>>
>>> Author: dannyb
>>> Date: Wed Feb  8 09:22:52 2017
>>> New Revision: 294463
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
>>> Log:
>>> LVI: Add a per-value worklist limit to LazyValueInfo.
>>>
>>> Summary:
>>> LVI is now depth first, which is the optimal iteration strategy in
>>> terms of work per call.  However, the way the results get cached means
>>> it can still go very badly, N^2 or worse, right now.  The overdefined
>>> cache is per-block, because LVI wants to try to get different results
>>> for the same name in different blocks (i.e., to solve the problem
>>> PredicateInfo solves).  This means that even if we discover a value is
>>> overdefined after going very deep, we don't cache this information
>>> globally, so we end up trying to rediscover it again and again.  The
>>> same is true for values along the way.
>>>
>> This doesn't parse for me.  We have the OverdefinedCache, and we do cache
>> the result of a (Value, BB) pair being overdefined.  Other than the fact
>> that the cache is split - which admittedly is confusing - where's the
>> problem here?
>
> Given
> BB 1
> BB 2
> BB 3
> BB 4
>
> Overdefined in one is not equal to overdefined in all.
> So we cache it in one BB,
> then ask about the same value in the next BB,
> and it keeps rediscovering the overdefinedness of it.
> 16000 times per value.
>
> If the cache were made global to each name, this would fix it.
> But it doesn't have a global cache precisely because of the naming issue,
> i.e., it thinks it can get a better answer in BB 2 than in BB 1.
> And it, in fact, can in blocks dominated by conditionals.
> That's the whole reason PredicateInfo exists:
> to change the name in these cases so that you can associate singular info
> with it and still get good results.
>
> Splitting this cache is a very bad way to solve the fact that you may be
> able to prove different values for different uses.
> IMHO, better to rename those uses.
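>
> Schematically (made-up names here, just to show the shape; the real code
> lives in LazyValueInfo.cpp), the split per-block cache looks roughly like
>
>    // one overdefined set per basic block: a hit for (Val, BB1) says
>    // nothing about (Val, BB2), so the walk that proved it gets redone
>    // again for every other block that asks.
>    DenseMap<BasicBlock *, SmallPtrSet<Value *, 4>> OverDefinedCache;
>    bool isOverdefined(Value *V, BasicBlock *BB) {
>      return OverDefinedCache[BB].count(V);
>    }
>
> while a name-global cache (the SCCP-style "overdefined anywhere means
> overdefined everywhere") would be
>
>    // one set for the whole function: after a single walk proves Val
>    // overdefined, every later query on Val hits the cache.
>    SmallPtrSet<Value *, 8> OverDefinedValues;
>    bool isOverdefined(Value *V) { return OverDefinedValues.count(V); }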
>
>
>
>> In practice, overdefined
>>> anywhere should mean overdefined everywhere (this is how, for example,
>>> SCCP works).
>>>
>>> Until we get around to reworking the overdefined cache, we need to
>>> limit the worklist size we process.  Note that permanently reverting
>>> the DFS strategy exploration seems like the wrong strategy (temporarily
>>> seems fine if we really want to).  BFS is clearly the wrong approach; it
>>> just gets luckier on some testcases.  It's also very hard to design
>>> an effective throttle for BFS.  For DFS, the throttle is directly related
>>> to the depth of the CFG, so really deep CFGs will get cut off and smaller
>>> ones will not.  As the CFG simplifies, you get better results.
>>> In BFS, the limit is related to the fan-out times the average block size,
>>> which is harder to reason about or make good choices for.
>>>
>> I agree that a threshold for DFS makes much more sense than for BFS.  I
>> think having the threshold is a good idea, though I'd still like to
>> understand the test case.  I suspect we should be able to use a much higher
>> threshold value.
>
>
> Yes, we can.
> I took the lowest value at which, on all of my testcases, it changed
> nothing, and multiplied it by 5 :)
> I did some other testing, and the percentage of cases I found where we
> searched through 500 more blocks than last time and found a better answer
> was 0%.
>
> Maybe you have much worse testcases.
>
>
>
>>
>>
>>> Bug being filed about the overdefined cache, but it will require major
>>> surgery to fix it (plumbing PredicateInfo through CVP or LVI).
>>>
>>> Note: I did not make this number configurable because I'm not sure
>>> anyone really needs to tweak this knob.  We run CVP 3 times.  On the
>>> testcases I have, the slow runs happen in the middle, where CVP is
>>> doing cleanup work that other things are effective at.  Over the course
>>> of the 3 runs, we don't seem to have any real loss of performance.
>>>
>> I have a very different pass pipeline than you do.  Please don't forget
>> that when tuning.  Over-fitting the current O2 pipeline is a path we don't
>> want to pursue.
>>
>
> Sure.
>
>>
>> Did you explore other values for the cutoff?  In particular, if we used
>> 10,000 would your test case scale badly?
>
>
> Yes.
> The max value in this testcase is 16000 per-value, and there are thousands
> of values it hits this on.
>
> Conversely, I can't find a testcase where we DFS'd through 500 things and
> still found a good answer.
>
>
>
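
For anyone following along at home, the cutoff behaves roughly like the
sketch below.  This is a simplified, self-contained illustration of the
idea, not the code that landed in LazyValueInfo.cpp; the names, the struct,
and the exact handling at the limit are made up, and only the rough value
of the constant matches the discussion above.

  #include <cstddef>
  #include <unordered_set>
  #include <vector>

  struct Block {
    std::vector<Block *> Preds; // predecessor blocks to explore
  };

  constexpr std::size_t MaxBlocksPerValue = 500;

  // DFS over predecessors for a single queried value.  Returns false
  // ("give up, treat the value as overdefined") once the walk for this one
  // value has touched more than MaxBlocksPerValue blocks, so a very deep
  // CFG can no longer make a single query take thousands of block visits.
  bool solveValueFrom(Block *Start) {
    std::vector<Block *> Worklist{Start};
    std::unordered_set<Block *> Visited;
    std::size_t Processed = 0;

    while (!Worklist.empty()) {
      Block *BB = Worklist.back();
      Worklist.pop_back();
      if (!Visited.insert(BB).second)
        continue;                     // already handled this block
      if (++Processed > MaxBlocksPerValue)
        return false;                 // per-value cutoff hit, bail out

      // ... per-block lattice work would go here ...

      for (Block *Pred : BB->Preds)
        Worklist.push_back(Pred);
    }
    return true;                      // fully solved within the budget
  }

The point of tying the limit to a DFS worklist, as the commit message says,
is that the cutoff then tracks CFG depth: deep CFGs hit it, small ones never
notice it.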