[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 15 15:59:10 PST 2017


Sorry, I normally only scan llvm-commits mail occasionally (gotta redo my
filtering); I didn't notice this until Joerg pointed it out to me.

On Wed, Feb 8, 2017 at 9:29 PM, Philip Reames <listmail at philipreames.com>
wrote:

> Danny,
>
> I feel like there's something missing here.  In particular, I'm not sure I
> agree with your description of the problem.  See inline comments below.
> Can you help me understand why this is needed?

Sure.
Happy to also provide a testcase if it helps.


>
>
> On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
>
>> Author: dannyb
>> Date: Wed Feb  8 09:22:52 2017
>> New Revision: 294463
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
>> Log:
>> LVI: Add a per-value worklist limit to LazyValueInfo.
>>
>> Summary:
>> LVI is now depth first, which is optimal for iteration strategy in
>> terms of work per call.  However, the way the results get cached means
>> it can still go very badly N^2 or worse right now.  The overdefined
>> cache is per-block, because LVI wants to try to get different results
>> for the same name in different blocks (IE solve the problem
>> PredicateInfo solves).  This means even if we discover a value is
>> overdefined after going very deep, it doesn't cache this information,
>> causing it to end up trying to rediscover it again and again.  The
>> same is true for values along the way.
>>
> This doesn't parse for me.  We have the OverdefinedCache and do cache the
> result of a (Value, BB) pair being overdefined.  Other than the fact the
> cache is split - which admitted is confusing - where's the problem here?

Given
BB 1
BB 2
BB 3
BB 4

Overdefined in one is not equal to overdefined in all.
So we cache the result in one BB,
then ask about the same value in the next BB,
and it keeps rediscovering the overdefinedness of it.
16000 times per value.

If the cache were made global to each name, this would fix it.
But it doesn't have a global cache precisely because of the naming issue,
i.e., it thinks it can get a better answer in BB2 than in BB1.
And it can, in fact, in blocks dominated by conditionals.
That's the whole reason PredicateInfo exists:
to rename the value in those cases so that you can associate singular info
with it and still get good results.

Splitting this cache is a very bad way to handle the fact that you may be
able to prove different values for different uses.
IMHO, it is better to rename those uses.
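To make the caching problem above concrete, here is a small sketch (purely
illustrative Python, not LLVM's actual API or data structures; all names are
made up) of why a per-(value, block) overdefined cache redoes the expensive
walk once per block, while a global per-value cache would do it once:

```python
# Hypothetical model of LVI's per-block overdefined cache.
# solve() pretends that walking predecessors always ends "overdefined";
# work_counter tracks how many times the expensive walk happens.

def solve(value, block, per_block_cache, work_counter):
    """Return the cached result for (value, block), or do the 'deep walk'."""
    if (value, block) in per_block_cache:
        return per_block_cache[(value, block)]
    work_counter[0] += 1  # simulate the expensive deep predecessor walk
    per_block_cache[(value, block)] = "overdefined"
    return "overdefined"

blocks = ["BB1", "BB2", "BB3", "BB4"]

# Per-(value, block) cache: the deep walk repeats in every block.
per_block_cache = {}
work = [0]
for bb in blocks:
    solve("v", bb, per_block_cache, work)
print(work[0])  # 4: rediscovered once per block

# A hypothetical global per-value cache would do the walk only once.
global_cache = {}
work2 = [0]
for bb in blocks:
    if "v" not in global_cache:
        work2[0] += 1
        global_cache["v"] = "overdefined"
print(work2[0])  # 1
```

With 16000 queries per value instead of 4, the per-block scheme is where the
quadratic blowup described above comes from.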



> In practice, overdefined
>> anywhere should mean overdefined everywhere (this is how, for example,
>> SCCP works).
>>
>> Until we get around to reworking the overdefined cache, we need to
>> limit the worklist size we process.  Note that permanently reverting
>> the DFS strategy exploration seems the wrong strategy (temporarily
>> seems fine if we really want).  BFS is clearly the wrong approach, it
>> just gets luckier on some testcases.  It's also very hard to design
>> an effective throttle for BFS. For DFS, the throttle is directly related
>> to the depth of the CFG.  So really deep CFGs will get cutoff, smaller
>> ones will not. As the CFG simplifies, you get better results.
>> In BFS, the limit is related to the fan-out times average block size,
>> which is harder to reason about or make good choices for.
>>
> I agree that a threshold for DFS makes much more sense than for BFS.  I
> think having the threshold is a good idea, though I'd still like to
> understand the test case.  I suspect we should be able to use a much higher
> threshold value.


Yes, we can.
I took the lowest point at which, on all of my testcases, it changed
nothing, and multiplied by 5 :)
I did some other testing, and the percentage of cases I found where we
searched through 500 more blocks than last time, and found a better answer,
was 0%.
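The throttle being discussed can be sketched roughly like this (an
illustrative simplification, not the actual r294463 change; the function,
CFG encoding, and cutoff constant here are all made up for the example): a
depth-first walk over predecessor blocks that gives up and conservatively
answers overdefined once the per-value worklist has visited too many blocks.

```python
# Sketch of a per-value DFS worklist limit. Once more than max_blocks
# blocks have been visited while solving a single value, bail out and
# conservatively report "overdefined" rather than keep searching.

def solve_with_limit(preds, start, max_blocks=500):
    """DFS over predecessor blocks of `start`.
    preds maps block -> list of predecessor blocks.
    Returns ("overdefined", n_visited) if the cutoff is hit,
    ("done", n_visited) if the whole region fits under the limit."""
    worklist = [start]
    visited = set()
    while worklist:
        bb = worklist.pop()  # pop from the end: depth-first order
        if bb in visited:
            continue
        visited.add(bb)
        if len(visited) > max_blocks:
            return ("overdefined", len(visited))
        worklist.extend(preds.get(bb, []))
    return ("done", len(visited))

# A deep chain of 1000 blocks trips the cutoff...
chain = {i: [i - 1] for i in range(1, 1000)}
print(solve_with_limit(chain, 999))  # ('overdefined', 501)

# ...while a shallow CFG is explored completely.
print(solve_with_limit({1: [0], 2: [0]}, 2))  # ('done', 2)
```

This matches the argument above: the cutoff is tied directly to CFG depth,
so deep CFGs get throttled and shallow ones are solved exactly, whereas a
BFS limit would instead depend on fan-out times block size.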

Maybe you have much worse testcases.



>
>
>> Bug being filed about the overdefined cache, but it will require major
>> surgery to fix it (plumbing predicateinfo through CVP or LVI).
>>
>> Note: I did not make this number configurable because i'm not sure
>> anyone really needs to tweak this knob.  We run CVP 3 times. On the
>> testcases i have the slow ones happen in the middle, where CVP is
>> doing cleanup work other things are effective at.  Over the course of
>> 3 runs, we don't seem to have any real loss of performance.
>>
> I have a very different pass pipeline than you do.  Please don't forget
> that when tuning.  Over-fitting the current O2 pipeline is a path we don't
> want to pursue.
>

Sure.

>
> Did you explore other values for the cutoff?  In particular, if we used
> 10,000 would your test case scale badly?


Yes.
The max value in this testcase is 16000 per value, and there are thousands
of values it hits this on.

Conversely, I can't find a testcase where we DFS'd through 500 things and
still found a good answer.
