[llvm] r294463 - LVI: Add a per-value worklist limit to LazyValueInfo.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 16 03:44:08 PST 2017


On Wed, Feb 15, 2017 at 3:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> Sorry, i normally only scan llvm-commits mail occasionally (gotta redo my
> filtering), i didn't notice this until joerg pointed it out to me.
>
> On Wed, Feb 8, 2017 at 9:29 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> Danny,
>>
>> I feel like there's something missing here.  In particular, I'm not sure
>> I agree with your description of the problem.  See inline comments below.
>> Can you help me understand why this is needed?
>
> Sure.
> Happy to also provide a testcase if it helps.
>
>
>>
>>
>> On 02/08/2017 07:22 AM, Daniel Berlin via llvm-commits wrote:
>>
>>> Author: dannyb
>>> Date: Wed Feb  8 09:22:52 2017
>>> New Revision: 294463
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=294463&view=rev
>>> Log:
>>> LVI: Add a per-value worklist limit to LazyValueInfo.
>>>
>>> Summary:
>>> LVI is now depth first, which is optimal for iteration strategy in
>>> terms of work per call.  However, the way the results get cached means
>>> it can still go very badly N^2 or worse right now.  The overdefined
>>> cache is per-block, because LVI wants to try to get different results
>>> for the same name in different blocks (i.e., solve the problem
>>> PredicateInfo solves).  This means even if we discover a value is
>>> overdefined after going very deep, it doesn't cache this information,
>>> causing it to end up trying to rediscover it again and again.  The
>>> same is true for values along the way.
>>>
>> This doesn't parse for me.  We have the OverdefinedCache and do cache the
>> result of a (Value, BB) pair being overdefined.  Other than the fact the
>> cache is split - which admittedly is confusing - where's the problem here?
>
> Given
> BB 1
> BB 2
> BB 3
> BB 4
>
> Overdefined in one block is not equal to overdefined in all.
> So we cache the result in one bb,
> then ask about the same value in the next bb.
> So it keeps rediscovering the overdefinedness of it:
> 16000 times per value.
>
>
It's worse than this, btw.

Imagine i have the following horrible code:

entry:
tmp1 = overdefined
tmp2 = overdefined
.....
tmp16000 = overdefined

bb0:
nothing
br label bb1
....
bb16000:
use tmp1
use tmp2
...
use tmp16000
br label bb16001
bb16001:
nothing
br label bb16002
....
bb32000:
use tmp1
use tmp2
...
use tmp16000

Let's look at the work between the two schemes:

In the per-block overdefined scheme, with no predicateinfo

For the use of tmp1 in bb16000, it will walk 16000 blocks back to the entry
block, discover overdefined. Assume it properly caches all 16000 blocks (I
actually believe it does not in some cases, but let's call that a bug and
ignore it; it doesn't actually change our work done in this case).
We have spent 16k blocks discovering overdefinedness here.
For the use of tmp2..tmp16000 in bb16000, ditto
We do this to see if it has any conditions, etc, that give us info.
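A toy cost model makes this concrete. The sketch below is hypothetical (it is not the real LVI code, and the names `query`, `cache`, and `work` are made up for illustration): blocks form a straight chain, each value is overdefined at its def in block 0, and overdefinedness is cached per (value, block), so every value independently re-walks the whole chain on first query.

```python
# Toy model of a per-(value, block) overdefined cache (hypothetical
# sketch, not the actual LVI implementation). Blocks form a chain
# 0 -> 1 -> ... -> n-1; each value is overdefined at its def in block 0
# and queried at a use in block n-1.

def query(block, cache, work):
    """Walk back toward the def, caching 'overdefined' per block."""
    b = block
    while b not in cache:
        work[0] += 1     # one unit of work per uncached block visited
        cache.add(b)     # cache "overdefined in block b"
        b -= 1           # step to the single predecessor
    return "overdefined"

n, m = 1000, 1000        # chain length and value count (scaled down from 16000)
total = 0
for _ in range(m):
    cache = {0}          # the cache is per value; only the def block is known
    work = [0]
    query(n - 1, cache, work)
    total += work[0]

print(total)             # m * (n - 1) block visits in total
```

With the email's sizes (16000 values, 16000 blocks), this is on the order of 2.5 * 10^8 block visits just to learn that everything is overdefined, and that counts only block visits, not the real per-block work.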

Let's go to bb32000
Again, let's assume it's all cached per-bb properly.
For the use of tmp1 in bb32000, it will walk 16000 blocks back to bb16000,
discover overdefined. Assume it properly caches all 16000 new blocks
We have spent 16k blocks discovering overdefinedness here for each variable
in bb32000.

In the predicateinfo scheme where overdefined is global, and uses are
renamed where value could change:

For the use of tmp1 in bb16000, we see where the def is. It's in the entry
block. Because we know the use would have been renamed if the value could
have changed due to conditions, etc., we can skip all the blocks in the
middle. They are irrelevant.   We look at the defining block (and use any
predicateinfo to check conditions), see it's overdefined.
We have spent 1 block discovering overdefinedness here.
For the use of tmp2..tmp16000 in bb16000, ditto

Let's go to bb32000
Remember that overdefined is now global, because it would have been renamed
if it was dominated by something that could change the value.
For the use of tmp1 in bb32000, it will walk 0 blocks. The value is
overdefined for sure.
Ditto for tmp2..tmp16000
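The counterpart toy model for this scheme (again a hypothetical sketch with made-up names, not the real PredicateInfo code): because a use would have been renamed anywhere a condition could refine the value, overdefinedness is a single global fact per SSA name, and any use anywhere is one lookup.

```python
# Toy model of the PredicateInfo-style scheme (hypothetical sketch):
# overdefinedness is a global fact per SSA name, so a query is a
# single lookup, independent of how many blocks sit between the
# def and the use.

def query_global(value, overdefined, work):
    """One lookup; no walk over intervening blocks is needed."""
    work[0] += 1
    return "overdefined" if value in overdefined else "unknown"

n, m = 1000, 1000                  # same sizes as the per-block model above
overdefined = set(range(m))        # every value found overdefined at its def
work = [0]
for v in range(m):
    query_global(v, overdefined, work)   # a use in bb16000, bb32000, anywhere

print(work[0])                     # m lookups total, independent of n
```

Under these (toy) assumptions the work drops from m * (n - 1) block visits to m lookups: the chain length n no longer appears in the cost at all.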

You can make the testcase worse for the current scheme by adding more
splits.

That is, imagine the above example, except with two chains of 16000 blocks
that get merged in bb32000.

In the predicateinfo scheme, the only extra time we spend is whatever it
takes to look at the predicateinfo, if any (if it isn't live in those blocks,
and it isn't above, we don't create any), to see if it tells us anything.
In the current overdefined scheme, depending on pred ordering, even if we
have cached one of the chains of 16k blocks as overdefined, we may look at
the other chain first.

(again, we could hack a heuristic into the current one to follow a branch
where it is dominated by some cached info, and we estimate the block
distance between us and the cached info to be Y, but this gets complicated
quickly).