[llvm-commits] [patch] Use the new edge dominance functions in GVN

Daniel Berlin dannyb at google.com
Tue Aug 14 22:35:34 PDT 2012


On Wed, Aug 15, 2012 at 1:10 AM, Chris Lattner <clattner at apple.com> wrote:
> On Aug 13, 2012, at 11:38 PM, Daniel Berlin <dannyb at google.com> wrote:
>>>> In any case. I have attached both the previous patch and a variation
>>>> that uses critical edge splitting instead. If there is a preference
>>>> for the edge splitting one I can try to figure out the two remaining
>>>> test failures.
>>>
>>> We don't want to add additional users of the BreakCriticalEdges pass;
>>> it's bad for performance.
>>
>> In what exact way?
>
> Eli is right.  While the pass is convenient, it is really bad for compile time performance.
>
>> For example, current LLVM GVN notes critical edges blocking PRE, then
>> splits them, then  *reiterates all of its PRE* just to deal with a
>> critical edge.
>
> Ok, well that's also insane. :-)

Fair, but IMHO, that's the typical solution people reach for when they
take algorithms that weren't designed to work on CFGs containing
critical edges and implement them in a situation where they aren't
allowed to break those edges.

>
>> Additionally, to my recollection, in the history of GCC, critical edge
>> splitting, even on huge functions, has never shown up as any real
>> performance hit (i.e., > 0.1% of the total profile).
>>
>> Is there some mailing list discussion or performance data you can
>> point me at that shows why it is such a problem?
>
> Here is some intuition: splitting a critical edge in LLVM requires:
>
> 1) a new BasicBlock
> 2) a new unconditional branch
> 3) updating any phi nodes in the successor
> 4) various use-list manipulations

Sure.
But if you do it in the middle of pass computations, you still pay
that same cost, and on top of it:

5) updating whatever local data structures exist (like availability
and leader sets) that hold auxiliary info
6) updating analysis passes (like memdep) that do basic-block
caching, or paying for slow queries instead

Your hope is that you only do this for a few edges instead of all of
them, but that depends on how good your other cleanup passes are and
what possibilities are still left.
Otherwise, the cost of 5 and 6 alone usually greatly outweighs 1-4 in
practice.
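To make the bookkeeping in steps 1-4 concrete, here's a minimal sketch on a toy CFG model. The types and names are invented for illustration; this is not LLVM's actual SplitCriticalEdge API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy CFG model, *not* LLVM's API: just enough structure to show the
// four costs of a textbook split: new block, new unconditional branch,
// phi rewriting, and use-list style rewiring.
struct Block {
    std::string name;
    std::vector<Block*> succs;
    // Each phi is a list of (incoming-block, incoming-value) pairs.
    std::vector<std::vector<std::pair<Block*, int>>> phis;
};

// An edge pred->succ is critical when pred has several successors and
// succ has several predecessors; the predecessor count is passed in
// to keep the toy model small.
bool isCritical(const Block& pred, int succNumPreds) {
    return pred.succs.size() > 1 && succNumPreds > 1;
}

// Splits pred->succ: (1) allocate a new block, (2) give it a single
// unconditional branch to succ, (3) retarget succ's phis from pred to
// the new block, and (4) rewire pred's successor list.
Block* splitEdge(Block& pred, Block& succ,
                 std::vector<std::unique_ptr<Block>>& storage) {
    storage.push_back(std::make_unique<Block>());
    Block* mid = storage.back().get();
    mid->name = pred.name + "." + succ.name + ".split";  // (1)
    mid->succs.push_back(&succ);                         // (2)
    for (auto& phi : succ.phis)                          // (3)
        for (auto& incoming : phi)
            if (incoming.first == &pred)
                incoming.first = mid;
    for (auto*& s : pred.succs)                          // (4)
        if (s == &succ)
            s = mid;
    return mid;
}
```

Costs 5 and 6 are everything sitting on top of this: any availability or leader tables keyed by block, plus cached analyses (memdep and friends) that now hold a stale view of the CFG.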

>
> This isn't *hugely* expensive, but it isn't free either.  This is particularly bad because we also have SimplifyCFG, whose job is to return the CFG to canonical form, thus *undoing* this.  The worst case is a pass that requires critical edges to be broken but rarely actually kicks in to do something.  Then you effectively get:
>
> BreakCriticalEdges
> YourPass
> SimplifyCFG


I'm happy to agree that this is a crappy situation.  I'm also happy to
admit that passes that really care about critical edges in any serious
way are few and far between (the canonical example being PRE).

But we shouldn't simply say "we should never split all critical edges
before a pass because it's compile-time expensive" when pre-splitting
could be less expensive than "split critical edges one at a time in
the middle of complicated analysis and caching that can't easily be
updated".

IOW, if we have data that shows it's cheaper, great, but we shouldn't
simply *assume* it's cheaper and that it's never a better thing to do.

I'd even be fine with some middle ground, like splitting all critical
edges only once you analyze and discover there are PRE candidates
we'd want to insert there (i.e., some chance of success).

>
> and since YourPass "usually" doesn't actually do anything (maybe it is something like loop unswitch, which only kicks in on specific CFG structures), we end up burning compile time breaking *all* critical edges, and then throwing that work away by reunifying them away.
>
> There is another *major* problem with critical edge splitting, at least in the textbook definition that BreakCriticalEdges implements: consider a switch statement where multiple destinations go to the same block.  This is very common:
>
> switch (c) {
> case 1:
> case 24:
> case 37:
>    goto bb1;
> case ...
> }
>
> bb1:
>    ; preds = switchblock, switchblock, switchblock
>
>
> Guess what - we've got three critical edges right there from the switch to bb1... and splitting these into three *separate* paths is both incredibly inefficient (because many switches have a ton of cases) and brutally terrible for runtime performance if anything actually gets put into one of those blocks.  This is the idea behind the "split critical edges 'nicely'" APIs that some passes use.

Currently, GVN's PRE is going to do this, but it's going to do it one
edge at a time, and re-evaluate all the possibilities on each
iteration.
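To illustrate that difference concretely, here's a toy sketch (the names and helpers are made up, not LLVM's API): textbook splitting manufactures one fresh block per case edge, while the "nice" variant lets every edge to the same destination share a single landing block:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: a switch terminator reduced to its list of case
// destinations. Illustrative only; not LLVM's SplitCriticalEdge API.
using SwitchDests = std::vector<std::string>;

// Textbook splitting: every critical edge gets its own fresh block,
// so k cases targeting the same block add k new blocks.
int splitNaive(SwitchDests& dests, const std::string& target) {
    int added = 0;
    for (auto& d : dests)
        if (d == target)
            d = target + ".split" + std::to_string(added++);
    return added;
}

// "Nice" splitting: all edges to the same destination funnel through
// one shared landing block, adding exactly one block no matter how
// many cases point there.
int splitNice(SwitchDests& dests, const std::string& target) {
    int added = 0;
    for (auto& d : dests)
        if (d == target) {
            d = target + ".split";  // shared landing block
            added = 1;
        }
    return added;
}
```

For the three-case example above, the naive version adds three blocks where the nice version adds one; for switches with hundreds of cases reaching a common destination, the gap is dramatic.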
>
>
> The upshot of this is that it is *far* better to have the passes that want critical edges broken do it themselves, when there is actually a reason to do it.  In the case of GVN, the "analysis" part of the algorithm should be updated to work on a pseudo-CFG that looks as-if the edges were broken, and then actually do it when there is a reason to do so.

That seems fairly expensive to construct, since the initial analysis
is going to want to know edge reachability and evaluate terminators,
and thus it sounds at least as expensive as splitting the edges (plus
the cost of maintaining the pseudo-structures).  If it were simple
dataflow, I'd agree.

Thankfully, right now, the analysis mostly doesn't care, and you may
be able to get away with updating some structures.

That said, again, it is tricky to do this sparsely without iterating
repeatedly over split edges.
(You can do it with the standard PRE dataflow equations, but
bitvector dataflow, meh.)
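For reference, the anticipability half of those standard PRE equations is an ordinary backward bitvector pass. This is a generic textbook-style sketch with invented block data, not code from GCC or LLVM:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Textbook bitvector anticipability, the backward half of classical
// PRE dataflow. Each bit stands for one expression.
struct DFBlock {
    uint32_t gen;   // expressions evaluated here before operands change
    uint32_t kill;  // expressions whose operands this block redefines
    std::vector<int> succs;
};

// ANTIC_OUT[b] = intersection of ANTIC_IN over successors (empty at
// exits); ANTIC_IN[b] = gen[b] | (ANTIC_OUT[b] & ~kill[b]).
// Iterate to a fixpoint over the whole function.
std::vector<uint32_t> anticipated(const std::vector<DFBlock>& cfg) {
    std::vector<uint32_t> in(cfg.size(), 0);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = (int)cfg.size() - 1; b >= 0; --b) {
            uint32_t out = cfg[b].succs.empty() ? 0u : ~0u;
            for (int s : cfg[b].succs)
                out &= in[s];
            uint32_t newIn = cfg[b].gen | (out & ~cfg[b].kill);
            if (newIn != in[b]) { in[b] = newIn; changed = true; }
        }
    }
    return in;
}
```

The forward availability pass is the mirror image, and insertion points fall out of comparing the two: exactly the whole-function iteration that is hard to make sparse.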

I do have numbers showing that, in at least two compilers, the
pre-splitting approach was faster than both the iteration approach
and the pseudo-CFG approach.



