[llvm-dev] Adding Extended-SSA to LLVM
Daniel Berlin via llvm-dev
llvm-dev at lists.llvm.org
Sun Feb 5 13:38:43 PST 2017
(and D29519 was updated to not insert info for and/or conditions on the "wrong"
edges, with tests added to ensure it)
On Sun, Feb 5, 2017 at 1:25 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
> On Sun, Feb 5, 2017 at 12:25 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>
>> Hi Daniel,
>>
>> Many thanks for working on this!
>> SSI/e-SSA is the only way I'm aware of for doing efficient sparse
>> analyses, so I'm definitely in favor of adding support for it in LLVM!
>>
>> I read the discussion so far and did a cursory review of the patches, and
>> I have just a few questions:
>> - Is your plan to keep e-SSA form all the time (do it once and maintain
>> it throughout the pipeline), or do it in the passes that need it and then
>> get rid of it at the end of the pass?
>
>
> At the moment, get rid of it.
>
>
>> My concern (similar to Sanjoy's) is the increased overhead for matching
>> patterns and so on.
>
>
> So far I have measured exactly none, FWIW.
>
> I'm also not worried, because given how much we try to match tons of
> useless variants, I'm positive I could make it net zero compile time no
> matter what happened, by removing wasted matching time :)
>
>
>
>> For example, how would instcombine work?
>
> Assuming you wished to use it there, the intrinsic already has the
> returned attribute set, which states, specifically, that it always returns
> its first argument.
>
> If instcombine doesn't take advantage of that, it already has a problem with
> intrinsics marked with the returned attribute :)
> (though yeah, such intrinsics currently only exist in backends)
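>
> For reference, a rough sketch of the shape involved (the i32 overload here
> matches the tests further down; the @example wrapper is just invented for
> illustration):
>
> ; 'returned' says the call always yields its first argument, so a pass
> ; that honors it can treat %x.0 as %x.
> declare i32 @llvm.ssa.copy.i32(i32 returned)
>
> define i32 @example(i32 %x) {
>   %x.0 = call i32 @llvm.ssa.copy.i32(i32 %x)
>   ret i32 %x.0
> }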
>
>
> As for how you would make it work, there is no magic, of course.
> We either change the matchers to see through it or we don't.
> Both are valid options, with their own tradeoffs. Nobody has yet
> approached me about adding it to instcombine, so I haven't tried to
> formulate an opinion on which way I'd go :)
>
>
> Note that other intrinsics, such as noalias, have the same issue.
>
>
>> Would we change the m_*() matchers to know how to see through the
>> intrinsic?
>>
>> - What you've implemented never creates phi nodes, right? (as opposed to
>> ABCD paper)
>>
>
> Correct.
> It originally did, but it's not possible to sanely create phi nodes for
> assumes in all cases.
>
>> At least it seems that the current algorithm bails out if the
>> successor has multiple predecessors.
>
>
> This is because it's not possible to place info in such cases without phi
> nodes, and even then it's not clear that it makes sense.
>
> Also note that such cases are all critical edges (otherwise I can't see
> how they have info to propagate :P), and if you break the critical edges,
> it works just fine.
>
> The need to split critical edges is pretty much true for maximal results
> for any optimization.
>
>
>
>> This might be an ok abstraction for GVN, but for things like CVP it's
>> probably not. CVP merges information incoming from multiple edges (as will
>> any fancier abstractions we may want to have in the future).
>>
>
>
> It's important to note: we sort phi node uses into the predecessor block
> they belong to, so that restriction does *not* apply to the typical phi
> node use case.
>
>
> IE given:
> define i32 @test12(i32 %x) {
>   %cmp = icmp eq i32 %x, 0
>   br i1 %cmp, label %cond_true, label %cond_false
>
> cond_true:
>   br label %ret
>
> cond_false:
>   br label %ret
>
> ret:
>   %res = phi i32 [ %x, %cond_true ], [ %x, %cond_false ]
>   ret i32 %res
> }
>
>
> You will get:
> ; CHECK-LABEL: @test12(
> ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[X:%.*]], 0
> ; CHECK-NEXT: br i1 [[CMP]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
> ; CHECK: cond_true:
> ; CHECK-NEXT: [[X_0:%.*]] = call i32 @llvm.ssa.copy.i32(i32 [[X]])
> ; CHECK-NEXT: br label [[RET:%.*]]
> ; CHECK: cond_false:
> ; CHECK-NEXT: [[X_1:%.*]] = call i32 @llvm.ssa.copy.i32(i32 [[X]])
> ; CHECK-NEXT: br label [[RET]]
> ; CHECK: ret:
> ; CHECK-NEXT: [[RES:%.*]] = phi i32 [ [[X_0]], [[COND_TRUE]] ], [ [[X_1]], [[COND_FALSE]] ]
> ; CHECK-NEXT: ret i32 [[RES]]
>
> (as you can see, we test this)
>
> So the cases I think you are thinking about are not problems, except in
> one degenerate case, which is where critical edges lead directly to the use.
>
> In such cases, note that the merges it performs can be proved to miss
> information or be non-optimal in the presence of critical edges, even with
> what it tries now.
>
> All that said ...
>
>
>
>> Any plans to support this? (i.e., is the current algorithm "easily"
>> upgradable to support this scenario?)
>>
>
> It is easy to do, but it is ... messy, IMHO.
>
> Note that such a thing is generally a mess.
>
> Here is the degenerate case:
>
> define i32 @f1(i32 %x) {
> bb0:
>   %cmp = icmp eq i32 %x, 0
>   br i1 %cmp, label %bb2, label %bb1
> bb1:
>   br label %bb2
> bb2:
>   %cond = phi i32 [ %x, %bb0 ], [ %x, %bb1 ]
>   %foo = add i32 %cond, %x
>   ret i32 %foo
> }
>
> The critical edge from bb0 to bb2 leaves us no place to put predicateinfo
> for the true branch in the current algorithm (we will place the false info).
>
> You could also always place it before each branch it's for (because that
> block should dominate all uses on the conditional edges already).
>
> The actual placement is irrelevant, BTW. In the renaming algorithm, by the
> time it goes to place them, it already knows what it *should* dominate.
>
> The only thing it gets "wrong" is the phi uses, because they are placed in
> the predecessor blocks right now as we sort.
>
> But because we see them there, we can detect this case and change
> placement.
> Because they are guaranteed to be at the beginning of the successor block,
> you are guaranteed that, if you want to insert, the only thing that can be
> between them and the def you want to insert there is other phi uses in the
> same boat.
> So you can look ahead in the stream, find that def, insert it, use it, and
> pretend everything's great (without even pushing it onto the stack!)
>
> This is tricky in a one-pass algorithm, as it really changes a simple
> renaming automaton into something with two modes: a "critical phi use" mode
> and a "regular" mode. In critical-phi-use mode, it finds the def, inserts
> it, and keeps processing until it hits a non-phi use. Then it shifts back
> into regular mode.
> But there's also exactly one case that all of the above work affects:
>
> The case where the phi node with the use is a direct successor of the
> branch, such that the edge to that use is critical.
>
> In any case, this is also precisely the problem that splitting critical
> edges resolves, and if you use break-crit-edges on the above, you get the
> right answers all the time :)
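>
> For illustration, here is roughly what the f1 example above looks like once
> the bb0->bb2 edge is split (block and value names are just invented for the
> sketch):
>
> declare i32 @llvm.ssa.copy.i32(i32 returned)
>
> define i32 @f1_split(i32 %x) {
> bb0:
>   %cmp = icmp eq i32 %x, 0
>   br i1 %cmp, label %bb0.bb2_crit_edge, label %bb1
> bb0.bb2_crit_edge:
>   ; the new block gives the true-edge info (%x == 0) somewhere to live
>   %x.0 = call i32 @llvm.ssa.copy.i32(i32 %x)
>   br label %bb2
> bb1:
>   ; the false-edge info (%x != 0) already had a home
>   %x.1 = call i32 @llvm.ssa.copy.i32(i32 %x)
>   br label %bb2
> bb2:
>   %cond = phi i32 [ %x.0, %bb0.bb2_crit_edge ], [ %x.1, %bb1 ]
>   %foo = add i32 %cond, %x
>   ret i32 %foo
> }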
>
> Note also that the thing that we replace, propagateEquality in GVN, also
> restricts itself to single predecessors, and simply attempts to propagate
> directly into critical phi node uses to avoid the issue (which we can't do
> since NewGVN is an analysis followed by an eliminator).
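>
> Concretely, for the f1 example that direct propagation amounts to rewriting
> the phi operand on the critical edge in place, roughly like this (the
> function name is invented, and this is a sketch of the idea rather than
> actual GVN output):
>
> define i32 @f1_after_propagate(i32 %x) {
> bb0:
>   %cmp = icmp eq i32 %x, 0
>   br i1 %cmp, label %bb2, label %bb1
> bb1:
>   br label %bb2
> bb2:
>   ; the incoming value for the bb0 (true) edge has been rewritten to 0,
>   ; since %x == 0 holds along that edge
>   %cond = phi i32 [ 0, %bb0 ], [ %x, %bb1 ]
>   %foo = add i32 %cond, %x
>   ret i32 %foo
> }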
>
> So yes, we can fix this case if we need to.
>
>
>
>>
>> - For a function/intrinsic marked as "returned(1) readnone", the obvious
>> optimization to do would be to RAUW the return with the argument (and kill
>> the function call altogether -- assuming readnone allows that; but I lost
>> track of that discussion). Does instcombine do that?
>
>> I understand that piggybacking on this existing feature is a good thing,
>> but I'm just wondering if, say, InstCombine won't simply revert e-SSA?
>>
>
> NewGVN is the only pass that uses it ATM, and it destroys the form.
> The current expectation is that anything that uses it will destroy it.
> This is the deliberate outcome of this discussion right now :)
>
> If it becomes slow enough that we want to keep it up to date, IMHO, we can
> burn that bridge when we come to it.
>
> At the point at which we are trying to keep predicateinfo up to date in a
> large number of places, I'd argue we should just move to e-SSA/SSI as the
> default and be done with it, since that's what you'll have effectively
> done anyway.
>
> I think it would be reasonable to see where it gets used, and if it's used
> in enough places, make a decision about what to do.
>
>>
>> - In processBranch/processAssume you also consider branches on ANDs/ORs
>> of comparisons, and then each comparison is processed individually.
>> However, this seems incorrect for one of the branches:
>> if (a && b) {
>>   ...
>> } else {
>>   // holds: !a OR !b
>>   use(a)
>>   use(b)
>> }
>>
>> Right now it seems that the information that is attached to the else
>> branch is !a AND !b, which would be incorrect.
>
>
> I could be pedantic and say the only information attached is a comparison,
> the branch, and whether the edge is the true edge or the false edge :)
> Which is correct.
>
> It also chains the info to give you the entire string of comparisons that
> were applied.
>
> However, in practice you are right, and clients are not likely to get this
> right with what I've given them.
>
> Since, in practice, the only useful derived info is on the true branch of
> an and and the false branch of an or, I'm going to make it not attach info
> to the other branch.
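>
> Concretely, the case I mean looks something like this (function and value
> names are just invented for the sketch):
>
> declare i32 @llvm.ssa.copy.i32(i32 returned)
>
> define i32 @branch_on_and(i32 %a, i32 %b) {
>   %c1 = icmp sgt i32 %a, 0
>   %c2 = icmp sgt i32 %b, 0
>   %and = and i1 %c1, %c2
>   br i1 %and, label %both, label %other
> both:
>   ; true edge of the and: %c1 and %c2 are both known true, so per-operand
>   ; copies of %a and %b carrying that fact are fine here
>   %a.0 = call i32 @llvm.ssa.copy.i32(i32 %a)
>   %b.0 = call i32 @llvm.ssa.copy.i32(i32 %b)
>   %r1 = add i32 %a.0, %b.0
>   ret i32 %r1
> other:
>   ; false edge: all we know is "!%c1 or !%c2", so attaching per-operand
>   ; info to %a or %b here would be wrong
>   %r2 = add i32 %a, %b
>   ret i32 %r2
> }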
>
> Unless you can see a case it makes sense to?
>
>
>
>> I haven't seen the client of this analysis, so this is just speculation,
>> but the interface doesn't seem to have any indication for this case. Or
>> maybe I'm misreading the code :)
>>
>
> No, you are correct. It just happens that the only client is smart enough
> to do something sane (I think :P)
>
>
>>
>> - Now slightly unrelated: do you know of other representations capable of
>> handling relational analyses? For example, with e-SSA:
>> if (x+y > 0) {
>>   // predicate info attached to 'x+y' only
>>   use(x)
>> }
>
>
> We could do that with this pass, actually.
>
> It's a question of where you terminate the operand-finding.
>
> You could actually just make it DFS the comparison operation and collect
> operands until you hit all the leaves, and insert for those. This would be
> a simple modification to collectCmpOperands.
> (It will still be smart enough to only insert if there are actual uses
> affected by the info.)
>
> We don't do that ATM, but there isn't anything I can see that would stop
> you.
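>
> As a sketch of what that would look like (this is not what the pass does
> today, and the names are invented for the example):
>
> declare i32 @llvm.ssa.copy.i32(i32 returned)
>
> define i32 @relational(i32 %x, i32 %y) {
>   %sum = add i32 %x, %y
>   %cmp = icmp sgt i32 %sum, 0
>   br i1 %cmp, label %then, label %end
> then:
>   ; if collectCmpOperands walked through the add to its leaves, copies
>   ; like these could carry the "x + y > 0" fact to uses of %x and %y
>   %x.0 = call i32 @llvm.ssa.copy.i32(i32 %x)
>   %y.0 = call i32 @llvm.ssa.copy.i32(i32 %y)
>   %use = add i32 %x.0, %y.0
>   ret i32 %use
> end:
>   ret i32 0
> }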
>
> The goals I set were to replace propagateEquality in GVN with NewGVN, so
> we pretty much only produce info that we can directly propagate.
>
> It all depends on how many names you want to generate.
>
> I believe the study I saw, which tried splitting at different places, came
> to the conclusion that the best "bang for the buck" was e-SSA.
>
> IE the percent more you get from splitting in more places was not worth the cost.
>
> But when I spoke with them about it, their biggest time-waste was
> actually "we insert a lot of useless phis for operations that never get
> used, because computing liveness is hard". But as I've shown here, you
> don't even need to explicitly compute liveness to solve that problem. So
> who knows, maybe it does pay to split everywhere once that cost is eliminated.
>
> I'd put this one in the "evaluation necessary".
>
>
>
>>
>>
>> So at use(x) no information will be available, even if, FWIW, we know that
>> 'x+y > 0' holds there. I don't think LLVM has any analysis that can track
>> these kinds of symbolic relations, but I was just curious if there's
>> anything out there that can handle such analyses?
>>
>
>
> The folks I know who do this in fact start with e-SSA and go from there,
> with the caveats I listed :)
>
>
>