[llvm-dev] Adding Extended-SSA to LLVM
Daniel Berlin via llvm-dev
llvm-dev at lists.llvm.org
Thu Feb 2 17:01:45 PST 2017
Now with even the right llvm-dev :)
On Thu, Feb 2, 2017 at 4:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
> On Thu, Feb 2, 2017 at 3:52 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>> On Wed, Feb 1, 2017 at 8:33 PM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>>> Hey folks,
>>>
>>> After a long amount of discussion both offline and on, I put a
>>> pass/intrinsic to add extended SSA up at http://reviews.llvm.org/D29316.
>>>
>>> Sean asked me to share it more broadly on llvm-dev, so here you go :)
>>>
>>> For those not familiar with extended-SSA, it's described in the paper
>>> "ABCD: Eliminating Array Bounds Checks on Demand".
>>>
>>> There is a very large amount of explanation in the summary of the review
>>> that I don't want to paste here, but the short version is:
>>>
>>> 1. We make copies of variables where they are used in comparisons that
>>> lead to assumes or branches and it's possible to determine something about
>>> their value from the branch.
>>>
>>> For example:
>>> define i32 @test1(i32 %x) {
>>>   %cmp = icmp eq i32 %x, 50
>>>   br i1 %cmp, label %true, label %false
>>> true:
>>>   ret i32 %x
>>> false:
>>>   ret i32 1
>>> }
>>>
>>> becomes
>>>
>>> define i32 @test1(i32 %x) {
>>>   %cmp = icmp eq i32 %x, 50
>>>   br i1 %cmp, label %true, label %false
>>>
>>> true:                                             ; preds = %0
>>>   ; Has predicate info
>>>   ; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
>>>   %x.0 = call i32 @llvm.predicateinfo.i32(i32 %x)
>>>   ret i32 %x.0
>>>
>>> false:                                            ; preds = %0
>>>   ret i32 1
>>> }
>>>
>>> All uses that are dominated by the predicate info are renamed to use it.
>>>
>>> 2. We do so very quickly (it takes about 600ms to do 2 million blocks with
>>> comparisons; by comparison, most passes simply crash or take forever on the
>>> same file, and GVN, as an example, takes 500 seconds), and we only insert
>>> predicate info when and where the operands are used.
>>>
>>> 3. The intrinsics are marked so they do not affect optimization (and
>>> currently, the passes that use them all destroy them). They can also detect
>>> when they've been moved, to make sure they are still valid if need be.
>>>
>>> 4. They could be updated pretty easily if we wanted to.
>>>
>>> With one real downside:
>>>
>>> 5. We modify the IR in an analysis to do so :(
>>>
>>
>> One question occurred to me reading this that is probably just my
>> ignorance, but maybe you can help =]
>>
>> Any particular reason to not model this as essentially a shadow of the IR?
>>
>
> Done already :) Check out the newgvn-predicateinfo-uses branch on github :P.
>
> The short version of why there's no point: it's complex, and you can't
> update it *anyway* at that point :P
>
>
>
>> I'm imagining creating a map from operand -> predinfo, where predinfo
>> essentially models these inserted intrinsics, and rather than RAUW-ing
>> dominated uses, you insert a mapping for dominated uses.
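>>
>> Roughly, the kind of side table I'm imagining (a rough sketch, with made-up
>> names rather than anything from the patch):
>>
>> #include "llvm/ADT/DenseMap.h"
>> #include "llvm/IR/InstrTypes.h"
>>
>> // What we learned about an operand from the comparison guarding the edge.
>> struct PredInfoRecord {
>>   llvm::CmpInst *Comparison; // the comparison that leads to the branch/assume
>>   bool TrueEdge;             // whether we reached here along the true edge
>> };
>>
>> // Side table consulted instead of rewriting dominated uses in the IR.
>> using PredInfoShadow = llvm::DenseMap<const llvm::Value *, PredInfoRecord>;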
>>
>
> Ah, if you could easily map from operand to predinfo, this might even work
> okayish (you've now added O(number of uses in the entire program) lookups
> to NewGVN to check whether they have predicateinfo, but okay).
>
> But to start: you can't do this easily. The operands you need to attach
> to are Use's, because in
>
> if (a == 0)
>   ret a
> else
>   ret a
>
> each a has different info.
>
> The predicateinfo insertion accomplishes this by physical renaming, so the
> values have different names, and every use of a given predicateinfo has the
> same value.
>
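> In shadow-map terms, this is exactly why the table would have to be keyed
> on the Use rather than the Value: both returns use the same Value. A rough
> sketch, reusing the hypothetical PredInfoRecord from above:
>
> #include "llvm/ADT/DenseMap.h"
> #include "llvm/IR/Instructions.h"
>
> struct PredInfoRecord { llvm::CmpInst *Comparison; bool TrueEdge; };
>
> // Record a different fact for each of the two uses of the same value a:
> // a == 0 holds for the use in the true-arm ret, and a != 0 for the other.
> void recordBothArms(llvm::DenseMap<const llvm::Use *, PredInfoRecord> &Shadow,
>                     llvm::ReturnInst &RetInTrueArm,
>                     llvm::ReturnInst &RetInFalseArm, llvm::CmpInst &Cmp) {
>   Shadow[&*RetInTrueArm.op_begin()] = {&Cmp, /*TrueEdge=*/true};
>   Shadow[&*RetInFalseArm.op_begin()] = {&Cmp, /*TrueEdge=*/false};
> }
>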
> But in a virtual form, it looks like this:
>
> if (a == 0)
>   1 = PredicateDef(branch, blah blah blah)
>   PredicateUse(1), but really attached to *this use* of a, not to ret
>   ret a
> else
>   2 = PredicateDef(branch, blah blah blah)
>   PredicateUse(2), but really attached to *this use* of a, not to ret
>   ret a
>
>
> (Even if you really wanted to, you have to attach to uses, because each
> operand of an instruction could be a use of a different predicateinfo.)
>
> You can, in fact, attach to uses.
> You can even do something like the above (i.e., have a PredicateUse which
> stores a regular Use), so you know what's up and have a real def/use form.
>
> It goes downhill from there, though, precisely because the info is
> attached to Use's :(
>
> First, we have two types of things that would use this pass.
>
> Lazy things, and propagating things.
>
> Lazy things, which return a value for a thing all at once in a single
> call (LVI, if it had no caches), might be convertible to use this.
>
> You have the problem that getting a Use from an instruction operand is not
> currently trivial (we'd have to have a getOperandUse that mirrors
> getOperand, or something), but that's not too bad.
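>
> E.g., a lazy query would end up shaped roughly like this (a sketch with
> hypothetical names, assuming the Use-keyed shadow table from above):
>
> #include "llvm/ADT/DenseMap.h"
> #include "llvm/ADT/Optional.h"
> #include "llvm/IR/InstrTypes.h"
> #include "llvm/IR/Instruction.h"
>
> struct PredInfoRecord { llvm::CmpInst *Comparison; bool TrueEdge; };
>
> // "What do we know about operand Idx of I?" The lookup has to dig out the
> // Use for that operand, not just the Value returned by getOperand(Idx).
> llvm::Optional<PredInfoRecord>
> queryOperand(const llvm::DenseMap<const llvm::Use *, PredInfoRecord> &Shadow,
>              llvm::Instruction &I, unsigned Idx) {
>   const llvm::Use &U = *(I.op_begin() + Idx);
>   auto It = Shadow.find(&U);
>   if (It == Shadow.end())
>     return llvm::None;
>   return It->second;
> }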
>
> But all of ours really cache stuff, so they are really lazy propagating
> things.
>
>
> And propagating things (CVP, SCCP, GVN, etc.), which are the rest of them,
> are hard.
>
> They want to store tables of info by Value.
>
> But we need to store per-use info.
>
> If you modify them to store per-use info for all uses, you take a 5-15x
> memory and time hit to store info about each use, where most uses have the
> same info.
>
> Any elimination you do also has to look up per-use info, not per-value
> info like we do now, which means the elimination becomes (Defs/Uses) times
> slower too :(
>
> This pretty much puts it out of the range of sanity.
> But let's say you are smarter than the average bear. You want to build a
> hybrid scheme, where you only store special use info for the things that
> are special.
> You realize you can make the pieces of the virtual form above into Value*'s.
> Then you have the name you need, because a PredicateDef is a name you can
> use like a normal Instruction def.
>
> But you still have to convert back and forth between Use's and Value's
> everywhere, and plumb Use& through anything that does lookups, so you can
> get a Use& and turn it into the right thing above.
> Then, to avoid doing lookups on every use in the program when replacing the
> original operands, you have to store the original uses in the PredicateUse
> class, so you know which uses in the original IR correspond to which uses
> in the predicate virtual IR.
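>
> Concretely, the virtual form ends up shaped something like this (an
> illustrative sketch; in the scheme above these would really be Value
> subclasses and would carry more information):
>
> #include "llvm/IR/InstrTypes.h"
> #include "llvm/IR/Use.h"
>
> // A new "name" for the operand on one edge, usable like an instruction def.
> struct PredicateDef {
>   llvm::Value *OriginalOp;   // the operand being refined, e.g. a
>   llvm::CmpInst *Comparison; // the comparison providing the information
>   bool TrueEdge;             // which edge of the branch it holds on
> };
>
> // A use of that name, which remembers the IR use it stands for so that
> // replacement doesn't have to do a lookup on every use in the program.
> struct PredicateUse {
>   PredicateDef *Def;      // which virtual def this use reads
>   llvm::Use *OriginalUse; // the corresponding use in the original IR
> };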
>
> This is a *lot* of plumbing. And because Use has an operator Value*, the
> likelihood that you create a lot of bugs is high, because if you use a Use
> where you meant use.getUser(), it still compiles :(
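>
> For instance, this compiles without complaint (sketch):
>
> #include "llvm/IR/Use.h"
> #include "llvm/IR/User.h"
>
> void trap(llvm::Use &U) {
>   llvm::Value *UsedValue = U;         // implicit operator Value*: the value being used
>   llvm::Value *TheUser = U.getUser(); // the instruction that contains the use
>   // Pass UsedValue where you meant TheUser (or vice versa) and nothing in
>   // the type system will stop you.
>   (void)UsedValue;
>   (void)TheUser;
> }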
>
> You are creative though, and get this all working.
>
> That gets you value numbering (it's a lot harder in a lot of other cases),
> without even needing an O(uses in the program) hash table.
>
> Now you have to try to eliminate stuff.
>
> In everything but NewGVN, you have to rewrite the eliminators. They all
> replace as they go, which means either double the replace cost (since you
> have to replace the original uses and, for predicates, also the uses stored
> in the predicate uses), or even more fun as you try to update utilities to
> understand looking up uses :(
>
>
> In NewGVN, we eliminate in O(uses that have the same value) time, by
> sorting congruence class instruction defs and uses by global and local
> dominator numbers, and using a stack.
> So we now need to put the predicate defs and uses and the regular defs and
> uses all in the right dominator order (and replace the right defs with
> predicate defs so the lookup happens right) to know which replaces which.
> But you can't use dominates() to do this, because it doesn't understand
> these, etc.
>
> So you have to create a local instruction numbering and figure out where in
> it each thing goes, which you can do, at some non-trivial cost over regular
> instruction numbering.
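>
> Roughly, the sort key ends up looking something like this (illustrative
> only, not the actual NewGVN code):
>
> #include <tuple>
>
> // Every def and use, including the predicate defs and uses (which don't
> // exist as real Instructions), needs a slot in this ordering.
> struct DFSOrder {
>   unsigned DomTreeDFSIn; // DFS-in number of the containing block in the dom tree
>   unsigned LocalNum;     // position within that block's local numbering
>   bool operator<(const DFSOrder &Other) const {
>     return std::tie(DomTreeDFSIn, LocalNum) <
>            std::tie(Other.DomTreeDFSIn, Other.LocalNum);
>   }
> };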
>
> These are just the simple parts, mind you.
> After all this, you have something that takes pretty much double the
> memory, is a lot harder to understand, *and* still gets invalidated easily
> :)
>
> But it's immutable!
>
> There's kind of no upside, and lots of downsides, both in building it and
> in trying to use it :)
>
>
>
>> This would cause it to be a pure analysis again. It would make it
>> relatively fragile of course -- it would be easily invalidated. But given
>> how fast it is to compute, for several of the use cases it seems like it
>> would be sufficient (NewGVN seems like it could tolerate computing this
>> each time, etc.).
>>
>> Anyways, I'm sure you've looked at something like this, and I'm curious
>> what goes wrong with it.
>>
>>
>
>
>>
>>> The main user at the moment is NewGVN, which uses it to do the
>>> equivalent of GVN's propagateEquality. I'd be happy to make it a utility
>>> called from NewGVN instead of an analysis. A number of folks online and off
>>> asked me to make it usable for others, so I did that :) Some folks want to
>>> use it to clean up existing passes (EarlyCSE, etc.), some want to use it in
>>> new passes.
>>>
>>
>> So, this definitely sounds useful, but since it mutates the IR itself, I
>> think it should just be a utility routine that passes call at their start
>> to put the IR into the form they need. It'll be a bit tricky if you want
>> analysis passes to observe this form -- you may not be able to use the
>> caching infrastructure of the new PM at all for such analyses and may need
>> to express them as stand-alone utilities as well.
>>
>
> Yeah, I'm just going to make it a utility.
>
>>
>>
>> If it is reasonable to do so, I'd also suggest a cleanup utility that
>> ensures the mutated IR doesn't escape the end of the pass either to avoid
>> having to teach the rest of the optimizer about it. But if there are
>> reasons this doesn't work well, maybe we can use actual PHI nodes for this
>> to simplify what we have to teach the rest of the optimizer about?
>>
>
> We can use single-argument phi nodes for branches, but it gets tricky with
> assume, because you need to place the phi at the beginning of the assume
> block, and you can't do that if the operands are defined in the same block,
> so you'd have to place one with the same info in every successor block and
> hope the assume didn't do anything useful in the rest of the assume block,
> or something :(
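>
> For the branch case, the phi version would look roughly like this (a
> sketch; the assume problem above is exactly that there is no fresh
> successor block to put the phi in):
>
> #include "llvm/IR/BasicBlock.h"
> #include "llvm/IR/Instructions.h"
>
> // Insert a single-argument phi at the top of the edge's target block, so
> // dominated uses of X can be rewritten to use the phi instead.
> llvm::PHINode *insertEdgeCopy(llvm::Value *X, llvm::BasicBlock *PredBB,
>                               llvm::BasicBlock *TargetBB) {
>   llvm::PHINode *Copy =
>       llvm::PHINode::Create(X->getType(), /*NumReservedValues=*/1,
>                             X->getName() + ".0", &TargetBB->front());
>   Copy->addIncoming(X, PredBB);
>   return Copy;
> }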
>
> I'll also write a verifier.
>
>
>> -Chandler
>>
>> PS: In case folks are wondering, the reason *why* I think this needs not
>> be an analysis is because in the new PM we rely heavily on caching of
>> analysis results. This caching behavior means that when things are run
>> isn't very clear and we'd like to avoid ordering dependencies between two
>> analyses running over the same IR to make things more debuggable later on.
>> I can dive into more details here as useful of course.
>>
>
>