[llvm-dev] Adding Extended-SSA to LLVM

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 3 19:28:16 PST 2017


On Thu, Feb 2, 2017 at 5:01 PM, Daniel Berlin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Now with even the right llvm-dev :)
>
>
> On Thu, Feb 2, 2017 at 4:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>>
>>
>> On Thu, Feb 2, 2017 at 3:52 PM, Chandler Carruth <chandlerc at gmail.com>
>> wrote:
>>
>>> On Wed, Feb 1, 2017 at 8:33 PM Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> After a long amount of discussion both offline and on, I put a
>>>> pass/intrinsic to add extended SSA up at http://reviews.llvm.org/D29316
>>>> .
>>>>
>>>> Sean asked me to share it more broadly on llvm-dev, so here you go :)
>>>>
>>>> For those not familiar with extended-SSA, it's described in the paper
>>>> "ABCD: Eliminating Array Bounds Checks on Demand".
>>>>
>>>> There is a very large amount of explanation in the summary of the
>>>> review that i don't want to paste here , but the short version is:
>>>>
>>>> 1. We make copies of variables where they are used in comparisons that
>>>> lead to assumes or branches and it's possible to determine something about
>>>> their value from the branch.
>>>>
>>>> IE
>>>> define i32 @test1(i32 %x) {
>>>>     %cmp = icmp eq i32 %x, 50
>>>>     br i1 %cmp, label %true, label %false
>>>> true:
>>>>     ret i32 %x
>>>> false:
>>>>     ret i32 1
>>>> }
>>>>
>>>> becomes
>>>>
>>>> define i32 @test1(i32 %x) {
>>>>   %cmp = icmp eq i32 %x, 50
>>>>   br i1 %cmp, label %true, label %false
>>>>
>>>> true:                                             ; preds = %0
>>>> ; Has predicate info
>>>> ; branch predicate info { TrueEdge: 1 Comparison:  %cmp = icmp eq i32
>>>> %x, 50 }
>>>>   %x.0 = call i32 @llvm.predicateinfo.i32(i32 %x)
>>>>   ret i32 %x.0
>>>>
>>>> false:                                            ; preds = %0
>>>>   ret i32 1
>>>> }
>>>>
>>>> All uses that are dominated by the predicate info are renamed to use it.
>>>>
>>>> 2. We do so very quickly (it takes about 600ms to do 2 million blocks
>>>> with comparisons, by comparison, most passes simply crash or take forever
>>>> on the same file. As an example, GVN takes 500 seconds), and only insert
>>>> when and where the operands are used.
>>>>
>>>> 3. The intrinsics are marked so they do not affect optimization (and
>>>> currently, passes that use them all destroy them). They also can detect
>>>> when they've been moved to make sure they are still valid if need me.
>>>>
>>>> 4. They could be updated pretty easily if we wanted
>>>>
>>>> With one real downside:
>>>>
>>>> 5. We modify the IR in an analysis to do so :(
>>>>
>>>
>>> One question that occurred to me reading this, that probably is just my
>>> ignorance, but maybe you can help =]
>>>
>>> Any particular reason to not model this as essentially a shadow of the
>>> IR?
>>>
>>
>> Done already :) Checkout the newgvn-predicateinfo-uses branch on github
>> :P.
>>
>> The short version there's no point is because it's complex, and you can't
>> update it *anyway* at that point :P
>>
>>
>>
>>> I'm imagining creating a map from operand -> predinfo, where predinfo
>>> essentially models these inserted intrinsics, and rather than RAUW-ing
>>> dominated uses, you insert a mapping for dominated uses.
>>>
>>
>> Ah, if you could easily map from operand to predinfo, this might even
>> work okayish (you now have O(number of uses in the entire program) lookups
>> you've added to newgvn, to check if they have predicateinfo, but okay).
>>
>> But to start:
>> You can't do this easily
>> The operands you need to attach to  are Use's.
>>
>> Because
>>
>> if (a == 0)
>>  ret a
>> else
>>  ret a
>>
>> each a has different info.
>>
>> The predicateinfo insertion accomplishes this by physical renaming, so
>> the values have different name, and every use of a given predicateinfo has
>> the same value.
>>
>> But in a virtual form, it looks like this:
>>
>> IE this looks like:
>> if (a == 0)
>> 1 = PredicateDef(branch, blah blah blah)
>> PredicateUse(1), but really attached to *this use* of a, not to ret
>> ret a
>> else
>> 2 = PredicateDef(branch, blah blah blah)
>> PredicateUse(2), but really attached to *this use* of a, not to ret
>> ret a
>>
>>
>> (even if you really wanted to, you have to attach to uses because ách
>> operand of an instruction could be a use of different predicateinfo)
>>
>> You can in fact, attach to uses.
>> You can even do something like the above (IE have predicate use which
>> stores a regular use), so you know what's up, and have a real def/use form.
>>
>> it goes downhill from there, though, precisely because the info is
>> attached to Use's :(
>>
>> First, we have two types of things that would use this pass.
>>
>> Lazy things, and propagating things.
>>
>> Lazy things, that return a value for a thing, all at once, in a single
>> call (LVI if it had no caches) , might be convertible to use this.
>>
>> You have the problem that getting a use from an instruction operand is
>> not currently trivial (we'd have to have a getOperandUse that mirrors
>> getOperand or something), but not too bad.
>>
>> But all of ours really cache stuff, so they are really lazy propagating
>> things.
>>
>>
>> and Propagating things (CVP, SCCP, GVN, etc) , which are the rest of
>> them, are hard.
>>
>> They want to store tables of info by Value.
>>
>> But we need to store per-use info.
>>
>> If you modify them to store per-use info for all uses, you take a 5-15x
>> memory and time hit, to store info about each use, where most have the same
>> info.
>>
>> Any elimination you do also has to look up per-use info, not per-value
>> info like we do now, which means the elimination becomes (Defs/Uses) times
>> slower too :(
>>
>> This pretty much puts it out of the range of sanity.
>> But let's say you are smarter than the average bear. You want to build a
>> hybrid scheme, where you only store special use info for the things that
>> are special.
>> You realize you can make your virtual form, above, Value *'s.  Then you
>> have the name you need because the PredicateDef is a name you can use like
>> a normal Instruction def.
>>
>> But you still have to convert back and forth between Use's and Value's
>> everywhere, and plumb Use& through anything that does lookups, so you can
>> get a Use& and turn it into the right thing above.
>> Then, replacement of the original operands, to make it not have to do
>> lookups on every use in the program,  requires storing the original uses in
>> the PredicateUse class, so you know which uses in the original IR
>> correspond to which uses in the Predicate virtual IR.
>>
>> This is a *lot* of plumbing. and because Use's have an operator Value*,
>> the liklihood you create a lot of bugs is high, because if you use a use
>> where you meant use.getUser(), it still works :(
>>
>> You are creative though, and get this all working.
>>
>> That gets you value numbering (it's a lot harder in a lot of other
>> cases), without even needing an O(uses in the program) hash table.
>>
>> Now you have to try eliminate stuff.
>>
>> In everything but NewGVN, you have to rewrite the eliminators. They all
>> replace as they go, which means either double the replace cost (since you
>> have to replace the original uses, and for predicates, the original uses
>> and the uses in the predicate uses), or even more fun as you try to update
>> utilities to understand about looking up uses :(
>>
>>
>> In NewGVN, we eliminate in O(uses that have the same value) time, by
>> sorting congruence class instruction defs and uses by global and local
>> dominator numbers, and using a stack.
>> So we now need to order the predicate defs, uses, and regular defs and
>> uses all in the right order (and replace the right defs with predicate defs
>> so the lookup happens right) in dominator order to know which replaces
>> which.
>> But you can't use dominates() to do this, because it doesn't understand
>> these,etc.
>>
>> So you have to create a local instruction numbering, and figure out,
>> where in them each thing goes.
>> Which you can do, at some non-trivial cost over regular instruction
>> numbering
>>
>> This is just the simple parts, mind you.
>> After all this, you have something that takes pretty much double the
>>  memory, a lot harder to understand, *and* it still gets invalidated easily
>> :)
>>
>> But it's immutable!
>>
>> There's kind of no upside, and lots of downsides, both in building it,
>> and trying to use it:)
>>
>>
>>
>>> This would cause it to be a pure analysis again. It would make it
>>> relatively fragile of course -- it would be easily invalidated. But given
>>> how fast it is to compute, for several of the use cases it seems like it
>>> would be sufficient (NewGVN seems like it could tolerate computing this
>>> each time, etc.
>>>
>>> Anyways, I'm sure you've looked at something like this, and I'm curious
>>> what goes wrong with it.
>>>
>>>
>>
>>
>>>
>>>> The main user at the moment is NewGVN, which uses it to do the
>>>> equivalent of GVN's propagateEquality.  I'd be happy to make it a utility
>>>> called from NewGVN instead of an analysis. A number of folks online and off
>>>> asked me to make it usable for others, so i did that :) Some folks want to
>>>> use it to cleanup existing passes (EarlyCSE, etc), some want to use it in
>>>> new passes.
>>>>
>>>
>>> So, this definitely sounds useful, but I think that while it mutates the
>>> IR itself, I think it should just be a utility routine that passes call at
>>> their start to put the IR into the form they need. It'll be a bit tricky if
>>> you want analysis passes to observe this form -- you may not be able to use
>>> the caching infrastructure of the new PM at all for such analyses and may
>>> need to express them as stand alone utilities as well.
>>>
>>
>> Yeah, i'm just going to make it a utility
>>
>
I don't have much content to add to the discussion, but since I originally
asked you to bring the discussion to llvm-dev: making it a utility SGTM.

-- Sean Silva


>
>>>
>> If it is reasonable to do so, I'd also suggest a cleanup utility that
>>> ensures the mutated IR doesn't escape the end of the pass either to avoid
>>> having to teach the rest of the optimizer about it. But if there are
>>> reasons this doesn't work well, maybe we can use actual PHI nodes for this
>>> to simplify what we have to teach the rest of the optimizer about?
>>>
>>
>> We can use single argument phi nodes for branches, but it gets tricky
>> with assume, because you need to place the phi at the beginning of the
>> assume block, and you can't do this if the operands are defined in the same
>> block,  so you'd have to place  n ones with the same info in every
>> successor block, and hope the assume didn't do anything useful in the rest
>> of the assume block or something :(
>>
>> I'll also write a verifier.
>>
>>
>>> -Chandler
>>>
>>> PS: In case folks are wondering, the reason *why* I think this needs not
>>> be an analysis is because in the new PM we rely heavily on caching of
>>> analysis results. This caching behavior means that when things are run
>>> isn't very clear and we'd like to avoid ordering dependencies between two
>>> analyses running over the same IR to make things more debuggable later on.
>>> I can dive into more details here as useful of course.
>>>
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170203/bbd51c31/attachment-0001.html>


More information about the llvm-dev mailing list