[llvm-dev] Open Project : Inter-procedural Register Allocation [GSoC 2016]

Thu Mar 24 13:23:48 PDT 2016

*Vivek Pandya*

On Fri, Mar 25, 2016 at 1:30 AM, Pete Cooper <peter_cooper at apple.com> wrote:

>
> On Mar 24, 2016, at 11:09 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>
>
> On Mar 23, 2016, at 6:38 PM, Gerolf Hoflehner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> On Mar 23, 2016, at 2:59 PM, Quentin Colombet via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> On Mar 23, 2016, at 2:44 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
>
>
> *Vivek Pandya*
>
>
> On Wed, Mar 23, 2016 at 10:18 PM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>
>> The pass manager already has support for calligraph connected region IIRC.
>>
> If I am not wrong Quentin and Mehdi Amini refers to CallGraphSCCPass.cpp
>
>
> Yes.
>
> As for the regmask part, we probably could hack something up in a week or
>> so, but I believe this is not what Vivek had in mind.
>>
>> Which operands should be kept in registers between function call should
> be justifying and for that we can take help from some research work ( some
> of I mentioned previously I have to read it again. Please suggest some more
> relevant papers  ) once that is implemented we can update the regmask for a
> call instruction to indicate which registers are free to be used. Am I
> going in correct direction ?
>
>
> I do not know if there is a paper on this as this is quite trivial, but
> IIRC Open64 register allocator does that.
> Anyhow, the algo is:
> Given a call graph SCC
> - Allocate the function with no calls or where each callee has been
> allocated
> - Propagate the clobbered registers to the callers of that function by
> updating the related regmasks on the callsites.
> Repeat until no more candidate.
>
> Right direction overall. The simplest approach to this is feasible within
> a summer and should definitely give you good results when you have cases of
> hot calls with many spill/fills around it that could be eliminated.
>
> One does not necessarily need the call graph. The compiler can do this as
> an opportunistic optimization. The callee collects a resource mask and the
> caller consumes it when it is “there”. Within a module when the
> callee”leaf” is compiled before the caller the information is “there”. When
> the call graph is available you want a bottom up walk for this
> optimization.
>
> A few things to keep an eye on:
> - The twist here could be that the bottom up order conflicts with the
> layout order, so the two optimizations would have to run independently. ( I
> have not looked into the layout algorithm so this might not be an actual
> issue here).
>
> Layout is just the order functions reach the AsmPrinter, so you’re right
> that this is going to make the function output different.  If we care about
> the order, which we may do, then we’d need to cache the data in the
> AsmPrinter and reorder it there somehow.
>
> Pete Cooper Do you mean to cache function order related data in AsmPrinter
?

>
> Some bonus features that come from codegen on the calligraphy, and
> specifically having accurate regmasks and similar information:
> - The X86 VZeroUpper pass should insert fewer VZeroUpper instructions
> before calls, and could possibly even learn that after the call the state
> of vzeroupper is known.
> - Values in registers can be used by the callee instead of loading them.
>
> The second one here is fun.  Imagine this pseudo code:
>
> foo:
> r0 = 1000
> …
> ret
>
> bar:
> …
> call foo
> vreg1 = vreg2 + 1000
>
>
> You know which registers contain which values after the call to foo.  In
> this case you know that the value of 1000 is available in a register
> already so you can avoid loading it for use in the add.  You could have
> other values in registers too, even those which are passed in to foo.  The
> ‘this’ pointer is the best example as its probably incredibly likely that
> r0 contains the this pointer after a function call which didn’t override r0
> for the return.
>
> The above mentioned case is interesting and useful, perhaps and simple
analysis pass which can return a map from value to register will help.

> The this pointer example is actually related to what Quentin mentioned as
> a future direction here: rewriting calling conventions.  If you have
>
> int A::foo() {
>   return this->value;
> }
>
>
> then you are going to have code something like
>
> foo:
> r0 = load r0, #offset_of_value
> ret
>
>
> If the this pointer is live after the call, and it almost certainly is,
> then it would be better to rewrite this call to avoid clobbering r0.  That
> is, return the this pointer in r0 and the value in r1.  That could actually
> be done as an IR level pass too though if its deemed profitable.
>
> Anyway, didn’t mean to distract from the immediate goals of this project.
> I’m excited to see the SCC code make it in tree and see what else it
> enables.
>
> One more, just for fun: Inter-procedural stack allocation.  That is of
> calls bar, bar needs 4 bytes of stack space.  Instead of bar allocating 4
> bytes, it adds an attribute to itself, then foo allocates 4 bytes of space
> at the bottom of the stack for bar to use.
>
> Can you please provide some links to understand benefits of IP stack
allocation ?

I have also write the draft proposal, I will share it through the summer of
code site.
Here is the link
https://docs.google.com/document/d/1DrsaFJdtxV73Zpns2bEgjATLFcWuaYMPHuvt5THLeLk/edit?usp=sharing
This is not much effective but still I would like to give it a try. Please
review it quickly I have 23 hours to submit the final PDF.

Thanks !
Vivek

> Pete
>
>
> Cheers,
> Pete
>
> - You also need to consider the supported preemption model. When a
> function can be preempted dynamically the statically collected information
> for a callee cannot be used and the optimization may not kick in.
> - Most of the work I would expect to be tuning the assignment heuristics
> in the allocator (a live range that spans two calls sites, should it go
> into a scratch register that is not used in one call but in the other? How
> could profile change that? etc). But again, perhaps the cheapest approach
> is not to go into the heuristics and only remove a scratch register
> fill/spill around a call sit when that register is not destroyed anywhere
> down in the call tree.
>
> Allocate remaining functions “normally”.
>
>
> I think the main challenge of a real inter-procedural register allocator
>> is to change all of the calling convention dynamically and more importantly
>> convey the right information to other tools (via CFA, CFI, etc.).
>>
>> Here for calling convention do you mean that has to be handle for
> different kind of backends differently  or you are referring some thing I
> don't know. I don't understand what do you mean by 'convey the right
> information to other tool' if we have updated regmask for a call
> instruction then MachineFunction should be able to reflect that fact in
> MachineFunction pass which is used for intra-procedural register
> allocation, all we have done is allocated some registers that should live
> across the function call.
>
>
> My mistake, I though you had in mind what I call a “true” inter procedural
> registers allocator: one that changes the allocation at function boundaries
> as well. I.e., it may choose that it is more efficient to put the first
> argument of function foo is register FP0 even if the ABI says R0.
> With this kind of scheme, you break the ABI (and you need LTO to be
> allowed to do that), you need to “dynamically” adjust the calling
> convention to what the register allocator chooses, and moreover you need to
> be able to communicate to the other tools (dynamic linker, debugger, etc.)
> where are the things that are usually defined by the ABI, like the frame
> pointer, the return value, etc.
>
> Cheers,
> -Quentin
>
>
> Sincerely,
> Vivek
>
> Cheers,
>> Q.
>> > On Mar 22, 2016, at 6:04 PM, Matthias Braun <mbraun at apple.com> wrote:
>> >
>> > No need to apologize this thread surely deserved some answers :)
>> >
>> > From my perspective this project sounds doable. I would expect the
>> register allocation parts to be not too hard: I imagine this being just
>> distilling a new clobber regmask after allocating a function. I would
>> expect the challenging (or annoying) part to get a machine module pass (or
>> a similar mechanism to influence the order in which functions are
>> processed) and a callgraph in the backend. So this might end up being more
>> pass manager / infrastructure work than register allocation.
>> >
>> > I'd be happy to answer detail questions or give guidance on the
>> register allocation aspects.
>> >
>> > - Matthias
>> >
>> >> On Mar 22, 2016, at 5:27 PM, Sanjoy Das <
>> sanjoy at playingwithpointers.com> wrote:
>> >>
>> >> Apologies: didn't notice how old this thread is before replying.
>> >>
>> >> On Tue, Mar 22, 2016 at 5:24 PM, Sanjoy Das
>> >> <sanjoy at playingwithpointers.com> wrote:
>> >>> Hi Vivek,
>> >>>
>> >>> [+CC Matthias, Quentin]
>> >>>
>> >>> Inter-procedural register allocation can be a big win, but my estimate
>> >>> is that it will be challenging to complete within one summer unless
>> >>> you're already familiar with LLVM's register allocator.
>> >>>
>> >>> I've CC'ed some people who can give you some more detailed
>> information.
>> >>>
>> >>> -- Sanjoy
>> >>>
>> >>>
>> >>> On Tue, Feb 9, 2016 at 9:17 PM, vivek pandya via llvm-dev
>> >>> <llvm-dev at lists.llvm.org> wrote:
>> >>>> Hello Community,
>> >>>>
>> >>>> I would like to know status of the project and also importance of
>> it. If the
>> >>>> project is still open I would like to work on GSoC 2016 proposal for
>> >>>> Inter-procedural Register Allocation, in that case please also
>> suggest
>> >>>> possible mentor or let me know if anyone is willing to be mentor for
>> this.
>> >>>>
>> >>>> Sincerely,
>> >>>> Vivek Pandya
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> llvm-dev at lists.llvm.org
>> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Sanjoy Das
>> >>> http://playingwithpointers.com
>> >>
>> >>
>> >>
>> >> --
>> >> Sanjoy Das
>> >> http://playingwithpointers.com
>> >
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160325/cc061fe0/attachment.html>