[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Tue May 10 21:32:13 PDT 2016

*Vivek Pandya*

On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" <
> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
> *Subject: *[GSoC 2016] Interprocedural Register Allocation - Introduction
> and Feedback
>
> Hello LLVM Community,
>
> Sorry for delay as I was busy in final exams.
>
> I am Vivek from India. Thanks for choosing my proposal for Interprocedural
> Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal Finkel will be
> mentoring me for this project.
>
> IPRA can reduce code size and runtime of programs by allocating register
> across the module and procedure boundaries.
>
> I have identified some old but effective research work on this area.
> I want community's feedback for feasibility of these approach and I am
> targeting to implement two of them during this project.
>
> Here is list of the papers, I have read first two papers and I would like
> to discuss those approach first, I will read other two paper then initiate
> discussion for them as well. All I want is to find out a concrete
> implementation plan before 23 May, 2016 and for that I need community's
> help.
>
> 1) Compile time ----- Minimizing register usage penalty at procedure calls
> - http://dl.acm.org/citation.cfm?id=53999
> ====================================================================In
> this approach intra-procedural register allocation is used as base but
> machine code generation order is bottom up traversal of call graph and
> inter-procedural effect is achieved by propagating register usage
> information of callee function to caller (i.e child to parent in CallGraph)
> so that caller can use different registers than callee and can save load
> store cost at procedure call, this is not trivial as it seems due to
> recursive calls, library function usage etc. Also for upper region of the
> graph in this technique available number of registers might become zero in
> that case it should fall back to normal load store at procedure call. Apart
> from these difficulties other difficulties have been identified please
> follow this mail-chain
> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
> My mentor has already provided me a patch that alters code generation
> order as per bottom up call graph traversal, I am working from that point
> now. Any other help/suggestion is always welcomed.
>
> 2) Link time ----- Global register allocation at link time -
> http://dl.acm.org/citation.cfm?id=989415
> ====================================================================In
> this particular approach (sort of true IPRA) registers will be reallocated
> (this optimization will be optional if turned off still code will be
> compiled as per intra-procedural allocation) at link time. Here modules are
> first complied as per normal compilation but the object code is annotated
> with details so that linker can build call graph and also calculate usage
> information at link time. Compiler also write hints in object code that if
> particular variable is allocated in some other register ( due to new
> allocation) then how the code should be changed? Thus linker can use these
> information to decide which variables (global) need to be in same register
> through out the program execution and also according to register usage
> information in call graph which procedure will not be active simultaneously
> so that locals for that procedures can be in same registers with out load
> store at procedure calls.
> For these particular method help me to analyze feasibility:
> 1) Can llvm collects following information at module level in MachineIR?
> list of procedures in module, list of locals in procedures, list of
> procedures that a particular procedure can call, and a list of the
> variables this procedure references. Each entry in the last two lists
> includes an estimate of the number of times the procedure is called or the
> variable is referenced in each execution of this procedure
> 2) Can llvm write informative commands to object files?
> 3) Can LTO is capable of leveraging those commands?
>
> In terms of scoping the project for the summer, I definitely recommend
> that you focus on (1) first. If you finish that, we can certainly move on
> to other things.
>
>
> I'll add +1 here, but I already wrote the same thing on IRC when
> discussing with Vivek. True IPRA without a proper MachineModule
> infrastructure won't be doable in my opinion (even with such
> infrastructure, it may not be trivial in LLVM in general).
>
> Regarding link time, note that any such a design would likely look much
> different than in David Wall's paper however, because our LTO re-codegens
> everything anyway. The paper says, "Finally, it keeps us honest as
> designers of the system; once we postpone anything until link time, the
> temptation is great to postpone everything, ..." - Well, we've long-since
> succumb to that temptation when we LTO. C'est la vie.
>
>
> +1 as well, our LTO will benefit naturally from the leaf-to-root
> information propagation. ThinLTO will be more challenging/interesting
> though!
>
> For the first part a mechanism similar to MachineModulePass would be
> desirable but that may not be possible during this project, but if we can
> make some sort of smaller version of that to suit our purpose.
>
> I don't think we need to make any kind of MachineModulePass to make this
> work. Once we alter the visitation order based on the CGSCC iteration
> scheme, we can keep state in-between functions in the pre-existing hacky
> way (using static members of the relevant function passes).
>
>  Sorry my mistake here by first part I mean 1) requirement in the link
time approach.

>

> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
> scheme, that said I'd rather avoid using a function pass with static
> members, if we can have a ModuleAnalysis that is bookkeeping the results
> for functions in the module and queries by the register allocator somehow.
> Matthias/Quentin may have other inputs on this aspect.
>
> Yes for propagating register usage approach we don't need MachineModulePass

Vivek

> --
> Mehdi
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160511/968e45d3/attachment.html>