[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Sun May 15 21:11:08 PDT 2016

*Vivek Pandya*

On Sun, May 15, 2016 at 11:51 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On May 15, 2016, at 12:43 AM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
>
>
> *Vivek Pandya*
>
>
> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>
>>
>> ------------------------------
>>
>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" <
>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>> Introduction and Feedback
>>
>> Hello LLVM Community,
>>
>> Sorry for delay as I was busy in final exams.
>>
>> I am Vivek from India. Thanks for choosing my proposal for
>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal
>> Finkel will be mentoring me for this project.
>>
>> IPRA can reduce code size and runtime of programs by allocating register
>> across the module and procedure boundaries.
>>
>> I have identified some old but effective research work on this area.
>> I want community's feedback for feasibility of these approach and I am
>> targeting to implement two of them during this project.
>>
>> Here is list of the papers, I have read first two papers and I would like
>> to discuss those approach first, I will read other two paper then initiate
>> discussion for them as well. All I want is to find out a concrete
>> implementation plan before 23 May, 2016 and for that I need community's
>> help.
>>
>> 1) Compile time ----- Minimizing register usage penalty at procedure
>> calls - http://dl.acm.org/citation.cfm?id=53999
>> ====================================================================In
>> this approach intra-procedural register allocation is used as base but
>> machine code generation order is bottom up traversal of call graph and
>> inter-procedural effect is achieved by propagating register usage
>> information of callee function to caller (i.e child to parent in CallGraph)
>> so that caller can use different registers than callee and can save load
>> store cost at procedure call, this is not trivial as it seems due to
>> recursive calls, library function usage etc. Also for upper region of the
>> graph in this technique available number of registers might become zero in
>> that case it should fall back to normal load store at procedure call. Apart
>> from these difficulties other difficulties have been identified please
>> follow this mail-chain
>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>> My mentor has already provided me a patch that alters code generation
>> order as per bottom up call graph traversal, I am working from that point
>> now. Any other help/suggestion is always welcomed.
>>
>> 2) Link time ----- Global register allocation at link time -
>> http://dl.acm.org/citation.cfm?id=989415
>> ====================================================================In
>> this particular approach (sort of true IPRA) registers will be reallocated
>> (this optimization will be optional if turned off still code will be
>> compiled as per intra-procedural allocation) at link time. Here modules are
>> first complied as per normal compilation but the object code is annotated
>> with details so that linker can build call graph and also calculate usage
>> information at link time. Compiler also write hints in object code that if
>> particular variable is allocated in some other register ( due to new
>> allocation) then how the code should be changed? Thus linker can use these
>> information to decide which variables (global) need to be in same register
>> through out the program execution and also according to register usage
>> information in call graph which procedure will not be active simultaneously
>> so that locals for that procedures can be in same registers with out load
>> store at procedure calls.
>> For these particular method help me to analyze feasibility:
>> 1) Can llvm collects following information at module level in MachineIR?
>> list of procedures in module, list of locals in procedures, list of
>> procedures that a particular procedure can call, and a list of the
>> variables this procedure references. Each entry in the last two lists
>> includes an estimate of the number of times the procedure is called or the
>> variable is referenced in each execution of this procedure
>> 2) Can llvm write informative commands to object files?
>> 3) Can LTO is capable of leveraging those commands?
>>
>> In terms of scoping the project for the summer, I definitely recommend
>> that you focus on (1) first. If you finish that, we can certainly move on
>> to other things.
>>
>>
>> I'll add +1 here, but I already wrote the same thing on IRC when
>> discussing with Vivek. True IPRA without a proper MachineModule
>> infrastructure won't be doable in my opinion (even with such
>> infrastructure, it may not be trivial in LLVM in general).
>>
>> Regarding link time, note that any such a design would likely look much
>> different than in David Wall's paper however, because our LTO re-codegens
>> everything anyway. The paper says, "Finally, it keeps us honest as
>> designers of the system; once we postpone anything until link time, the
>> temptation is great to postpone everything, ..." - Well, we've long-since
>> succumb to that temptation when we LTO. C'est la vie.
>>
>>
>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>> information propagation. ThinLTO will be more challenging/interesting
>> though!
>>
> @Mehdi I don't understand this comment, it seems to me that two different
> approaches has been merged here. Information propagation in calligraph will
> be done without LTO.
>
>
> You are propagating information on the callgraph, but this is efficient
> only when you have the function definition that is codegen in the same
> module as the call site.
> So why LTO?
> Because you see "all" the definition for "all" the call sites in your
> program.
>
@Hal, @Mehdi yes LTO gives more information compare to other optimizations
but as per our current plan CallGraphSCC pass and a MachineFunctionPass for
register usage scanning will be sufficient right?

-Vivek

>
> --
> Mehdi
>
>
>
>
> For the first part a mechanism similar to MachineModulePass would be
>> desirable but that may not be possible during this project, but if we can
>> make some sort of smaller version of that to suit our purpose.
>>
>> I don't think we need to make any kind of MachineModulePass to make this
>> work. Once we alter the visitation order based on the CGSCC iteration
>> scheme, we can keep state in-between functions in the pre-existing hacky
>> way (using static members of the relevant function passes).
>>
>>
>> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
>> scheme, that said I'd rather avoid using a function pass with static
>> members, if we can have a ModuleAnalysis that is bookkeeping the results
>> for functions in the module and queries by the register allocator somehow.
>> Matthias/Quentin may have other inputs on this aspect.
>>
>> --
>> Mehdi
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160516/31a71c8a/attachment.html>