[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Wed May 18 10:25:14 PDT 2016

*Vivek Pandya*

On Wed, May 11, 2016 at 11:19 PM, Matthias Braun <matze at braunis.de> wrote:

> Yes there is also MachineRegisterInfo::UsedPhysRegMask which should be the
> union of all regmasks in the function.
>
>

> ./lib/CodeGen/MIRParser/MIRParser.cpp:  RegInfo.*setUsedPhysRegMask*
> (CalleeSavedRegisterMask.flip());
>
> Is this line responsible for setting the UsedPhysMask after codegen for a
function? And This will be changed for each function call right ?
-Vivek

> On May 11, 2016, at 10:47 AM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> ------------------------------
>
> *From: *"Matthias Braun" <matze at braunis.de>
>
> *To: *"Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"vivek pandya" <vivekvpandya at gmail.com>, "llvm-dev" <
> llvm-dev at lists.llvm.org>
> *Sent: *Wednesday, May 11, 2016 12:46:25 PM
> *Subject: *Re: [llvm-dev] [GSoC 2016] Interprocedural Register Allocation
> - Introduction and Feedback
>
>
> On May 11, 2016, at 3:31 AM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> ------------------------------
>
> *From: *"vivek pandya" <vivekvpandya at gmail.com>
> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" <
> qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias
> Braun" <matze at braunis.de>
> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
>
>
>
> *Vivek Pandya*
>
>
> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com>
> wrote:
>
>>
>>
>> *Vivek Pandya*
>>
>>
>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" <
>>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>>> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>>> Introduction and Feedback
>>>
>>> Hello LLVM Community,
>>>
>>> Sorry for delay as I was busy in final exams.
>>>
>>> I am Vivek from India. Thanks for choosing my proposal for
>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal
>>> Finkel will be mentoring me for this project.
>>>
>>> IPRA can reduce code size and runtime of programs by allocating register
>>> across the module and procedure boundaries.
>>>
>>> I have identified some old but effective research work on this area.
>>> I want community's feedback for feasibility of these approach and I am
>>> targeting to implement two of them during this project.
>>>
>>> Here is list of the papers, I have read first two papers and I would
>>> like to discuss those approach first, I will read other two paper then
>>> initiate discussion for them as well. All I want is to find out a concrete
>>> implementation plan before 23 May, 2016 and for that I need community's
>>> help.
>>>
>>> 1) Compile time ----- Minimizing register usage penalty at procedure
>>> calls - http://dl.acm.org/citation.cfm?id=53999
>>> ====================================================================In
>>> this approach intra-procedural register allocation is used as base but
>>> machine code generation order is bottom up traversal of call graph and
>>> inter-procedural effect is achieved by propagating register usage
>>> information of callee function to caller (i.e child to parent in CallGraph)
>>> so that caller can use different registers than callee and can save load
>>> store cost at procedure call, this is not trivial as it seems due to
>>> recursive calls, library function usage etc. Also for upper region of the
>>> graph in this technique available number of registers might become zero in
>>> that case it should fall back to normal load store at procedure call. Apart
>>> from these difficulties other difficulties have been identified please
>>> follow this mail-chain
>>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>> My mentor has already provided me a patch that alters code generation
>>> order as per bottom up call graph traversal, I am working from that point
>>> now. Any other help/suggestion is always welcomed.
>>>
>>> 2) Link time ----- Global register allocation at link time -
>>> http://dl.acm.org/citation.cfm?id=989415
>>> ====================================================================In
>>> this particular approach (sort of true IPRA) registers will be reallocated
>>> (this optimization will be optional if turned off still code will be
>>> compiled as per intra-procedural allocation) at link time. Here modules are
>>> first complied as per normal compilation but the object code is annotated
>>> with details so that linker can build call graph and also calculate usage
>>> information at link time. Compiler also write hints in object code that if
>>> particular variable is allocated in some other register ( due to new
>>> allocation) then how the code should be changed? Thus linker can use these
>>> information to decide which variables (global) need to be in same register
>>> through out the program execution and also according to register usage
>>> information in call graph which procedure will not be active simultaneously
>>> so that locals for that procedures can be in same registers with out load
>>> store at procedure calls.
>>> For these particular method help me to analyze feasibility:
>>> 1) Can llvm collects following information at module level in MachineIR?
>>> list of procedures in module, list of locals in procedures, list of
>>> procedures that a particular procedure can call, and a list of the
>>> variables this procedure references. Each entry in the last two lists
>>> includes an estimate of the number of times the procedure is called or the
>>> variable is referenced in each execution of this procedure
>>> 2) Can llvm write informative commands to object files?
>>> 3) Can LTO is capable of leveraging those commands?
>>>
>>> In terms of scoping the project for the summer, I definitely recommend
>>> that you focus on (1) first. If you finish that, we can certainly move on
>>> to other things.
>>>
>>>
>>> I'll add +1 here, but I already wrote the same thing on IRC when
>>> discussing with Vivek. True IPRA without a proper MachineModule
>>> infrastructure won't be doable in my opinion (even with such
>>> infrastructure, it may not be trivial in LLVM in general).
>>>
>>> Regarding link time, note that any such a design would likely look much
>>> different than in David Wall's paper however, because our LTO re-codegens
>>> everything anyway. The paper says, "Finally, it keeps us honest as
>>> designers of the system; once we postpone anything until link time, the
>>> temptation is great to postpone everything, ..." - Well, we've long-since
>>> succumb to that temptation when we LTO. C'est la vie.
>>>
>>>
>>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>>> information propagation. ThinLTO will be more challenging/interesting
>>> though!
>>>
>>> For the first part a mechanism similar to MachineModulePass would be
>>> desirable but that may not be possible during this project, but if we can
>>> make some sort of smaller version of that to suit our purpose.
>>>
>>> I don't think we need to make any kind of MachineModulePass to make this
>>> work. Once we alter the visitation order based on the CGSCC iteration
>>> scheme, we can keep state in-between functions in the pre-existing hacky
>>> way (using static members of the relevant function passes).
>>>
>>>  Sorry my mistake here by first part I mean 1) requirement in the link
>> time approach.
>>
>>>
>>
>>> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
>>> scheme, that said I'd rather avoid using a function pass with static
>>> members, if we can have a ModuleAnalysis that is bookkeeping the results
>>> for functions in the module and queries by the register allocator somehow.
>>> Matthias/Quentin may have other inputs on this aspect.
>>>
>>
> @Hal do you mean to add a simple MachineFunction pass that will just
> operate on register allocated function and prepare a BitVector to indicate
> which register is being used by MachineFunction, and then use this pass as
> analysis pass (i.e just simply return static BitVector for clobbered
> register when register allocation for next function begins. This part is
> not much clear to me) this thing can be done by scheduling a pass post
> register allocation in lib/CodeGen/Passes.cpp
>
> void TargetPassConfig::addMachinePasses() {
> .
> .
> .
>   // Run pre-ra passes.
>   addPreRegAlloc();
>
>   // Run register allocation and passes that are tightly coupled with it,
>   // including phi elimination and scheduling.
>   if (getOptimizeRegAlloc())
>     addOptimizedRegAlloc(createRegAllocPass(true));
>   else
>     addFastRegAlloc(createRegAllocPass(false));
>
>   // Run post-ra passes.
>   addPostRegAlloc();
> // Adding a new pass here which keeps register mask information across
> function calls.
> .
> .
> .
> }
>
> But this also requires current register allocators to use this information
> in someway because RegMaskBits in LiveIntervalAnalysis.cpp is not static
> across calls. I mean I am not clear for how to propagate static info to
> Intra-procedural Register allocators (if possible without disturbing their
> code )
>
> First, my hope is that we won't need to change the register allocators, as
> such, in order to make use of this information. Instead, we'll simply be
> able to alter the register masks generated for the call instructions. These
> masks will indicate fewer clobbers than might otherwise be present based on
> the ABI because of information gathered during the codegen of the callee.
> These masks are generally constructed by target based on the calling
> convention. The PowerPC backend, for example, looks like this:
>
>   // Add a register mask operand representing the call-preserved registers.
>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>   const uint32_t *Mask =
>       TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>   assert(Mask && "Missing call preserved mask for calling convention");
>   Ops.push_back(DAG.getRegisterMask(Mask));
>
> but it can be more complicated. If you look for uses of 'getRegisterMask'
> in Target/*/*ISelLowering.cpp, you'll see what I mean. Regardless, the code
> ends up calling some method is the targets TargetRegisterInfo subclass.
> These methods generally look something like this:
>
> const uint32_t *
> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>                                       CallingConv::ID CC) const {
>   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
>   ...
>   return TM.isPPC64() ? (Subtarget.hasAltivec() ?
> CSR_SVR464_Altivec_RegMask
>                                                 : CSR_SVR464_RegMask)
>                       : (Subtarget.hasAltivec() ?
> CSR_SVR432_Altivec_RegMask
>                                                 : CSR_SVR432_RegMask);
> }
>
> In any case, the fundamental idea here is that, when someone calls
> getCallPreservedMask in order to set the regmask on a call, we might not
> have to use the CC at all. Instead, if we've already codegened the
> function, we might use a cache of 'exact' register masks computed during
> codegen of the potential callees instead.
>
> In order to do this, I think we'll need to provide a function callable
> from the target's getCallPreservedMask implementation, which can return
> such an 'exact' regmask when available. I think we need to do it this way
> for two reasons:
>
>  1. Not all of the target code calls getCallPreservedMask, but sometimes
> calls other similar target-specific functions (e.g.
> getTLSCallPreservedMask).
>  2. The targets need to opt-in to this behavior because only the target
> can know that all register uses are really tagged correctly post "pre-emit".
>
> Because the target is free to introduce uses of registers at essentially
> any time, we need to do the scanning for used registers after the
> "pre-emit" passes run. This can be done by scheduling some simple
> register-use scanning pass after the call to addPreEmitPass in
> lib/CodeGen/Passes.cpp.
>
> MachineRegister maintains linked lists with defs/uses for each register so
> you can determine whether a specific register is used or not without
> scanning.
>
> Does this include regmask-clobbered registers?
>
>  -Hal
>
>
> - Matthias
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160518/50237d0a/attachment-0001.html>