[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

vivek pandya via llvm-dev llvm-dev at lists.llvm.org
Tue May 24 20:16:05 PDT 2016


On Wed, May 25, 2016 at 8:44 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> From: "vivek pandya" <vivekvpandya at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun"
> <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>,
> "Quentin Colombet" <qcolombet at apple.com>
> Sent: Tuesday, May 24, 2016 9:34:29 PM
>
> Subject: Re: [GSoC 2016] Interprocedural Register Allocation -
> Introduction and Feedback
>
>
>
> On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> ------------------------------
>>
>> From: "vivek pandya" <vivekvpandya at gmail.com>
>> To: "Quentin Colombet" <qcolombet at apple.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>> "Matthias Braun" <matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>
>> Sent: Tuesday, May 24, 2016 1:00:58 PM
>> Subject: Re: [GSoC 2016] Interprocedural Register Allocation -
>> Introduction and Feedback
>>
>> Hello,
>>
>> I have written the following code to check whether each register is used
>> by the MachineFunction or not:
>>
>> MachineRegisterInfo *MRI = &MF.getRegInfo();
>> TargetRegisterInfo *TRI =
>>     (TargetRegisterInfo *)MF.getSubtarget().getRegisterInfo();
>>
>> Some reason you can't use a const pointer here?
>>
> MCRegisterInfo is just used to get the conventional name of a register for
> the given target, like AX, BX on X86.
>
>>
>> const TargetMachine &TM = MF.getTarget();
>> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
>> DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n");
>>
>> for (TargetRegisterInfo::regclass_iterator i = TRI->regclass_begin(),
>>                                            e = TRI->regclass_end();
>>      i != e; ++i) {
>>   for (TargetRegisterClass::iterator pregi = (*i)->begin(),
>>                                      prege = (*i)->end();
>>        pregi != prege; ++pregi) {
>>     DEBUG(dbgs() << "Physical Register : " << MCRI->getName(*pregi)
>>                  << " is modified " << MRI->isPhysRegModified(*pregi)
>>                  << " \n");
>>
>> Try isPhysRegUsed.
>>
> ok
>
>>
>>   }
>> }
>> DEBUG(dbgs() << "\n");
>>
>> The pass executing this code is scheduled at the post-RA stage, but this
>> gives me true for all registers, i.e. in each function all registers appear
>> to be used except EBP and a few similar ones. Is this a correct way to get
>> register usage information? I think I have made some mistake; please help.
>>
>>
>> You might look at the implementation of these functions in
>> lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning
>> true because UsedPhysRegMask.test(PhysReg) is true or because
>> !reg_nodbg_empty(*AliasReg) is true for one of the alias registers.
>>
> Yes, that helped; now I am getting the actual registers that have been used
> by a given function, but there is a little problem.
> The updated code is shown below:
> for (TargetRegisterInfo::regclass_iterator i = TRI->regclass_begin(),
>                                            e = TRI->regclass_end();
>      i != e; ++i) {
>   for (TargetRegisterClass::iterator pregi = (*i)->begin(),
>                                      prege = (*i)->end();
>        pregi != prege; ++pregi) {
>     for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid();
>          ++AliasReg) {
>       if (!MRI->reg_nodbg_empty(*AliasReg)) {
>         DEBUG(dbgs() << "Physical Register : " << MCRI->getName(*pregi)
>                      << " is used " << MRI->isPhysRegUsed(*pregi) << " \n");
>         break; // No need to process more aliases.
>       }
>     }
>   }
> }
> But here some registers get processed within different classes (unnecessary
> processing). Is this the only way to iterate through all used registers
> (using the register class iterators)? Is there any way to avoid duplicate
> regs?
> Of course, currently I am just printing, but next I am thinking of using a
> map to track the usage info; then only distinct register info will be
> stored, but due to the loop structure I would still iterate over a single
> register 3-4 times, making it time consuming.
>
> Yes, I believe you can just do:
>
>   for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) {
>
Oh yes, thanks, I had forgotten that physical registers can simply be
enumerated by their number.
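
For reference, here is a minimal sketch of the flat loop I will try, reusing
the TRI/MRI/MCRI variables from the code above and assuming register number 0
is NoRegister and can be skipped:

  for (unsigned Reg = 1, E = TRI->getNumRegs(); Reg < E; ++Reg) {
    // Based on the implementation we looked at, isPhysRegUsed already walks
    // the alias sets internally, so each register is visited only once here.
    if (MRI->isPhysRegUsed(Reg))
      DEBUG(dbgs() << "Physical Register : " << MCRI->getName(Reg)
                   << " is used\n");
  }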

>
>
>  -Hal
>
>
> -Vivek
>
>>
>>
>>  -Hal
>>
>>
>>
>> Vivek
>>
>> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at apple.com>
>> wrote:
>>
>>>
>>> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> Vivek Pandya
>>>
>>>
>>> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <qcolombet at apple.com>
>>>  wrote:
>>>
>>>>
>>>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> Vivek Pandya
>>>>
>>>>
>>>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> From: "vivek pandya" <vivekvpandya at gmail.com>
>>>>> To: "Mehdi Amini" <mehdi.amini at apple.com>
>>>>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet"
>>>>> <qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>>>>> "Matthias Braun" <matze at braunis.de>
>>>>> Sent: Wednesday, May 11, 2016 3:15:03 AM
>>>>> Subject: Re: [GSoC 2016] Interprocedural Register Allocation -
>>>>> Introduction and Feedback
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Vivek Pandya
>>>>>
>>>>>
>>>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <vivekvpandya at gmail.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Vivek Pandya
>>>>>>
>>>>>>
>>>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>> From: "vivek pandya" <vivekvpandya at gmail.com>
>>>>>>> To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling"
>>>>>>> <mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>>>>>>> Cc: "Quentin Colombet" <qcolombet at apple.com>
>>>>>>> Sent: Tuesday, May 10, 2016 2:59:16 PM
>>>>>>> Subject: [GSoC 2016] Interprocedural Register Allocation -
>>>>>>> Introduction and Feedback
>>>>>>>
>>>>>>> Hello LLVM Community,
>>>>>>>
>>>>>>> Sorry for the delay, as I was busy with final exams.
>>>>>>>
>>>>>>> I am Vivek from India. Thanks for choosing my proposal for
>>>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal
>>>>>>> Finkel will be mentoring me for this project.
>>>>>>>
>>>>>>> IPRA can reduce the code size and runtime of programs by allocating
>>>>>>> registers across module and procedure boundaries.
>>>>>>>
>>>>>>> I have identified some old but effective research work in this area. I
>>>>>>> want the community's feedback on the feasibility of these approaches,
>>>>>>> and I am targeting to implement two of them during this project.
>>>>>>>
>>>>>>> Here is the list of papers. I have read the first two and would like to
>>>>>>> discuss those approaches first; I will read the other two papers and
>>>>>>> then initiate discussion for them as well. All I want is to settle on a
>>>>>>> concrete implementation plan before 23 May 2016, and for that I need the
>>>>>>> community's help.
>>>>>>>
>>>>>>> 1) Compile time ----- Minimizing register usage penalty at procedure
>>>>>>> calls - http://dl.acm.org/citation.cfm?id=53999
>>>>>>> ====================================================================
>>>>>>> In this approach intra-procedural register allocation is used as the
>>>>>>> base, but machine code is generated in bottom-up call-graph order, and
>>>>>>> the inter-procedural effect is achieved by propagating the register
>>>>>>> usage information of a callee to its callers (i.e. child to parent in
>>>>>>> the CallGraph) so that a caller can use registers different from its
>>>>>>> callees and save load/store cost at procedure calls. This is not as
>>>>>>> trivial as it seems, due to recursive calls, library function usage,
>>>>>>> etc. Also, in the upper region of the graph the number of available
>>>>>>> registers might drop to zero; in that case the technique should fall
>>>>>>> back to the normal load/store behavior at procedure calls. Apart from
>>>>>>> these, other difficulties have been identified; please follow this mail
>>>>>>> chain: https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>>>> My mentor has already provided me a patch that alters the code
>>>>>>> generation order to a bottom-up call-graph traversal, and I am working
>>>>>>> from that point now. Any other help/suggestions are always welcome.
>>>>>>>
>>>>>>> 2) Link time ----- Global register allocation at link time -
>>>>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>>>> ====================================================================
>>>>>>> In this particular approach (a sort of true IPRA), registers are
>>>>>>> reallocated at link time (this optimization is optional; if turned off,
>>>>>>> the code is still compiled as per intra-procedural allocation). Modules
>>>>>>> are first compiled as in normal compilation, but the object code is
>>>>>>> annotated with details so that the linker can build the call graph and
>>>>>>> also compute usage information at link time. The compiler also writes
>>>>>>> hints in the object code describing how the code should be changed if a
>>>>>>> particular variable is allocated to some other register (due to the new
>>>>>>> allocation). The linker can then use this information to decide which
>>>>>>> (global) variables need to be in the same register throughout the
>>>>>>> program's execution and, according to the register usage information in
>>>>>>> the call graph, which procedures will not be active simultaneously, so
>>>>>>> that the locals of those procedures can share registers without
>>>>>>> load/store at procedure calls.
>>>>>>> For this particular method, help me analyze its feasibility:
>>>>>>> 1) Can LLVM collect the following information at the module level in
>>>>>>> MachineIR: the list of procedures in the module, the list of locals in
>>>>>>> each procedure, the list of procedures a particular procedure can call,
>>>>>>> and the list of variables each procedure references? Each entry in the
>>>>>>> last two lists includes an estimate of the number of times the procedure
>>>>>>> is called or the variable is referenced in each execution of this
>>>>>>> procedure.
>>>>>>> 2) Can LLVM write informative commands to object files?
>>>>>>> 3) Is LTO capable of leveraging those commands?
>>>>>>>
>>>>>>> In terms of scoping the project for the summer, I definitely
>>>>>>> recommend that you focus on (1) first. If you finish that, we can certainly
>>>>>>> move on to other things.
>>>>>>>
>>>>>>>
>>>>>>> I'll add +1 here, but I already wrote the same thing on IRC when
>>>>>>> discussing with Vivek. True IPRA without a proper MachineModule
>>>>>>> infrastructure won't be doable in my opinion (even with such
>>>>>>> infrastructure, it may not be trivial in LLVM in general).
>>>>>>>
>>>>>>> Regarding link time, note that any such design would likely look much
>>>>>>> different than in David Wall's paper, however, because our LTO
>>>>>>> re-codegens everything anyway. The paper says, "Finally, it keeps us
>>>>>>> honest as designers of the system; once we postpone anything until link
>>>>>>> time, the temptation is great to postpone everything, ..." - Well, we've
>>>>>>> long since succumbed to that temptation when we LTO. C'est la vie.
>>>>>>>
>>>>>>>
>>>>>>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>>>>>>> information propagation. ThinLTO will be more challenging/interesting
>>>>>>> though!
>>>>>>>
>>>>>>> For the first part, a mechanism similar to a MachineModulePass would be
>>>>>>> desirable, but that may not be possible during this project; perhaps we
>>>>>>> can make some sort of smaller version of it to suit our purpose.
>>>>>>>
>>>>>>> I don't think we need to make any kind of MachineModulePass to make
>>>>>>> this work. Once we alter the visitation order based on the CGSCC iteration
>>>>>>> scheme, we can keep state in-between functions in the pre-existing hacky
>>>>>>> way (using static members of the relevant function passes).
>>>>>>>
>>>>>> Sorry, my mistake here; by 'first part' I mean requirement 1) of the
>>>>>> link-time approach.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I also don't see where/why we need a MachineModule(Pass) for the CGSCC
>>>>>>> scheme. That said, I'd rather avoid using a function pass with static
>>>>>>> members if we can have a ModuleAnalysis that keeps the results for the
>>>>>>> functions in the module and is queried by the register allocator
>>>>>>> somehow. Matthias/Quentin may have other input on this aspect.
>>>>>>>
>>>>>>
>>>>> @Hal, do you mean adding a simple MachineFunction pass that just operates
>>>>> on an already register-allocated function and prepares a BitVector to
>>>>> indicate which registers are used by that MachineFunction, and then using
>>>>> this pass as an analysis pass (i.e. simply returning the static BitVector
>>>>> of clobbered registers when register allocation for the next function
>>>>> begins; this part is not very clear to me)? This could be done by
>>>>> scheduling a pass post register allocation in lib/CodeGen/Passes.cpp:
>>>>>
>>>>> void TargetPassConfig::addMachinePasses() {
>>>>>   ...
>>>>>   // Run pre-ra passes.
>>>>>   addPreRegAlloc();
>>>>>
>>>>>   // Run register allocation and passes that are tightly coupled with it,
>>>>>   // including phi elimination and scheduling.
>>>>>   if (getOptimizeRegAlloc())
>>>>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>>>>   else
>>>>>     addFastRegAlloc(createRegAllocPass(false));
>>>>>
>>>>>   // Run post-ra passes.
>>>>>   addPostRegAlloc();
>>>>>
>>>>>   // Add a new pass here which keeps register mask information across
>>>>>   // function calls.
>>>>>   ...
>>>>> }
>>>>>
>>>>> But this also requires the current register allocators to use this
>>>>> information in some way, because RegMaskBits in LiveIntervalAnalysis.cpp
>>>>> is not static across calls. I mean, I am not clear on how to propagate
>>>>> the static info to the intra-procedural register allocators (if possible,
>>>>> without disturbing their code).
>>>>>
>>>>> First, my hope is that we won't need to change the register
>>>>> allocators, as such, in order to make use of this information. Instead,
>>>>> we'll simply be able to alter the register masks generated for the call
>>>>> instructions. These masks will indicate fewer clobbers than might otherwise
>>>>> be present based on the ABI because of information gathered during the
>>>>> codegen of the callee. These masks are generally constructed by the
>>>>> target based on the calling convention. The PowerPC backend, for example,
>>>>> looks like this:
>>>>>
>>>>>   // Add a register mask operand representing the call-preserved registers.
>>>>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>>>>   const uint32_t *Mask =
>>>>>       TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>>   assert(Mask && "Missing call preserved mask for calling convention");
>>>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>>>
>>>>> but it can be more complicated. If you look for uses of
>>>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I mean.
>>>>> Regardless, the code ends up calling some method in the target's
>>>>> TargetRegisterInfo subclass. These methods generally look something like
>>>>> this:
>>>>>
>>>>> const uint32_t *
>>>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>>>>>                                       CallingConv::ID CC) const {
>>>>>   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
>>>>>   ...
>>>>>   return TM.isPPC64() ? (Subtarget.hasAltivec() ? CSR_SVR464_Altivec_RegMask
>>>>>                                                 : CSR_SVR464_RegMask)
>>>>>                       : (Subtarget.hasAltivec() ? CSR_SVR432_Altivec_RegMask
>>>>>                                                 : CSR_SVR432_RegMask);
>>>>> }
>>>>>
>>>>> In any case, the fundamental idea here is that, when someone calls
>>>>> getCallPreservedMask in order to set the regmask on a call, we might not
>>>>> have to use the CC at all. Instead, if we've already codegened the
>>>>> function, we might use a cache of 'exact' register masks computed during
>>>>> codegen of the potential callees instead.
>>>>>
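>>>>> Concretely, at the point where the call's regmask operand is built (the
>>>>> snippet above), the lookup might be wired in roughly like this; RUI,
>>>>> getExactRegMask, and getDirectCallee (and the CLI it inspects) are
>>>>> hypothetical stand-ins for the cache and callee lookup being discussed,
>>>>> not existing API:
>>>>>
>>>>>   const uint32_t *Mask = nullptr;
>>>>>   // If the callee is a known, already-codegened function, prefer the
>>>>>   // exact mask recorded for it over the conservative CC-based one.
>>>>>   if (const Function *Callee = getDirectCallee(CLI))
>>>>>     Mask = RUI.getExactRegMask(Callee);
>>>>>   if (!Mask)
>>>>>     Mask = TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>>   assert(Mask && "Missing call preserved mask for calling convention");
>>>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>>>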
>>>> I am thinking of adding a simple immutable pass, MachineRegisterUsageInfo,
>>>> similar to MachineBranchProbabilityInfo, that can maintain register usage
>>>> information per function. Can it simply be done by using UsedPhysRegMask
>>>> from MachineRegisterInfo?
>>>>
>>>>
>>>> No, like the comment said, UsedPhysRegMask gives only the registers
>>>> clobbered by calls:
>>>> // This bit vector represents all the registers clobbered by function
>>>> calls.
>>>>
>>>> You want to build this information yourself on top of
>>>> MachineRegisterInfo::isPhysRegModified
>>>>
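>>>> A minimal sketch of what that might look like, run once per function after
>>>> register allocation (regmask layout as in MachineOperand: one bit per
>>>> register, and a set bit means the register is preserved, i.e. not
>>>> clobbered; register 0 is NoRegister, so the scan starts at 1):
>>>>
>>>>   const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
>>>>   MachineRegisterInfo &MRI = MF.getRegInfo();
>>>>   unsigned NumRegs = TRI->getNumRegs();
>>>>   // Start with every bit set, i.e. "everything preserved".
>>>>   std::vector<uint32_t> RegMask((NumRegs + 31) / 32, ~0u);
>>>>   for (unsigned Reg = 1; Reg < NumRegs; ++Reg)
>>>>     if (MRI.isPhysRegModified(Reg))
>>>>       RegMask[Reg / 32] &= ~(1u << (Reg % 32)); // clear = clobbered
>>>>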
>>> OK, but then the time complexity will be O(n), where n = the number of
>>> physical registers on the target. Am I correct?
>>>
>>>
>>> Yes, this is correct.
>>>
>>>
>>>> Here getCallPreservedMask will call an API provided by
>>>> MachineRegisterUsageInfo to obtain the exact register mask, but how can it
>>>> know that the function has already been codegened? Or will it query every
>>>> time getCallPreservedMask is called, and MachineRegisterUsageInfo will
>>>> return the details if available and otherwise simply return NULL?
>>>> So the changes will now be in the TargetRegisterInfo implementation for
>>>> each target, right?
>>>>
>>>>
>>>>> In order to do this, I think we'll need to provide a function callable
>>>>> from the target's getCallPreservedMask implementation, which can return
>>>>> such an 'exact' regmask when available. I think we need to do it this way
>>>>> for two reasons:
>>>>>
>>>>>  1. Not all of the target code calls getCallPreservedMask, but
>>>>> sometimes calls other similar target-specific functions (e.g.
>>>>> getTLSCallPreservedMask).
>>>>>  2. The targets need to opt-in to this behavior because only the
>>>>> target can know that all register uses are really tagged correctly post
>>>>> "pre-emit".
>>>>>
>>>>> Because the target is free to introduce uses of registers at
>>>>> essentially any time, we need to do the scanning for used registers after
>>>>> the "pre-emit" passes run. This can be done by scheduling some simple
>>>>> register-use scanning pass after the call to addPreEmitPass in
>>>>> lib/CodeGen/Passes.cpp.
>>>>>
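>>>>> In TargetPassConfig::addMachinePasses that would amount to something
>>>>> roughly like this (a sketch; the pass-creation function name is
>>>>> hypothetical):
>>>>>
>>>>>   // Run pre-emit passes (targets may still introduce register uses here).
>>>>>   addPreEmitPass();
>>>>>
>>>>>   // Hypothetical pass: scan the final code and record the exact set of
>>>>>   // clobbered registers for use when codegen'ing this function's callers.
>>>>>   addPass(createRegisterUsageScanningPass());
>>>>>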
>>>>>
>>>>> I think this also applies in some way to Mehdi Amini's idea to keep a
>>>>> ModulePass for bookkeeping, but then the existing register allocators
>>>>> would need to change so that the code can query the ModulePass for the
>>>>> RegMaskBits of a particular function.
>>>>>
>>>>> I think that the simplest way to do this is to create an immutable
>>>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed register
>>>>> masks. This is somewhat similar in spirit to how the 'AssumptionCache'
>>>>> analysis works at the IR level. This analysis can then be created by
>>>>> lib/CodeGen/Passes.cpp early, and then queried and passed around later by
>>>>> the CodeGen/Target code. Because it is an immutable analysis, it won't get
>>>>> destroyed until the very end, which is also important because, I imagine,
>>>>> it will need to own the memory associated with the generated register masks.
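>>>>>
>>>>> A minimal sketch of such an analysis; the class name and its methods are
>>>>> placeholders we would have to pick, not existing API:
>>>>>
>>>>> class RegisterUsageInfo : public ImmutablePass {
>>>>> public:
>>>>>   static char ID;
>>>>>   RegisterUsageInfo() : ImmutablePass(ID) {}
>>>>>
>>>>>   // Record the exact regmask computed while codegen'ing F. The analysis
>>>>>   // owns the storage, so the mask stays valid for later callers.
>>>>>   void storeRegMask(const Function *F, std::vector<uint32_t> Mask) {
>>>>>     RegMasks[F] = std::move(Mask);
>>>>>   }
>>>>>
>>>>>   // Returns null if F has not been codegened yet.
>>>>>   const uint32_t *getExactRegMask(const Function *F) const {
>>>>>     auto I = RegMasks.find(F);
>>>>>     return I == RegMasks.end() ? nullptr : I->second.data();
>>>>>   }
>>>>>
>>>>> private:
>>>>>   DenseMap<const Function *, std::vector<uint32_t>> RegMasks;
>>>>> };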
>>>>>
>>>>>  -Hal
>>>>>
>>>>>
>>>>> Vivek
>>>>>
>>>>>
>>>>>> Yes, for the register-usage-propagation approach we don't need a
>>>>>> MachineModulePass.
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>> --
>>>>>>> Mehdi
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>