[llvm-dev] [GSoC 2016] Interprocedural Register Allocation - Introduction and Feedback

Tue May 24 22:08:02 PDT 2016

On Wed, May 25, 2016 at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On May 24, 2016, at 9:17 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
> Dear Mentors,
>
> Please help me to understand our plan to implement Interprocedural
> Register allocator by propogating register usage info. While writing this
> mail I am considering all previous discussion over llvm-dev and IRC.
>
> 1) A MachineFunction pass to be executed POST-RA to collect the
> information about the used Registers.
> 2) An Immutable pass which will store reg usage info collected by previous
> pass and return it whenever queried.
> 3) A Target specific MachineFucntion pass that will use the register usage
> info for available for call instrction to achive IPRA. This pass should
> run at PRE-RA.
>
> Relation among above passes:
>
> 1) pass will store info to 2) pass as well use info for call instruction
> found while processing.
>
> 3) pass only requires to query information from 2) pass.
>
>
> Questions
> =========
>
>  Which pass is responsible for load/store of callee saved register, at the
> begining of each function call? And how does it uses RegMask of call
> instruction to generate load/store. I think Intra-procedural register
> allocator is not responsible to generate load/store around the call site.
>
> /- - -> (A) - - -> (D)
> /
> (K)- - ->(T)- - -> (B) - - -> (E)
> \
> \- - -> (C) - - -> (F)
>
> So as per our discussion we would require following passes:
>
> Suppose in given example call graph , register allocation for D is
> completed now we have that information available So 3) pass while
> processing A , it would collect reg usage info for all callees and OR them
> and then it should update A's regmask by going to parant procedure that
> actually calls A ??
>
>
> No, Pass 3) is only looking for every call MI in A and updating the
> associated regmask by replacing it with the information stored in the
> immutable pass.
>
What is the use of that ? Please be more specific.

>
>
> How reg mask details of call D would be used by Register allocator while
> allocating register for A and also not generating load/store for register
> being used by A in body of D as we have callee saved convention.
>
>
> I expect all of that to be handled automatically when updating the regmask.
>
Please elaborate your both comments, what ever you have think of it to be.

>
>
>
> How the pass responsible for generating load/store will optimize for the
> child node of call graph where it does not require to load/store because
> caller will not use register used by callee ? I mean how our IPRA will take
> care of this?
>
>
> I don't understand that.
>
In above example call graph when pass 3) is executed for D it does not do
any work because there is not call to any other function in body of D but
actually in presence of IPRA there is no need to preserve (store/load) any
register in function D ( and also in other such leaf node of call graph).
The reason for the same is A now aware of register usage in D so it will
not use any register which intersect with D's register usage thus D need
not to worry about any thing.
Is this make sense?

>
>
>
> In short I am not much clear with the method for using information to get
> effect of IPRA without modifying Register allocator them self(i.e by
> updating regMask of call instructions).
>
> Also 1) pass and 3) pass are seem to intersecting for their work, for
> example consider while scanning register usage info for T function the
> final register usage info should be <all regs used by T> OR < reg usage
> info A > OR <reg usage info B > OR < reg usage info C > because K should
> not use any register which is used by T, A ,D, B, C, E, F with out
> load/store the relevant paper also discuss this situation and suggest to
> fall back to load/store approach. So as we move to upper region of the call
> graph it is very likely that enough regiseters are not there to allocate.
>
>
> The calling convention (for anything else than internal function) will
> always have some callee-saved registers.
> If we have a deep call-graph of internal-only function, then we may
> generate a lot of spill at the top. Somehow we may have to think about
> driving some heuristic with PGO.
>

This kind of problem is addressed by this paper Register allocation across
procedure and module boundaries - http://dl.acm.org/citation.cfm?id=93551
In this paper authors have tried to eliminate load/store only in call
intensive regions ( contrast to simple bottom up ) by analyzing call
frequency at compile time ( also has facility to plug in Profile info) and
rest of the call graph will follow normal register allocator. But yes we
will think about this later.

I suspect we can think about that a bit later. Let's focus on the simple
> for now.
>
> --
> Mehdi
>
>
>
>
> Please bear with my silly questions.
>
> Sincerely,
> Vivek
>
>
> On Wed, May 25, 2016 at 8:46 AM, vivek pandya <vivekvpandya at gmail.com>
> wrote:
>
>>
>>
>> On Wed, May 25, 2016 at 8:44 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>>
>>> ------------------------------
>>>
>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>> *To: *"Hal Finkel" <hfinkel at anl.gov>
>>> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <
>>> matze at braunis.de>, "Mehdi Amini" <mehdi.amini at apple.com>, "Quentin
>>> Colombet" <qcolombet at apple.com>
>>> *Sent: *Tuesday, May 24, 2016 9:34:29 PM
>>>
>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>> Introduction and Feedback
>>>
>>>
>>>
>>> On Wed, May 25, 2016 at 3:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>> *To: *"Quentin Colombet" <qcolombet at apple.com>
>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <
>>>> llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>, "Mehdi
>>>> Amini" <mehdi.amini at apple.com>
>>>> *Sent: *Tuesday, May 24, 2016 1:00:58 PM
>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>>> Introduction and Feedback
>>>>
>>>> Hello,
>>>>
>>>> I have written following code to check each register if it is used by
>>>> machineFunction or not :
>>>>
>>>> MachineRegisterInfo *MRI = &MF.getRegInfo();
>>>> TargetRegisterInfo *TRI = (TargetRegisterInfo
>>>> *)MF.getSubtarget().getRegisterInfo();
>>>>
>>>> Some reason you can't use a const pointer here?
>>>>
>>> MCRegisterInfo is just used to get conventional name of register for
>>> given target like AX, BX on X86.
>>>
>>>>
>>>> const TargetMachine &TM = MF.getTarget();
>>>> const MCRegisterInfo *MCRI = TM.getMCRegisterInfo();
>>>> DEBUG(dbgs() << "Function Name : " << MF.getName() << "\n");
>>>>
>>>> for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(),
>>>> e = (*TRI).regclass_end(); i != e; i++ ) {
>>>> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege =
>>>> (*i)->end(); pregi != prege; pregi++ ) {
>>>> DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) << "
>>>> is modified "<< MRI->isPhysRegModified(*pregi) << " \n");
>>>>
>>>> Try isPhysRegUsed.
>>>>
>>> ok
>>>
>>>>
>>>> }
>>>> }
>>>> DEBUG(dbgs() << "\n");
>>>>
>>>> The pass which is executing this code is schedule POST-RA stage but
>>>> this gives me true for all registers i.e in each function all registers are
>>>> being used except EBP and some other similar, Is this a correct way to get
>>>> register usage information ? I think I have made some mistake please help.
>>>>
>>>>
>>>> You might look at the implementation of these functions in
>>>> lib/CodeGen/MachineRegisterInfo.cpp and figure out if they're returning
>>>> true because UsedPhysRegMask.test(PhysReg) is true or because
>>>> reg_nodbg_empty(*AliasReg) is true.
>>>>
>>> Yes that helped now I am getting actual register which have been used by
>>> given function, but a little problem
>>> The updated code is as shown below :
>>> for(TargetRegisterInfo::regclass_iterator i = (*TRI).regclass_begin(), e
>>> = (*TRI).regclass_end(); i != e; i++ ) {
>>> for(TargetRegisterClass::iterator pregi = (*i)->begin(), prege =
>>> (*i)->end(); pregi != prege; pregi++ ) {
>>> for (MCRegAliasIterator AliasReg(*pregi, TRI, true); AliasReg.isValid();
>>> ++AliasReg) {
>>>    if (!MRI->reg_nodbg_empty(*AliasReg)) {
>>>     DEBUG( dbgs() << "Physical Register : " << MCRI->getName(*pregi) <<
>>> " is used "<< MRI->isPhysRegUsed(*pregi) << " \n");
>>>     break; // no need to process more alias
>>>    }
>>>   }
>>> }
>>> }
>>> But here some registers are getting processed with in different classes
>>> (unnecessary processing) Is this only way to iterate through all used
>>> register (using RegClass iterator) ? Is there any way to avoid duplicate
>>> regs?
>>> Of course currently I am just printing but next I am thinking to use a
>>> map to track usage info , in that only distinct register info will be
>>> stored but still due to loop structure I need to iterate through a single
>>> register 3 - 4 times making it time consuming.
>>>
>>> Yes, I believe you can just do:
>>>
>>>   for (unsigned Reg = 0; Reg < TRI->getNumRegs(); ++Reg) {
>>>
>> Oh yes thanks I just forgot that PhyReg starts at 0.
>>
>>>
>>>
>>>  -Hal
>>>
>>>
>>> -Vivek
>>>
>>>>
>>>>
>>>>  -Hal
>>>>
>>>>
>>>>
>>>> Vivek
>>>>
>>>> On Wed, May 18, 2016 at 11:42 PM, Quentin Colombet <qcolombet at apple.com
>>>> > wrote:
>>>>
>>>>>
>>>>> On May 18, 2016, at 11:00 AM, vivek pandya <vivekvpandya at gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> *Vivek Pandya*
>>>>>
>>>>>
>>>>> On Wed, May 18, 2016 at 11:25 PM, Quentin Colombet <
>>>>> qcolombet at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On May 18, 2016, at 10:46 AM, vivek pandya <vivekvpandya at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Vivek Pandya*
>>>>>>
>>>>>>
>>>>>> On Wed, May 11, 2016 at 4:01 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>>>> *To: *"Mehdi Amini" <mehdi.amini at apple.com>
>>>>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" <
>>>>>>> qcolombet at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>>>>>>> "Matthias Braun" <matze at braunis.de>
>>>>>>> *Sent: *Wednesday, May 11, 2016 3:15:03 AM
>>>>>>> *Subject: *Re: [GSoC 2016] Interprocedural Register Allocation -
>>>>>>> Introduction and Feedback
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Vivek Pandya*
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 11, 2016 at 10:02 AM, vivek pandya <
>>>>>>> vivekvpandya at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Vivek Pandya*
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 11, 2016 at 9:43 AM, Mehdi Amini <mehdi.amini at apple.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On May 10, 2016, at 6:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>>
>>>>>>>>> *From: *"vivek pandya" <vivekvpandya at gmail.com>
>>>>>>>>> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Tim Amini Golling" <
>>>>>>>>> mehdi.amini at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>>>>>>>>> *Cc: *"Quentin Colombet" <qcolombet at apple.com>
>>>>>>>>> *Sent: *Tuesday, May 10, 2016 2:59:16 PM
>>>>>>>>> *Subject: *[GSoC 2016] Interprocedural Register Allocation -
>>>>>>>>> Introduction and Feedback
>>>>>>>>>
>>>>>>>>> Hello LLVM Community,
>>>>>>>>>
>>>>>>>>> Sorry for delay as I was busy in final exams.
>>>>>>>>>
>>>>>>>>> I am Vivek from India. Thanks for choosing my proposal for
>>>>>>>>> Interprocedural Register Allocation (IPRA) in LLVM. Mehdi Amini and Hal
>>>>>>>>> Finkel will be mentoring me for this project.
>>>>>>>>>
>>>>>>>>> IPRA can reduce code size and runtime of programs by allocating
>>>>>>>>> register across the module and procedure boundaries.
>>>>>>>>>
>>>>>>>>> I have identified some old but effective research work on this
>>>>>>>>> area.
>>>>>>>>> I want community's feedback for feasibility of these approach and
>>>>>>>>> I am targeting to implement two of them during this project.
>>>>>>>>>
>>>>>>>>> Here is list of the papers, I have read first two papers and I
>>>>>>>>> would like to discuss those approach first, I will read other two paper
>>>>>>>>> then initiate discussion for them as well. All I want is to find out a
>>>>>>>>> concrete implementation plan before 23 May, 2016 and for that I need
>>>>>>>>> community's help.
>>>>>>>>>
>>>>>>>>> 1) Compile time ----- Minimizing register usage penalty at
>>>>>>>>> procedure calls - http://dl.acm.org/citation.cfm?id=53999
>>>>>>>>> ====================================================================In
>>>>>>>>> this approach intra-procedural register allocation is used as base but
>>>>>>>>> machine code generation order is bottom up traversal of call graph and
>>>>>>>>> inter-procedural effect is achieved by propagating register usage
>>>>>>>>> information of callee function to caller (i.e child to parent in CallGraph)
>>>>>>>>> so that caller can use different registers than callee and can save load
>>>>>>>>> store cost at procedure call, this is not trivial as it seems due to
>>>>>>>>> recursive calls, library function usage etc. Also for upper region of the
>>>>>>>>> graph in this technique available number of registers might become zero in
>>>>>>>>> that case it should fall back to normal load store at procedure call. Apart
>>>>>>>>> from these difficulties other difficulties have been identified please
>>>>>>>>> follow this mail-chain
>>>>>>>>> https://groups.google.com/d/topic/llvm-dev/HOYAXv3m1LY/discussion
>>>>>>>>> My mentor has already provided me a patch that alters code
>>>>>>>>> generation order as per bottom up call graph traversal, I am working from
>>>>>>>>> that point now. Any other help/suggestion is always welcomed.
>>>>>>>>>
>>>>>>>>> 2) Link time ----- Global register allocation at link time -
>>>>>>>>> http://dl.acm.org/citation.cfm?id=989415
>>>>>>>>> ====================================================================In
>>>>>>>>> this particular approach (sort of true IPRA) registers will be reallocated
>>>>>>>>> (this optimization will be optional if turned off still code will be
>>>>>>>>> compiled as per intra-procedural allocation) at link time. Here modules are
>>>>>>>>> first complied as per normal compilation but the object code is annotated
>>>>>>>>> with details so that linker can build call graph and also calculate usage
>>>>>>>>> information at link time. Compiler also write hints in object code that if
>>>>>>>>> particular variable is allocated in some other register ( due to new
>>>>>>>>> allocation) then how the code should be changed? Thus linker can use these
>>>>>>>>> information to decide which variables (global) need to be in same register
>>>>>>>>> through out the program execution and also according to register usage
>>>>>>>>> information in call graph which procedure will not be active simultaneously
>>>>>>>>> so that locals for that procedures can be in same registers with out load
>>>>>>>>> store at procedure calls.
>>>>>>>>> For these particular method help me to analyze feasibility:
>>>>>>>>> 1) Can llvm collects following information at module level in
>>>>>>>>> MachineIR? list of procedures in module, list of locals in procedures, list
>>>>>>>>> of procedures that a particular procedure can call, and a list of the
>>>>>>>>> variables this procedure references. Each entry in the last two lists
>>>>>>>>> includes an estimate of the number of times the procedure is called or the
>>>>>>>>> variable is referenced in each execution of this procedure
>>>>>>>>> 2) Can llvm write informative commands to object files?
>>>>>>>>> 3) Can LTO is capable of leveraging those commands?
>>>>>>>>>
>>>>>>>>> In terms of scoping the project for the summer, I definitely
>>>>>>>>> recommend that you focus on (1) first. If you finish that, we can certainly
>>>>>>>>> move on to other things.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll add +1 here, but I already wrote the same thing on IRC when
>>>>>>>>> discussing with Vivek. True IPRA without a proper MachineModule
>>>>>>>>> infrastructure won't be doable in my opinion (even with such
>>>>>>>>> infrastructure, it may not be trivial in LLVM in general).
>>>>>>>>>
>>>>>>>>> Regarding link time, note that any such a design would likely look
>>>>>>>>> much different than in David Wall's paper however, because our LTO
>>>>>>>>> re-codegens everything anyway. The paper says, "Finally, it keeps us honest
>>>>>>>>> as designers of the system; once we postpone anything until link time, the
>>>>>>>>> temptation is great to postpone everything, ..." - Well, we've long-since
>>>>>>>>> succumb to that temptation when we LTO. C'est la vie.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 as well, our LTO will benefit naturally from the leaf-to-root
>>>>>>>>> information propagation. ThinLTO will be more challenging/interesting
>>>>>>>>> though!
>>>>>>>>>
>>>>>>>>> For the first part a mechanism similar to MachineModulePass would
>>>>>>>>> be desirable but that may not be possible during this project, but if we
>>>>>>>>> can make some sort of smaller version of that to suit our purpose.
>>>>>>>>>
>>>>>>>>> I don't think we need to make any kind of MachineModulePass to
>>>>>>>>> make this work. Once we alter the visitation order based on the CGSCC
>>>>>>>>> iteration scheme, we can keep state in-between functions in the
>>>>>>>>> pre-existing hacky way (using static members of the relevant function
>>>>>>>>> passes).
>>>>>>>>>
>>>>>>>>>  Sorry my mistake here by first part I mean 1) requirement in the
>>>>>>>> link time approach.
>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>> I also don't see where/why we need a MachineModule(Pass) for the
>>>>>>>>> CGSCC scheme, that said I'd rather avoid using a function pass with static
>>>>>>>>> members, if we can have a ModuleAnalysis that is bookkeeping the results
>>>>>>>>> for functions in the module and queries by the register allocator somehow.
>>>>>>>>> Matthias/Quentin may have other inputs on this aspect.
>>>>>>>>>
>>>>>>>>
>>>>>>> @Hal do you mean to add a simple MachineFunction pass that will just
>>>>>>> operate on register allocated function and prepare a BitVector to indicate
>>>>>>> which register is being used by MachineFunction, and then use this pass as
>>>>>>> analysis pass (i.e just simply return static BitVector for clobbered
>>>>>>> register when register allocation for next function begins. This part is
>>>>>>> not much clear to me) this thing can be done by scheduling a pass post
>>>>>>> register allocation in lib/CodeGen/Passes.cpp
>>>>>>>
>>>>>>> void TargetPassConfig::addMachinePasses() {
>>>>>>> .
>>>>>>> .
>>>>>>> .
>>>>>>>   // Run pre-ra passes.
>>>>>>>   addPreRegAlloc();
>>>>>>>
>>>>>>>   // Run register allocation and passes that are tightly coupled
>>>>>>> with it,
>>>>>>>   // including phi elimination and scheduling.
>>>>>>>   if (getOptimizeRegAlloc())
>>>>>>>     addOptimizedRegAlloc(createRegAllocPass(true));
>>>>>>>   else
>>>>>>>     addFastRegAlloc(createRegAllocPass(false));
>>>>>>>
>>>>>>>   // Run post-ra passes.
>>>>>>>   addPostRegAlloc();
>>>>>>> // Adding a new pass here which keeps register mask information
>>>>>>> across function calls.
>>>>>>> .
>>>>>>> .
>>>>>>> .
>>>>>>> }
>>>>>>>
>>>>>>> But this also requires current register allocators to use this
>>>>>>> information in someway because RegMaskBits in LiveIntervalAnalysis.cpp is
>>>>>>> not static across calls. I mean I am not clear for how to propagate static
>>>>>>> info to Intra-procedural Register allocators (if possible without
>>>>>>> disturbing their code )
>>>>>>>
>>>>>>> First, my hope is that we won't need to change the register
>>>>>>> allocators, as such, in order to make use of this information. Instead,
>>>>>>> we'll simply be able to alter the register masks generated for the call
>>>>>>> instructions. These masks will indicate fewer clobbers than might otherwise
>>>>>>> be present based on the ABI because of information gathered during the
>>>>>>> codegen of the callee. These masks are generally constructed by target
>>>>>>> based on the calling convention. The PowerPC backend, for example, looks
>>>>>>> like this:
>>>>>>>
>>>>>>>   // Add a register mask operand representing the call-preserved
>>>>>>> registers.
>>>>>>>   const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
>>>>>>>   const uint32_t *Mask =
>>>>>>>       TRI->getCallPreservedMask(DAG.getMachineFunction(), CallConv);
>>>>>>>   assert(Mask && "Missing call preserved mask for calling
>>>>>>> convention");
>>>>>>>   Ops.push_back(DAG.getRegisterMask(Mask));
>>>>>>>
>>>>>>> but it can be more complicated. If you look for uses of
>>>>>>> 'getRegisterMask' in Target/*/*ISelLowering.cpp, you'll see what I mean.
>>>>>>> Regardless, the code ends up calling some method is the targets
>>>>>>> TargetRegisterInfo subclass. These methods generally look something like
>>>>>>> this:
>>>>>>>
>>>>>>> const uint32_t *
>>>>>>> PPCRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
>>>>>>>                                       CallingConv::ID CC) const {
>>>>>>>   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
>>>>>>>   ...
>>>>>>>   return TM.isPPC64() ? (Subtarget.hasAltivec() ?
>>>>>>> CSR_SVR464_Altivec_RegMask
>>>>>>>                                                 :
>>>>>>> CSR_SVR464_RegMask)
>>>>>>>                       : (Subtarget.hasAltivec() ?
>>>>>>> CSR_SVR432_Altivec_RegMask
>>>>>>>                                                 :
>>>>>>> CSR_SVR432_RegMask);
>>>>>>> }
>>>>>>>
>>>>>>> In any case, the fundamental idea here is that, when someone calls
>>>>>>> getCallPreservedMask in order to set the regmask on a call, we might not
>>>>>>> have to use the CC at all. Instead, if we've already codegened the
>>>>>>> function, we might use a cache of 'exact' register masks computed during
>>>>>>> codegen of the potential callees instead.
>>>>>>>
>>>>>> I am thinking to add a simple Immutable pass MachineRegisterUsageInfo
>>>>>> similar to MachineBranchProbabilityInfo that can maintain
>>>>>> RegisterUsageInformation per function. Can it be simply done by using
>>>>>> UsedPhysRegMask from MachineRegisterInfo ??
>>>>>>
>>>>>>
>>>>>> No, like the comment said, UsedPhysRegMask gives only the registers
>>>>>> clobbered by calls:
>>>>>> // This bit vector represents all the registers clobbered by function
>>>>>> calls.
>>>>>>
>>>>>> You want to build this information yourself on top of
>>>>>> MachineRegisterInfo::isPhysRegModified
>>>>>>
>>>>> Ok but then the time complexity will be O(n) n = number of physical
>>>>> register on the target. Am I going correct?
>>>>>
>>>>>
>>>>> Yes, this is correct.
>>>>>
>>>>>
>>>>>> Here getCallPreservedMask will call API provided by
>>>>>> MachineRegisterUsageInfo to avail the exact register mask but how it can
>>>>>> know that the function is already codegen or it will query each time when
>>>>>> getCallPreservedMask is called and of available MachineRegisterUsageInfo
>>>>>> will return the details otherwise simply return NULL.
>>>>>> So changes will be now in TargetRegisterInfo implementation for each
>>>>>> target right ??
>>>>>>
>>>>>>
>>>>>>> In order to do this, I think we'll need to provide a function
>>>>>>> callable from the target's getCallPreservedMask implementation, which can
>>>>>>> return such an 'exact' regmask when available. I think we need to do it
>>>>>>> this way for two reasons:
>>>>>>>
>>>>>>>  1. Not all of the target code calls getCallPreservedMask, but
>>>>>>> sometimes calls other similar target-specific functions (e.g.
>>>>>>> getTLSCallPreservedMask).
>>>>>>>  2. The targets need to opt-in to this behavior because only the
>>>>>>> target can know that all register uses are really tagged correctly post
>>>>>>> "pre-emit".
>>>>>>>
>>>>>>> Because the target is free to introduce uses of registers at
>>>>>>> essentially any time, we need to do the scanning for used registers after
>>>>>>> the "pre-emit" passes run. This can be done by scheduling some simple
>>>>>>> register-use scanning pass after the call to addPreEmitPass in
>>>>>>> lib/CodeGen/Passes.cpp.
>>>>>>>
>>>>>>>
>>>>>>> I think this also applies in someway to Mehdi Amini's idea to keep a
>>>>>>> ModulePass for bookkeeping but then existing register allocators will be
>>>>>>> required to change so that the code can query the ModulePass for
>>>>>>> RegMaskBits for particular function.
>>>>>>>
>>>>>>> I think that the simplest way to do this is to create an immutable
>>>>>>> analysis pass (e.g. BasicAA) that keeps the cache of the computed register
>>>>>>> masks. This is somewhat similar in spirit to how the 'AssumptionCache'
>>>>>>> analysis works at the IR level. This analysis can then be created by
>>>>>>> lib/CodeGen/Passes.cpp early, and then queried and passed around later by
>>>>>>> the CodeGen/Target code. Because it is an immutable analysis, it won't get
>>>>>>> destroyed until the very end, which is also important because, I imagine,
>>>>>>> it will need to own the memory associated with the generated register masks.
>>>>>>>
>>>>>>>  -Hal
>>>>>>>
>>>>>>>
>>>>>>> Vivek
>>>>>>>
>>>>>>>
>>>>>>>>> Yes for propagating register usage approach we don't need
>>>>>>>> MachineModulePass
>>>>>>>>
>>>>>>>> Vivek
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mehdi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Hal Finkel
>>>>>>> Assistant Computational Scientist
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160525/341f0fbe/attachment.html>