[LLVMdev] Interprocedural Register Allocation

Wed Oct 31 13:41:24 PDT 2012

Hi Jakob,
  I have spent last 4 weeks trying to figure out how to implement
Interprocedural Register Allocation. I must admit that I was really
overwhelmed with LLVM's codebase while trying to figure this out :)
There is so much to know! I think I have reached a point where I
have some sort of basic understanding of what needs to be done,
but I need some help from here on. So here is the summary of
what I know and for what I need more help.

I see that lib/CodeGen/Passes.cpp adds PrologEpilogInserter (PEI
henceforth) pass after the RegAlloc pass. Do I understand correctly
that LLVM first performs a given pass on all the functions before
moving to the next pass? In order to proceed further with this analysis
and to reduce the number of email exchanges back and forth, I will
assume that my understanding is correct for now (branch prediction in
real life? :P). Please let me know if I am wrong, I will restart the
analysis from this point in the subsequent emails. Assuming this is the
case, as with many things compiler optimizations, this is somewhat
opposite to what we want :) So let me start from the beginning of what
we want.

Let us say we want to allocate registers for node X in the call graph.
We want our interprocedural register allocator to have run on all the
child nodes of X in the call graph before it starts the register
allocation on the given node X itself. We not only need this, but
we also want all Callee Saved Registers (CSR henceforth)
information to be available for all the child nodes of X.

This is because we want to compute the physical registers available
for node X as:
set(all the physical registers available) - Union over all the
    children(X) (set(CSR for the child node)) -  Union over all the
    children(X) (set(used physical registers including calling
    convention registers for the child node))

However in the current implementation CSR info is calculated in the PEI
pass which is done after register allocation. I am guessing that this
was because, CSR spill insertion is done in the Prolog and the Epilog
in the case when we do not shrink wrap and there was no need to
calculate CSR elsewhere and this was the reasonable location to place
that calculation. However, we have a need to have the calculation done
in the register allocator now. So IMHO, we should be moving the
calculation of CSR out of PEI pass but should save that info to be made
available to PEI pass for spill code insertion. If we can actually do
this, we should be able to combine this CSR calculation with register
allocation. In fact, unless it is absolutely necessary to keep them
separate for modularity or other such reasons, we can merge the PEI pass
and the RegAlloc pass together. What are the cases where one uses a
RegAlloc pass but not PEI pass?

Another concern is about the re-use of CallGraphSCC pass. I see that in
lib/Analysis/IPA/CallGraphSCCPass.cpp's CGPassManager::RunPassOnSCC()
method there is this line, Changed |= FPP->runOnFunction(*F); and FPP
is cast to (FPPassManager*). However, if we want to re-use
CallGraphSCCPass in CodeGen, shouldn't we be casting FPP to the
MachineFunction equivalent and be calling the runOnMachineFunction()
method instead? So should we inherit CallGraphSCCPass to create
MachineCallGraphSCCPass like what we do for FunctionPass, and replace
all the casts/references to non-machine passes to equivalent machine
passes in the overridden methods?

So given all these things, here is my plan on how to go about
implementing an Interprocedural Register Allocator (IPRA henceforth).

1. Decide if we have to create MachineCallGraphSCCPass.

2. If we have to create MachineCallGraphSCCPass, inherit from
   CallGraphSCCPass and override the methods as required.

3. Create a new IPRA pass which inherits from MachineCallGraphSCCPass
   or CallGraphPass depending on whether we create
   MachineCallGraphSCCPass.

4. Implement a loop in runOnMachineSCC that iterates over the CallGraph
   nodes in the bottom-up order:
   A. For each node, run one of the allocators as chosen by the command
      line flag or the optimization level required. This can be
      implemented as an argument to createIPRAPass() method.
   B. While running the register allocator for a node, make use of the
      RegMask from the child nodes to compute the registers available
      for this node.
   C. Immediately after the register allocator is run for the node, run
      calculateCalleeSavedRegisters() on that node and save that info.
      I think the current method already saves this info in MFI, so we
      do not have to do any extra work to save it for later use.
   D. Update the RegMask for the node so that its parents know what
      registers to use.

5. Since calculateCalleeSavedRegisters() is now part of register
   allocator, remove it from PEI pass but use that info where required.

6. Add a command line flag to enable IPRA (turned off by default) and
   add the pass to passes.cpp and the non-IPRAs work exactly as they
   are now from the outside, but calculateCalleeSavedRegisters() is
   moved to a different place.

Other things like, not using the RegMask of the child nodes which have
calls to external functions or have function pointers as function
parameters should be taken care of.

Please let me know if this makes any sense at all. Also if you would
like to have a quicker turn around time for a conversation, I can login
to LLVM's IRC channel when you have some time or please feel free
to ping me on IM (I use this same email ID for GTalk).

On Thu, Oct 4, 2012 at 2:38 PM, Madhusudan C.S <madhusudancs at gmail.com>wrote:

> Hi Jakob,
>
> On Thu, Oct 4, 2012 at 2:31 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>wrote:
>
>>
>> On Oct 4, 2012, at 2:27 PM, "Madhusudan C.S" <madhusudancs at gmail.com>
>> wrote:
>>
>> Basically, the PrologEpilogInsertion pass will add a bit mask to
>>> MachineModuleInfo describing which registers are clobbered by the function
>>> being compiled. Later, when compiling the callers, that bit mask is used to
>>> initialize the regmask operands on call instructions.
>>>
>>
>> So the idea is to sidestep from the calling convention a bit if we
>> already know that the called function will not be using all the
>> registers required by the convention and instead use those registers
>> in the caller?
>>
>>
>> That's right.
>>
>> If I am understanding this correctly, is this something desirable for
>> LLVM, even if it is not high priority?
>>
>>
>> Yes.
>>
>> Would you guys be Ok if I try
>> to implement this without disturbing the project priorities but with
>> a little help/guidance?
>>
>>
>> Absolutely.
>>
>
> Great! Thank you very much. Then, I will do some homework on how
> I plan to implement this, a _very_ rough sketch if not the actual
> design, and come back.
>
> Btw, I just found this http://optimisticcompilers.blogspot.com/ So
> I will just take a look at what happened to that project.
>
>>
>> /jakob
>>
>>
>

-- 
Thanks and regards,
  Madhusudan.C.S
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121031/8f1bfda5/attachment.html>