[LLVMdev] [LLVMDev] Phi elimination: Who does what

Tue Oct 5 18:49:36 PDT 2010

Well, let me tell you what I have done so far. First, I set it up as a
problem solution framework, which is great for the furthest first
spill heuristic. The problem holds a sequence of registers. I modeled
the sequence as a structure with the values:
struct seq {
 unsigned register, next, prev, preferred;
 bool live_in, phi_reg, ... everything I could possible want to know.
The caveats come later.
};
The sequence is built as a doubly linked list from the next/prev.
Thus, if a register exists in the sequence, it can point to the next
register in the sequence from the next value. So I have the
vector<seq> and I have some functions getRegister(unsigned index) and
getNext(unsigned index) from the problem. The goal is once I build my
problem, I can then iterate through my problem with a simple for loop
like "for( unsigned i=0,e=problem->size(); i != e; ++i )" and I have
the register value, the next value, etc at my fingertips.

I did not use pointers because the furthest first heuristic is
difficult when it looks around branches. It's great for local
allocation, but when the furthest first hits a conditional branch, it
has at least two paths to look down. A simple recursive search with
cache is useful, but spending time to find a recurrence relation would
be beneficial for performance. Thus, when I look at a different branch
the pointer in that different problem cannot give a time value. So I
use lots of unsigned values as indexes into tables.

Now from this sequence, I can take a pass and build the solution. The
solution checks on every register and adds / spills registers with
it's furthest first heuristic (slightly modified with my own
heuristic).

Ok, so for each machine basic block, I create a problem. From each
problem, I create a solution. The solution happens to be a
pre-coloring algorithm. So each problem starts out with nothing
allocated in the registers. From there, I run the slightly modified
furthest first heuristic until the end of the problem. Unfortunately,
machine basic blocks are not usually large. For instance, how long do
you go without using an 'if' in c++.

This now brings me to my third pass. From here I coalesce between the
problems/solutions which correspond directly to machine basic blocks.

Caveats and Optimizations:
1. Some machine basic blocks jump to only one other block and the
other blocks is only reached from the previous block. So, the problem
eats up both blocks without worrying about the branch.
2. When a virtual register needs to be in a physical register, the
physical register already exists as a precolored register. Precolored
coalesced registers like this can be added to the furthest first
heuristic without loosing performance.

There are quite a few others. I do not fully have an understanding on
copy instructions outside of phi elimination.

I wish I was being paid for this.

Thanks,
Jeff Kunkel

On Tue, Oct 5, 2010 at 8:43 PM, Cameron Zwarich <zwarich at apple.com> wrote:
> For spilling, I plan to use the Hack-Braun generalization of the furthest-first heuristic for SSA:
>
> http://pp.info.uni-karlsruhe.de/uploads/publikationen/braun09cc.pdf
>
> For coloring, there are a few different approaches you can take, e.g. dominator tree scan, puzzle-solving, or a modified graph coloring / coalescing heuristic like IRC. The best quality for the least amount of implementation effort is probably a dominator tree scan with register preferences, so I'll try that first:
>
> http://pp.info.uni-karlsruhe.de/uploads/publikationen/braun10cc.pdf
>
> Cameron
>
> On Oct 5, 2010, at 7:43 PM, Jeff Kunkel wrote:
>
>> The allocator you are building, is it the Hack's and Goos's polynomial
>> time algorithm?
>>
>> On Tue, Oct 5, 2010 at 7:14 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>>> There is nothing that currently handles this properly, as far as I know. If you have a phi
>>>
>>> c = phi(a, b)
>>>
>>> where a, b and c are all assigned distinct stack slots, then copies must be inserted in the predecessor. If registers have already been allocated, then this memory copy might require a temporary register (unless you're on an architecture like x86 that lets you do memory-to-memory copies without using a register).
>>>
>>> In general, each phi block in register allocated code lowers to parallel copies on incoming edges that need to be sequentialized. This is a somewhat different problem than normal phi elimination before register allocation.
>>>
>>> I am also writing an SSA-based register allocator for LLVM; perhaps we can share some code in common passes.
>>>
>>> Cameron
>>>
>>> On Oct 5, 2010, at 6:02 PM, Jeff Kunkel wrote:
>>>
>>>> Aye, between all current register allocators the
>>>> 'AU.addRequiredID(PHIEliminationID);' will cause phi's to be
>>>> eliminated to copies, but this misses the point of my question.
>>>>
>>>> What I am asking, is how does stack know that the value of the
>>>> variable which the resulting value of the phi is currently allocated
>>>> at. For instance take the instruction:
>>>>
>>>> Machine Basic Block (mbb) 12
>>>> reg16666 = phi reg17777 (mbb 10), reg18888 (mbb 11).
>>>>
>>>> Let reg17777 memory location be at the sp+x, and let reg18888 memory
>>>> location be at sp+y. What memory location will reg16666 be pointing
>>>> at? And who tells the stack that it needs to somehow make sure that
>>>> reg16666 contains the value of 17777 when machine basic block 10
>>>> branches to 12?
>>>>
>>>> My current register allocator (I am working on) waits until it has
>>>> finished local machine basic block coloring before coordinating
>>>> between the machine basic blocks. The phi elimination can be done
>>>> effectively at this point. Thus, instead of have the hassle of
>>>> coalescing, I have the hassle of phi elimination, which quite
>>>> honestly, is quite simple as long as I have my i's dotted and t's
>>>> crossed. Hence, my question.
>>>>
>>>> The trouble I foresee is that the reg17777 and reg18888 can both be on
>>>> the stack. Then, at the end of the mbb 11, reg18888 must be stored not
>>>> only in it's own slot, but also in reg16666's slot. I am wondering, do
>>>> I need to communicate this information somehow?
>>>>
>>>> Thanks,
>>>> Jeff Kunkel
>>>>
>>>>
>>>> On Tue, Oct 5, 2010 at 4:35 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>> At the moment, phi elimination happens before register allocation, so there can be no phis between memory locations.
>>>>>
>>>>> Cameron
>>>>>
>>>>> On Oct 5, 2010, at 4:19 PM, Jeff Kunkel wrote:
>>>>>
>>>>>> When doing phi elimination, does one have to communicate with the
>>>>>> stack space at all? The problem I see is two distinctly different
>>>>>> registers may have two distinctly different stack spaces. When these
>>>>>> registers are combined in a phi, the values the registers point to
>>>>>> needs to be moved, combined, or otherwise taken care of. I understand
>>>>>> this is the job of the stack space colorer, but when doing phi
>>>>>> elimination, does one need to tell the stack space colorer that the
>>>>>> phi did exist?
>>>>>>
>>>>>> Thanks,
>>>>>> Jeff Kunkel
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>