[llvm-commits] [llvm] r122627 - /llvm/trunk/lib/CodeGen/StrongPHIElimination.cpp

Wed Dec 29 16:36:53 PST 2010

On Dec 29, 2010, at 12:16 PM, Jakob Stoklund Olesen wrote:

> On Dec 29, 2010, at 3:00 AM, Cameron Zwarich wrote:
> 
>> Author: zwarich
>> Date: Wed Dec 29 05:00:09 2010
>> New Revision: 122627
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=122627&view=rev
>> Log:
>> Instead of processing every instruction when splitting interferences, only
>> process those instructions that define phi sources. This is a 47% speedup of
>> StrongPHIElimination compile time on 403.gcc.
> 
> Nice!
> 
> How does strong phi elimination affect the runtime of coalescing?

It doesn't reduce coalescing time as much as I would like. On 403.gcc, coalescing takes up 4.0-4.1% of compile time with normal PHIElimination, and it goes down to 3.9% with StrongPHIElimination. StrongPHIElimination is getting fast enough that its effect on total compile time is within the noise, but I'd like to investigate why coalescing doesn't speed up as much as I would expect.

I'm not splitting non-loop critical edges at the moment, because I wanted to test all of the critical edge handling. That will probably lead to fewer copies.
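For reference, an edge is critical when its source block has more than one successor and its target block has more than one predecessor. A copy for a PHI source placed at the end of such a source block would also execute on the block's other outgoing paths, which is why these edges need special handling. The sketch below only identifies critical edges. It does not distinguish loop from non-loop edges, which would need loop information. The CFG representation (a vector of successor index lists) and the name `criticalEdges` are hypothetical illustrations, not LLVM's MachineBasicBlock API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CFG: succs[b] lists the indices of block b's successors.
// Returns every edge (from, to) where `from` has more than one successor
// and `to` has more than one predecessor.
std::vector<std::pair<std::size_t, std::size_t>>
criticalEdges(const std::vector<std::vector<std::size_t>>& succs) {
  // Count predecessor edges for each block.
  std::vector<std::size_t> numPreds(succs.size(), 0);
  for (const auto& ss : succs)
    for (std::size_t s : ss) {
      assert(s < succs.size() && "successor index out of range");
      ++numPreds[s];
    }

  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t b = 0; b < succs.size(); ++b)
    if (succs[b].size() > 1)          // source has multiple successors
      for (std::size_t s : succs[b])
        if (numPreds[s] > 1)          // target has multiple predecessors
          out.emplace_back(b, s);
  return out;
}
```

For a CFG where block 0 branches to blocks 1 and 2 and block 1 falls through to block 2, only the edge 0->2 is critical. Splitting it means inserting a new block on that edge, so a copy can be placed there without affecting the 0->1 path.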

>> +        for (MachineRegisterInfo::def_iterator DI = MRI->def_begin(SrcReg),
>> +             DE = MRI->def_end(); DI != DE; ++DI) {
>> +          PHISrcDefs[DI->getParent()].push_back(&*DI);
> 
> Do these registers have multiple definitions, or could you use MRI->getVRegDef()?

Using getVRegDef() should be fine, since all of the definitions from things like 2-address instructions should be in the same BB. I was going to do that, but I got some confusing results among 3 variants, listed in increasing order of performance:

1) What I landed, using MOs instead of MIs.

2) What I landed, but only using the MO of the first def (the same def returned by getVRegDef()).

3) What I landed, but using getVRegDef().
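The idea behind the commit (visit only the instructions that define PHI sources, grouped by block, rather than every instruction) can be sketched roughly as in variant 3. The sketch assumes one defining instruction per virtual register, as with getVRegDef(). `Instr`, `DefMap`, and `groupPhiSrcDefs` are hypothetical stand-ins for illustration, not the actual StrongPHIElimination code or LLVM types. Skipping registers that have already been seen is also an assumption, added so that a source shared by several PHIs is recorded only once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for a MachineInstr: which block it lives in and
// which virtual register it defines.
struct Instr {
  int parentBlock;
  int defReg;
};

// Hypothetical analogue of MRI->getVRegDef(): one defining instruction
// per virtual register.
using DefMap = std::map<int, const Instr*>;

// Group the defining instructions of the PHI source registers by block, so
// that interference splitting only has to visit these instructions.
std::map<int, std::vector<const Instr*>>
groupPhiSrcDefs(const std::vector<int>& phiSrcRegs, const DefMap& vregDef) {
  std::map<int, std::vector<const Instr*>> phiSrcDefs;
  std::set<int> seen;
  for (int reg : phiSrcRegs) {
    if (!seen.insert(reg).second) continue;  // already recorded this source
    auto it = vregDef.find(reg);
    if (it == vregDef.end()) continue;       // no def (e.g. undefined input)
    const Instr* def = it->second;
    assert(def->defReg == reg && "def map must be consistent");
    phiSrcDefs[def->parentBlock].push_back(def);
  }
  return phiSrcDefs;
}
```

Iterating all def operands with def_begin()/def_end() (variants 1 and 2) would instead collect every defining operand. If the 2-address defs all sit in the same block, that adds no blocks to the map and only lengthens the per-block lists.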

It doesn't really make sense to me that 2) should be slower than 3). I guess I'll land 3) and investigate other compile-time benchmarks. There are a couple of other possible speedups:

1) Sorting the entire basic block isn't really necessary; it should be enough to do a few comparisons when things interfere.

2) I'm using the simple, theoretically optimal union-find. There are versions that are faster in practice but have slightly worse theoretical bounds.

3) Getting rid of the second pass of reforming the equivalence classes and renaming registers. This would also make it easier to cut down on the number of new virtual registers. I tried doing this and it is tricky, but it should work.
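As a reference point for item 2), here is a textbook disjoint-set forest with union by rank and full path compression, the variant with the well-known inverse-Ackermann amortized bound. This sketch assumes that this is what "theoretically optimal" refers to. It also assumes that elements are dense indices (for example, virtual register numbers mapped to 0..N-1). The class name `UnionFind` is hypothetical and is not the data structure from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set forest with union by rank and full path compression.
class UnionFind {
public:
  explicit UnionFind(std::size_t n) : parent_(n), rank_(n, 0) {
    // Initially each element is the root of its own singleton class.
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  // Return the root of x's class, then point every node on the path
  // directly at that root (full path compression, two passes).
  std::size_t find(std::size_t x) {
    assert(x < parent_.size() && "element out of range");
    std::size_t root = x;
    while (parent_[root] != root) root = parent_[root];
    while (parent_[x] != root) {
      std::size_t next = parent_[x];
      parent_[x] = root;
      x = next;
    }
    return root;
  }

  // Merge the classes of a and b, hanging the lower-rank root under the
  // higher-rank one so tree height stays logarithmic.
  void unite(std::size_t a, std::size_t b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    if (rank_[a] < rank_[b]) std::swap(a, b);
    parent_[b] = a;
    if (rank_[a] == rank_[b]) ++rank_[a];
  }

private:
  std::vector<std::size_t> parent_;
  std::vector<unsigned> rank_;
};
```

In a PHI elimination pass, a structure like this would presumably be used to unite each PHI destination with its sources into equivalence classes. The cheaper variants trade full path compression for single-pass schemes such as path halving, which touch fewer nodes per find.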

Cameron