[llvm-commits] Enable early dup of any small bb

Fri Jun 10 13:40:00 PDT 2011

On Jun 10, 2011, at 1:05 PM, Rafael Avila de Espindola wrote:

> On 11-06-10 03:46 PM, Bob Wilson wrote:
>> We did some experiments with early tail dup when we first created the
>> pass and found that it made almost no difference in performance for
>> anything except indirect branches.  Did you benchmark more than just
>> firefox?  If you're seeing cases where there is a significant
>> benefit, and if there are no regressions in code size, code quality
>> or build times, we could consider doing this.  I'd like to see
>> benchmark results across a fairly wide range of tests and on multiple
>> targets before deciding that.
> 
> The only two easy tests I have at hand are firefox and clang itself. Would running the llvm-testsuite be a good measure?

If the llvm-testsuite is all you've got, that would be a good start.  We don't have enough real applications in the testsuite, but I don't know what else to suggest.

> 
> What kind of problems were you having with early dup before? In the cases where I looked at the assembly, having an early dup with the same limit as the late one just cleans up the code a bit. It doesn't change much which blocks are duplicated, just when. Same idea for why it is good to duplicate indirectbr early, just not as dramatic.

We didn't have any problems with early tail dup.  It just didn't make much difference in code quality and it has at least some negative effect on compile time.

> 
>> In the longer term, I plan to add a separate "indirect branch
>> duplication" pass, and I was hoping that could entirely replace the
>> early tail-dup pass. The current tail duplication pass is not smart
>> enough to do a good job for indirect branches.  In order to get it to
>> do what we need for indirect branches, we had to crank up the
>> duplication limit fairly high, and then it blindly duplicates through
>> multiple blocks, in some cases blowing up code size for no good
>> reason.  There are also cases where it fails because it cannot
>> duplicate a block when a predecessor ends with a conditional branch.
>> I haven't worked on that new pass for a while, but I wouldn't mind
>> getting back to it, especially if you're seeing cases where it's
>> needed.
> 
> The problems I have found have to do with the register allocator being unhappy with the output of early tail dup. As a test, I tried reducing the limit to the very minimum that would duplicate the original blocks and stop. It helped a bit, but the register allocator was still confused. Maybe in a case not as insane as jsinterp.o the decision of what to duplicate is more important.

Maybe I missed some things, but did you ever track down the root cause of the allocator's confusion?  It seems like fixing the allocator would be the right thing to do here, regardless of whether we make other changes to tail dup.

> 
> I guess the most important part is then not so much deciding what to duplicate, but doing a good job at it. I am still reducing a case where doing the duplication at clang produces better results than duplicating just before regalloc (1/2 as many spills).
> 
> What were your plans for the indirectbr duplication pass?

I'd basically like it to "undo" the front-end's merging of indirect branches.  The idea I was working on duplicated regions of code, not just individual blocks, based on dominance info.  Basically each successor of the indirect branch, corresponding to a "case" in a switch statement, dominates a region of code ending with one or more jumps back to the indirect branch, and we should try to duplicate one and only one copy of the indirect branch into each of those regions.
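
To make the idea concrete, here is a rough sketch in Python on a toy CFG, not LLVM code and not the pass itself. The CFG is a plain dict from block name to successor list, and the names `dominators`, `duplicate_indirect_branch` and the `dispatch.<case>` naming of the copies are all made up for illustration. It only handles the simple situation where a block that jumps back to the indirect branch is dominated by exactly one case; anything shared between cases is left alone.

```python
def dominators(cfg, entry):
    """Return dom, where dom[b] is the set of blocks that dominate b.

    Plain iterative data-flow formulation. Assumes every block, including
    blocks with no successors, appears as a key in cfg.
    """
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)

    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def duplicate_indirect_branch(cfg, entry, dispatch):
    """Give each case region its own single copy of the dispatch block.

    For every successor ("case") of the merged indirect branch, find the
    region that case dominates, and redirect the jumps from that region
    back to `dispatch` to one private copy of `dispatch`.
    """
    dom = dominators(cfg, entry)
    new_cfg = {b: list(succs) for b, succs in cfg.items()}

    # dict.fromkeys drops repeated successors but keeps their order.
    for case in dict.fromkeys(cfg[dispatch]):
        # Blocks dominated by this case (the case block dominates itself).
        region = {b for b in cfg if b != dispatch and case in dom[b]}
        # Blocks in the region that jump back to the merged branch.
        jumpers = [b for b in region if dispatch in cfg[b]]
        if not jumpers:
            continue  # e.g. a case that leaves the loop

        copy = f"{dispatch}.{case}"  # hypothetical naming scheme
        new_cfg[copy] = list(cfg[dispatch])  # same targets as the original
        for b in jumpers:
            new_cfg[b] = [copy if s == dispatch else s for s in new_cfg[b]]
    return new_cfg


# A toy interpreter loop: three cases, one of which spans two blocks.
cfg = {
    "entry": ["dispatch"],
    "dispatch": ["op_add", "op_sub", "op_halt"],
    "op_add": ["dispatch"],
    "op_sub": ["sub_tail"],
    "sub_tail": ["dispatch"],
    "op_halt": [],
}
result = duplicate_indirect_branch(cfg, "entry", "dispatch")
```

In this example op_add and the op_sub/sub_tail region each end up with their own copy of the branch, op_halt gets none because it never jumps back, and the original dispatch block is still what entry reaches. A real pass would also have to decide when a region is too big to be worth it, which this sketch ignores.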