[llvm-commits] Enable early dup of any small bb

Bob Wilson bob.wilson at apple.com
Fri Jun 10 12:46:59 PDT 2011


We did some experiments with early tail dup when we first created the pass and found that it made almost no difference in performance for anything except indirect branches.  Did you benchmark more than just firefox?  If you're seeing cases where there is a significant benefit, and if there are no regressions in code size, code quality or build times, we could consider doing this.  I'd like to see benchmark results across a fairly wide range of tests and on multiple targets before deciding that.

In the longer term, I plan to add a separate "indirect branch duplication" pass, and I was hoping that could entirely replace the early tail-dup pass. The current tail duplication pass is not smart enough to do a good job for indirect branches.  In order to get it to do what we need for indirect branches, we had to crank up the duplication limit fairly high, and then it blindly duplicates through multiple blocks, in some cases blowing up code size for no good reason.  There are also cases where it fails because it cannot duplicate a block when a predecessor ends with a conditional branch. I haven't worked on that new pass for a while, but I wouldn't mind getting back to it, especially if you're seeing cases where it's needed.

As you saw, tail dup for indirect branches has to happen before reg alloc to get good results.
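
For context, the source pattern behind these indirect branches is typically a computed-goto dispatch loop. The following is a minimal sketch in C, assuming the GNU labels-as-values extension (supported by GCC and Clang); the opcode names and the run function are hypothetical and are not taken from firefox or from the patch. Each handler ends in its own "goto *", but as far as I understand clang lowers these to a single shared indirectbr block, and it is tail duplication that copies the jump back into each handler:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs `code` on an accumulator starting at `acc` and returns the result.
 * `code` must end with OP_HALT and contain only the opcodes above. */
static int run(const unsigned char *code, int acc) {
    /* Labels-as-values (&&label) is a GNU C extension, not ISO C. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    const unsigned char *pc = code;

    assert(code != NULL);

    /* Every handler below repeats this dispatch; in the source there is
     * one indirect jump per handler rather than one shared jump. */
    goto *dispatch[*pc++];

do_inc:
    acc += 1;
    goto *dispatch[*pc++];
do_dec:
    acc -= 1;
    goto *dispatch[*pc++];
do_double:
    acc *= 2;
    goto *dispatch[*pc++];
do_halt:
    return acc;
}
```

With the program OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT and a starting accumulator of 0, run returns 3. Whether each handler keeps its own jump with a folded address computation in the final code depends on when the duplication happens relative to register allocation, which is the point at issue in this thread.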

On Jun 10, 2011, at 12:16 PM, Rafael Ávila de Espíndola wrote:

> My idea when I started working on PR10096 was to swap the responsibilities of early and late tail dup. The early one would duplicate small blocks and the late one would be the one responsible for duplicating the "large" blocks with indirectbr.
> 
> Unfortunately, we cannot really depend only on the late pass to handle indirectbr. In the firefox case, gcc produces code that looks like
> 
> 	movq	(%rbx,%r12,8), %rax
> 	<code for this switch case>
> 	jmp	*%rax
> 
> clang, when changed to emit multiple indirectbr instructions, produces
> 	<code for this switch case>
> 	jmpq    *(%rsi,%rdx,8)
> 
> and clang with only late tail dup (or when it gets lost in ra) produces
> 
> 	leaq	(%rax,%rcx,8), %rax
> 	<code for this switch case>
> 	jmpq	*(%rax)
> 
> So it looks like we really have to fix the taildup,phielim,ra interaction.
> 
> The good news is that while benchmarking the options I found some bugs when early tail dup is enabled for bbs without indirectbr in them. I fixed them and did some more benchmarking by just enabling it.
> 
> I got some interesting results. The build time stayed the same, the size of the 64-bit XUL goes from 48745892 to 48735612, and firefox gets a bit faster in dromaeo, going from 1550.87 runs/s to 1627.12 runs/s in a current build.
> 
> I think the size reduction is because the blocks that we are duplicating early would be duplicated late anyway; duplicating them early just means more passes run afterwards to clean up after us. Since this is just adding small blocks, the RA is still able to do a good job.
> 
> In summary, is the attached patch OK? :-)
> 
> Cheers,
> Rafael
> <dup.patch>




