[llvm-commits] Enable early dup for small bb, take2

Bob Wilson bob.wilson at apple.com
Mon Jun 13 10:29:26 PDT 2011


It seems to me that this isn't a clear win.  It helps some cases but hurts others.

As you've seen, updating PHIs for tail duplication is tricky.  I'd really prefer to avoid that.  If we only run the taildup pass after regalloc, we can remove all that complexity.  Something similar would still be needed in the separate indirect branch duplication pass (that I'm still working on), but at least we wouldn't have to do it in taildup as well.

How important do you think it is to do this?  Am I misreading your data?
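To make the PHI bookkeeping concrete, here is a minimal toy sketch in Python of duplicating the trivial `foo: jmp bar` block from the quoted message below into some of its predecessors. It is not LLVM's tail duplication code: `Block`, `preds` and `dup_trivial_block` are hypothetical names, and the model assumes foo has no PHIs of its own and that no predecessor already branches straight to bar. Those are exactly the cases where real updating gets harder.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    succs: list = field(default_factory=list)
    # PHI result -> {predecessor name: incoming value}
    phis: dict = field(default_factory=dict)


def preds(cfg, name):
    """Names of blocks that branch to `name`."""
    return [b.name for b in cfg.values() if name in b.succs]


def dup_trivial_block(cfg, foo, into):
    """Toy tail duplication of a block that is only 'jmp bar' into the
    predecessors listed in `into` (hypothetical helper, not an LLVM API)."""
    blk = cfg[foo]
    assert len(blk.succs) == 1 and not blk.phis, "only handles 'foo: jmp bar'"
    bar = cfg[blk.succs[0]]
    for p in into:
        pred = cfg[p]
        assert foo in pred.succs, f"{p} is not a predecessor of {foo}"
        # If p already branched to bar directly, p would reach bar along two
        # edges; this one-entry-per-predecessor PHI model refuses that case.
        assert bar.name not in pred.succs, f"{p} already branches to {bar.name}"
        # The copy of 'jmp bar' placed in p means p now branches to bar itself.
        pred.succs = [bar.name if s == foo else s for s in pred.succs]
        # bar gains p as a predecessor: each PHI takes, from p, the value
        # that used to arrive through foo.
        for incoming in bar.phis.values():
            incoming[p] = incoming[foo]
    if not preds(cfg, foo):
        # Every predecessor got a copy, so foo is dead: drop it and its
        # PHI entries in bar.
        for incoming in bar.phis.values():
            del incoming[foo]
        del cfg[foo]


cfg = {
    "a": Block("a", ["foo"]),
    "b": Block("b", ["foo"]),
    "foo": Block("foo", ["bar"]),
    "bar": Block("bar", [], {"%x": {"foo": "%v"}}),
}
dup_trivial_block(cfg, "foo", ["a"])
# Partial duplication: foo survives and bar's PHI still has an entry for it.
print(sorted(cfg), cfg["bar"].phis)
```

After the call, foo still has predecessor b, so it survives and bar's PHI keeps an incoming entry from it. In this toy picture, that leftover edge is where PHI elimination would place copies, which then keep branch folding from deleting the block. Duplicating into both a and b removes foo entirely.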

On Jun 12, 2011, at 10:57 PM, Rafael Ávila de Espíndola wrote:

> I did some benchmarking on my previous patch. A first run building clang itself found an interesting case. For a bb that looks like
> 
> foo:
>  jmp bar
> 
> the patch would duplicate foo into some, but not all, of its predecessors. PHI elimination would then add copies to foo, and it would stay there. If no duplication was done, branch folding was able to remove it completely.
> 
> To solve the problem I added a bit more logic to early dup. It is now also able to duplicate a trivial block to non-trivial ones.
> 
> I did the tests on my home computer, so they are not directly comparable with my previous email. The "time to build clang" is the time needed to build a release version of trunk with both a patched and an unmodified clang binary.
> 
> trunk:
>  time to build clang Release:
>     real	4m28.049s
>     user	30m38.493s
>     sys	1m26.766s
>  size of clang:           25155904
>     real	0m29.864s
>     user	0m29.424s
>     sys	0m0.412s
>  time to build firefox:
>     real	12m54.397s
>     user	31m7.256s
>     sys	3m31.197s
>  size of XUL   48212708
>  dromaeo       1735.08 runs/s (Total) http://dromaeo.com/?id=142161
> 
> patch:
>  time to build clang Release:
>     real	4m25.234s
>     user	30m37.794s
>     sys	1m26.153s
>  size of clang            25164096
>     real	0m29.873s
>     user	0m29.433s
>     sys	0m0.409s
>  time to build firefox:
>     real	12m19.259s
>     user	30m56.124s
>     sys	3m11.275s
>  size of XUL   48187036
>  dromaeo       1756.52 runs/s (Total) http://dromaeo.com/?id=142162
> 
> The size increase in clang is coming mostly from CGClass.o. With the patch we decide to duplicate a block with one instruction into its only predecessor (and remove the original one). Everything goes fine until branch folding decides to create many copies of a block with 4 instructions in it. I haven't debugged what causes branch folding to change its mind.
> 
> On the llvm testsuite the performance improvements were:
> 
> MultiSource/Benchmarks/McCat/04-bisect/bisect.exec	-11.11%
> MultiSource/Applications/ClamAV/clamscan.exec	-8.33%
> SingleSource/Benchmarks/Shootout-C++/methcall.exec	-1.89%
> SingleSource/Benchmarks/Misc/ffbench.exec	-1.43%
> SingleSource/Benchmarks/BenchmarkGame/nsieve-bits.exec	-1.41%
> SingleSource/Benchmarks/Misc/ReedSolomon.exec	-1.32%
> 
> and the regressions were
> 
> SingleSource/Benchmarks/CoyoteBench/huffbench.exec	17.73%
> SingleSource/Benchmarks/BenchmarkGame/recursive.exec	5.49%
> MultiSource/Benchmarks/Ptrdist/bc/bc.exec	1.59%
> MultiSource/Applications/lua/lua.exec	1.21%
> MultiSource/Benchmarks/SciMark2-C/scimark2.exec	1.10%
> 
> I decided to see what was going on with huffbench. Dumping the machine instructions shows some small blocks being merged, but nothing significant. The final binaries are exactly the same size. Instruments then found the hot spot:
> 
> patch:                                         | samples
> +0x6d0	movq       -12840(%rbp, %rcx, 8), %rsi | 2180
> +0x6d8	incq       %rcx                        | 1966
> +0x6db	cmpq       %rdi, %rsi                  | 447
> 
> 
> trunk:                                         | samples
> +0x6d0	movq       -12840(%rbp, %rdi, 8), %rdx | 1613
> +0x6d8	incq       %rdi                        | 1768
> +0x6db	cmpq       %rax, %rdx                  | 429
> 
> Same instructions at the same address with different registers. I don't know why one is faster than the other...
> 
> Cheers,
> Rafael
> <dup.patch>
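The headline trunk vs. patch figures quoted above can be turned into relative changes with a few lines of Python. This is only arithmetic on the numbers in the message; the `seconds` and `pct_change` helpers are illustrative and not part of any tool. A negative value means the patched build is smaller or took less time, except for dromaeo, where higher runs/s is better.

```python
import re


def seconds(t: str) -> float:
    """Parse a `time`-style duration such as '12m54.397s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def pct_change(trunk: float, patch: float) -> float:
    """Relative change of the patched build versus trunk, in percent."""
    return (patch - trunk) / trunk * 100.0


# (label, trunk value, patched value), copied from the quoted message.
rows = [
    ("clang build, real", seconds("4m28.049s"), seconds("4m25.234s")),
    ("firefox build, real", seconds("12m54.397s"), seconds("12m19.259s")),
    ("firefox build, user", seconds("31m7.256s"), seconds("30m56.124s")),
    ("size of clang (bytes)", 25155904, 25164096),
    ("size of XUL (bytes)", 48212708, 48187036),
    ("dromaeo (runs/s, higher is better)", 1735.08, 1756.52),
]

for name, trunk, patch in rows:
    print(f"{name:36} {pct_change(trunk, patch):+.2f}%")
```

By this arithmetic the Firefox build's wall-clock time drops by about 4.5% and dromaeo improves by about 1.2%, while the clang binary grows by 8192 bytes (about 0.03%).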
