[llvm-commits] Enable early dup for small bb, take2

Wed Jun 15 09:21:20 PDT 2011

>> http://people.mozilla.com/~respindola/patch.log.bz2
>> http://people.mozilla.com/~respindola/trunk.log.bz2
>>
>> It is actually funny :-)
>
> I looked but I'm not sure what to look for.  There are huge
> differences but most of them are just block renumbering.

What is happening is:

*) early dup sees

foo:
   jmp bar

bar:
   jmp zed

zed:
   ret

and that is the only use of bar. It does the right thing and makes foo
jump directly to zed. That gives zed one more predecessor than the
branch folding limit for merging duplicated tails, so it fails to merge
hundreds of identical bbs (they are created by an IL pass).
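
To make the effect concrete, here is a toy model of the situation. It is
only a sketch: the block names come from the example above, while the
extra block "other", the helper names, and the limit value are all made
up for illustration. It also assumes bar is still in the graph when
predecessors are counted, which is one way the count could grow by one;
it is not a statement about what LLVM actually does.

```python
# Toy CFG model: block name -> list of successor names.
# Everything except the names foo/bar/zed is hypothetical.
TAIL_MERGE_PRED_LIMIT = 2  # made-up value, not LLVM's real threshold


def predecessors(cfg, block):
    """Return the sorted names of blocks that jump to `block`."""
    return sorted(src for src, dsts in cfg.items() if block in dsts)


def redirect_through(cfg, forwarder):
    """Make every predecessor of a jump-only block jump to its target."""
    (target,) = cfg[forwarder]  # a jump-only block has exactly one successor
    new_cfg = {}
    for src, dsts in cfg.items():
        new_cfg[src] = [target if d == forwarder else d for d in dsts]
    return new_cfg


def can_tail_merge(cfg, block):
    """All-or-nothing check: give up when there are too many predecessors."""
    return len(predecessors(cfg, block)) <= TAIL_MERGE_PRED_LIMIT


# zed starts exactly at the (hypothetical) limit.
cfg = {"foo": ["bar"], "bar": ["zed"], "other": ["zed"], "zed": []}

# Early dup makes foo jump straight to zed; bar is assumed to linger.
after = redirect_through(cfg, "bar")

print(predecessors(cfg, "zed"), can_tail_merge(cfg, "zed"))
print(predecessors(after, "zed"), can_tail_merge(after, "zed"))
```

In this model zed goes from two predecessors to three, so the
all-or-nothing check flips from "merge" to "do nothing".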

I fixed the all-or-nothing nature of branch folding, but have not had
time to run the benchmarks again. I hope to do that tonight.

>> I do think it is important. The way I read the data is that there
>> is useful cleanup that duplicating small blocks can do. Some passes
>> run afterwards can currently make bad decisions on the new input,
>> but that is a problem that should be fixed on them.
>>
>> Ideally, the blocks the early pass is duplicating are the same ones
>> the late one would. So this is really just cleaning it up.
>
> Well, ideally, if the early and late passes are duplicating the same
> code, then we should get the same results.  Now we know that isn't
> true for register allocation, at least with linear scan, but it is a
> nice goal.  I wonder if there are other things besides that going
> on.

There are a lot more passes after the early one, the register allocator
in particular, so it exposes more optimization opportunities. The
example about a load being folded into the jmp applies to instructions
other than jmp as well.

> So, again, my preference is to work toward eliminating all the tricky
> phi-updating code in taildup (assuming that we end up with a separate
> and more general version of that code in an indirect branch
> duplicating pass).  If we're not going to do that, maybe the best
> thing would be to generalize the phi-updating code into a separate
> "duplicate code region" utility that could be used for both tail dup
> and indirect branch dup.
>
> It sounds like either way I need to get back to working on my
> indirect branch dup pass.....

A pass that handles only indirect jmps is not more general. Let me
reiterate that the problem llvm is having in this file is not that the
indirectbrs are not being duplicated, *or* that too many are. I reduced
the threshold to exactly the size of the bb with the indirectbr. In
this setting, that is *exactly* what your patch would do, and my patch
still improved the situation.

Cheers,
Rafael