[llvm-commits] Enable early dup for small bb, take2

Mon Jun 13 11:16:17 PDT 2011

On 11-06-13 01:29 PM, Bob Wilson wrote:
> It seems to me that this isn't a clear win.  It helps some cases but
> hurts others.

In the cases I have looked at, the regressions are either luck (different
registers, same code) or caused by other passes doing something silly when
given a reduced IL. Take a look at the two logs:

http://people.mozilla.com/~respindola/patch.log.bz2
http://people.mozilla.com/~respindola/trunk.log.bz2

It is actually funny :-)

> As you've seen, updating PHIs for tail duplication is tricky.  I'd
> really prefer to avoid that.  If we only run the taildup pass after
> regalloc, we can remove all that complexity.  Something similar would
> still be needed in the separate indirect branch duplication pass
> (that I'm still working on), but at least we wouldn't have to do it
> in taildup as well.
>
> How important do you think it is to do this?  Am I misreading your
> data?
>

I do think it is important. The way I read the data, duplicating small
blocks enables useful cleanup. Some passes that run afterwards currently
make bad decisions on the new input, but that is a problem that should be
fixed in those passes.

Ideally, the blocks the early pass duplicates are the same ones the late
pass would. So this is really just cleaning it up.
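
To make the PHI issue concrete: before register allocation the code is
still in SSA form, so duplicating a block B into a predecessor P means
rewriting PHIs. After register allocation, PHI elimination has already
run, so a late tail-duplication pass only copies instructions and
retargets branches. Below is a minimal, hypothetical Python sketch on a
toy CFG. The `Block`/`Phi` classes, `is_small`, `tail_duplicate`, the
`limit` parameter and the naming `suffix` are illustrative assumptions,
not LLVM's TailDuplication code. It also assumes values defined in B are
used only inside B and in PHIs of B's successors, and that P is not
already a predecessor of any of B's successors. Real code must handle
both cases, which is part of what makes this tricky.

```python
from dataclasses import dataclass, field


@dataclass
class Phi:
    dest: str
    incoming: dict  # predecessor block name -> incoming value name


@dataclass
class Block:
    name: str
    phis: list = field(default_factory=list)
    insts: list = field(default_factory=list)  # (dest, opcode, [operands])
    succs: list = field(default_factory=list)


def is_small(block, limit):
    """Hypothetical size heuristic: only duplicate blocks with <= limit instructions."""
    return len(block.insts) <= limit


def tail_duplicate(cfg, b_name, p_name, suffix):
    """Copy block b_name into its predecessor p_name, keeping PHIs consistent.

    Assumes values defined in B are used only in B and in PHIs of B's
    successors, and that P is not already a predecessor of B's successors.
    """
    b, p = cfg[b_name], cfg[p_name]
    # Along the P->B edge, each PHI in B equals its incoming value from P.
    rename = {phi.dest: phi.incoming[p_name] for phi in b.phis}
    # Append a copy of B's body to P, giving every copied definition a fresh name.
    for dest, op, operands in b.insts:
        new_dest = dest + suffix
        p.insts.append((new_dest, op, [rename.get(v, v) for v in operands]))
        rename[dest] = new_dest
    # P now branches wherever B branched, instead of branching to B.
    p.succs = [s for s in p.succs if s != b_name] + list(b.succs)
    # B lost P as a predecessor, so drop P's entry from B's PHIs.
    for phi in b.phis:
        del phi.incoming[p_name]
    # Every successor of B gained P as a predecessor: add a matching PHI entry.
    for s in b.succs:
        for phi in cfg[s].phis:
            value = phi.incoming[b_name]
            phi.incoming[p_name] = rename.get(value, value)


def example_cfg():
    # P1 and P2 both jump to B, which computes y = x + x and jumps to S.
    blocks = [
        Block("P1", insts=[("x1", "const", ["1"])], succs=["B"]),
        Block("P2", insts=[("x2", "const", ["2"])], succs=["B"]),
        Block("B", phis=[Phi("x", {"P1": "x1", "P2": "x2"})],
              insts=[("y", "add", ["x", "x"])], succs=["S"]),
        Block("S", phis=[Phi("z", {"B": "y"})]),
    ]
    return {blk.name: blk for blk in blocks}


if __name__ == "__main__":
    cfg = example_cfg()
    if is_small(cfg["B"], limit=2):
        tail_duplicate(cfg, "B", "P1", suffix=".p1")
    print(cfg["P1"].insts)            # [('x1', 'const', ['1']), ('y.p1', 'add', ['x1', 'x1'])]
    print(cfg["S"].phis[0].incoming)  # {'B': 'y', 'P1': 'y.p1'}
```

In the example, P1 ends up computing `y.p1 = x1 + x1` directly, B's PHI
keeps only the P2 entry, and S's PHI gains an entry for P1. Each of those
three PHI fixes is a spot where an early (pre-regalloc) duplication pass
can go wrong. A pass running after PHI elimination never has to make them.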

One thing that surprised even me was that clang became a tiny bit faster.
I guess that is because it passes fewer blocks down the pipeline.

I started looking at this because my old patch (duplicating indirectbr
in clang) showed that having more cleanup happen between the duplication
and the register allocator can help Firefox.

Note that the speed improvement in Firefox was measured with a full JS
benchmark. I can run Instruments on it if you are curious about the
impact on the JS interpreter alone.

As for correctness, I would argue that it is safer to have code that is
executed (and therefore tested) more often. The issues I fixed were found
by increasing the dup size limit to 8 and bootstrapping clang. The bugs
were there and are real; it is just hard to trigger them with an
indirectbr-only pass (which is what early dup is right now). Had someone
hit them, they would have been far harder to debug.

Cheers,
Rafael