[llvm-commits] [patch] Change how clang produces computed gotos

Jakob Stoklund Olesen stoklund at 2pi.dk
Mon Jun 6 12:26:06 PDT 2011


On Jun 6, 2011, at 10:57 AM, Rafael Avila de Espindola wrote:

> I tried compiling Firefox with both clang and Apple's gcc. The performance was close. Profiling it with Shark showed that the difference was that the clang-compiled one spent 2x as much time in the JS interpreter.
> 
> There seems to be a register allocator problem that I hope to report shortly, but another problem is the way the computed gotos are produced.
> 
> Currently clang produces a branch to a common bb that holds the only indirectbr in the function. This is normally optimized away, but the optimizers are not kicking in for jsinterp.o, and it has a single 'jmp *%rax' in the generated binary.
> 
> The main reason for the status quo, I think, is a gradual evolution from the days when there was no indirectbr in LLVM. Producing n indirectbr instructions does increase the size of the CFG, but I think the results are worth it:

It was designed this way to keep the mid-level optimizers relatively performant. Duplicating the indirectbr block creates a quadratic number of CFG edges, all critical. There is not much the optimizers can do with such a CFG anyway, other than being really slow.
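A rough edge count behind the quadratic-growth point, assuming (for illustration only) a function with n computed-goto sites and m address-taken labels, where each label is reached only through computed gotos:

```latex
\begin{aligned}
E_{\text{shared}} &= n + m
  && \text{($n$ unconditional branches into the common block, $m$ edges out of its single indirectbr)}\\
E_{\text{dup}} &= n \cdot m
  && \text{(each of the $n$ indirectbr copies lists all $m$ labels)}
\end{aligned}
```

In the duplicated form every edge leaves a block with m >= 2 successors and enters a label with n >= 2 predecessors, so all n·m edges are critical; under the same assumption the shared form has none. With n = m = 100 that is 200 edges versus 10000.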

The backend will duplicate the indirectbr block in its early taildup pass to get the benefits of Markov chain branch prediction.

I am not sure why it fails in this case.
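A toy sketch of the two shapes, assuming the GNU C "labels as values" extension (&&label, goto *ptr) that GCC and clang accept. The VM, its opcodes and all function names are hypothetical and not taken from jsinterp. run_shared_dispatch funnels every handler back to one computed goto, roughly the single-indirectbr CFG clang emits; run_replicated_dispatch gives each handler its own computed goto, roughly what tail-duplicating that block produces. Both compute the same result, and the compiler remains free to transform either form.

```c
#include <assert.h>

/* Hypothetical opcodes for a toy accumulator VM. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Shared dispatch: every handler jumps back to the one computed goto,
   similar in shape to a single factored indirectbr block. */
long run_shared_dispatch(const int *code) {
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_double, &&do_halt };
    const int *pc = code;
    long acc = 0;

dispatch:
    goto *labels[*pc++];
do_inc:
    acc += 1;
    goto dispatch;
do_dec:
    acc -= 1;
    goto dispatch;
do_double:
    acc *= 2;
    goto dispatch;
do_halt:
    return acc;
}

/* Replicated dispatch: each handler ends in its own computed goto, so the
   hardware sees a separate indirect jump per handler. */
long run_replicated_dispatch(const int *code) {
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_double, &&do_halt };
    const int *pc = code;
    long acc = 0;

#define NEXT goto *labels[*pc++]
    NEXT;
do_inc:
    acc += 1;
    NEXT;
do_dec:
    acc -= 1;
    NEXT;
do_double:
    acc *= 2;
    NEXT;
do_halt:
    return acc;
#undef NEXT
}

/* Runs both variants on the same program and checks that they agree. */
long run_checked(const int *code) {
    long a = run_shared_dispatch(code);
    long b = run_replicated_dispatch(code);
    assert(a == b);
    return a;
}
```

For example, the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT } yields 3 from either function.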

> jsinterp.o goes from 1876288 to 1684448 bytes. The Dromaeo score of Firefox goes from 1628.59 runs/s to 1674.68 runs/s. For comparison, gcc is 1667.49 runs/s. It is the first time I get Firefox to go faster when compiled with clang :-)
> 
> Build time on files that use computed goto does suffer. jsinterp.o with -O3 -g goes from 63s to 140s. Given that this is one of the two files in all of Mozilla that use it, and that it is really performance critical, I think it is worth it.
> 
> A question about indirectbr: it has a list of every label it could branch to. It doesn't look like we optimize that list, and we avoid inlining functions that use computed goto. Wouldn't it be better to make the list implicit: every label in this function that has its address taken?

You would have to modify every bit of code that thinks the CFG models control flow.

/jakob
