[LLVMdev] Is PIC code defeating the branch predictor?

Jonas Maebe jonas.maebe at elis.ugent.be
Tue Jan 4 04:57:45 PST 2011


On 04 Jan 2011, at 08:30, Jakob Stoklund Olesen wrote:

> I noticed that we generate code like this for i386 PIC:
>
> 	calll	L0$pb
> L0$pb:
> 	popl	%eax
> 	movl	%eax, -24(%ebp)         ## 4-byte Spill
>
> I worry that this defeats the return address prediction for returns
> in the function because calls and returns are no longer matched.

According to benchmarks by Apple, this pattern is nevertheless faster
on modern x86 processors than the trampoline-based alternative, in
which the function calls a small helper that copies its return address
into a register and returns (Atom may be an exception, as mentioned in
another reply):
http://lists.apple.com/archives/perfoptimization-dev/2007/Nov/msg00005.html

At the time of that post, Apple's version of GCC still generated
trampolines (hence the remark). Apple later switched it to the
pattern above.
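
To make the concern about unmatched calls and returns concrete, here
is a toy model in Python. It is only a sketch under a strong
assumption: the predictor pushes an entry on every call and pops one
on every ret. That is not a description of any particular processor.
The labels (main+5, caller+5, f+5) and the function names are made up
for illustration.

```python
# Toy return-address predictor: every "call" pushes its return address,
# every "ret" pops one entry and uses it as the predicted target.
# This is an assumed, simplified model, not a description of real hardware.

def count_mispredicted_returns(trace):
    """Count returns whose predicted target differs from the actual one."""
    predictor_stack = []
    mispredicted = 0
    for kind, address in trace:
        if kind == "call":
            # For a call, address is the return address being pushed.
            predictor_stack.append(address)
        elif kind == "ret":
            # For a ret, address is the actual return target.
            predicted = predictor_stack.pop() if predictor_stack else None
            if predicted != address:
                mispredicted += 1
        # "pop" only reads the memory stack; the predictor is left untouched.
    return mispredicted

# Inline sequence: f does "calll L0$pb; popl %eax" and never returns to L0$pb.
INLINE_PIC_TRACE = [
    ("call", "main+5"),    # main calls caller
    ("call", "caller+5"),  # caller calls f
    ("call", "L0$pb"),     # f: calll L0$pb
    ("pop", None),         # f: popl %eax
    ("ret", "caller+5"),   # f returns to caller
    ("ret", "main+5"),     # caller returns to main
]

# Trampoline variant: f calls a helper that really does return to f.
THUNK_PIC_TRACE = [
    ("call", "main+5"),    # main calls caller
    ("call", "caller+5"),  # caller calls f
    ("call", "f+5"),       # f calls the helper
    ("ret", "f+5"),        # helper returns to f
    ("ret", "caller+5"),   # f returns to caller
    ("ret", "main+5"),     # caller returns to main
]

print(count_mispredicted_returns(INLINE_PIC_TRACE))  # 2
print(count_mispredicted_returns(THUNK_PIC_TRACE))   # 0
```

In this naive model the stale L0$pb entry shifts every later
prediction by one, so both outstanding returns are mispredicted. The
benchmarks linked above found the inline sequence faster anyway, so
the model should not be read as a performance prediction.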


Jonas


