[LLVMdev] Is PIC code defeating the branch predictor?

Tue Jan 4 04:57:45 PST 2011

On 04 Jan 2011, at 08:30, Jakob Stoklund Olesen wrote:

> I noticed that we generate code like this for i386 PIC:
>
> 	calll	L0$pb
> L0$pb:
> 	popl	%eax
> 	movl	%eax, -24(%ebp)         ## 4-byte Spill
>
> I worry that this defeats the return address prediction for returns
> in the function because calls and returns are no longer matched.

According to benchmarks by Apple, this call/pop sequence is nevertheless
faster on modern x86 processors than the trampoline-based alternative
(except maybe on Atom, as mentioned in another reply):
http://lists.apple.com/archives/perfoptimization-dev/2007/Nov/msg00005.html

At the time of that post, Apple's version of GCC still generated
trampolines (hence the remark). Apple later switched it to the
pattern above.
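
To make the concern about unmatched calls and returns concrete, here is
a toy model of a return-address stack in Python. It is only a sketch
under assumptions of mine, not a description of any real predictor: the
class ReturnStack, the symbolic addresses ("main+5", "g+5", "f+5"), the
helper named "thunk", and the skip_call_to_next knob are all
hypothetical. The knob models the possibility that hardware ignores a
call whose target is the next instruction; whether and where real
processors do that is not something this thread establishes.

```python
class ReturnStack:
    """Toy return-address predictor: calls push, returns pop and compare."""

    def __init__(self, skip_call_to_next=False):
        self.stack = []
        self.mispredicts = 0
        # Hypothetical knob: do not push for a call to the next instruction.
        self.skip_call_to_next = skip_call_to_next

    def call(self, return_addr, target):
        if self.skip_call_to_next and target == return_addr:
            return
        self.stack.append(return_addr)

    def ret(self, actual_addr):
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_addr:
            self.mispredicts += 1


def run_inline_pic(rs):
    """main calls g, g calls f, f uses the inline call/pop sequence."""
    rs.call("main+5", target="g")
    rs.call("g+5", target="f")
    rs.call("L0$pb", target="L0$pb")  # calll L0$pb; popl %eax (no ret)
    rs.ret("g+5")                     # f returns to g
    rs.ret("main+5")                  # g returns to main
    return rs.mispredicts


def run_thunk_pic(rs):
    """Same call chain, but f calls a helper that returns normally."""
    rs.call("main+5", target="g")
    rs.call("g+5", target="f")
    rs.call("f+5", target="thunk")    # f calls the helper
    rs.ret("f+5")                     # helper reads (%esp), then ret
    rs.ret("g+5")                     # f returns to g
    rs.ret("main+5")                  # g returns to main
    return rs.mispredicts


if __name__ == "__main__":
    print("inline call/pop:", run_inline_pic(ReturnStack()))
    print("thunk:", run_thunk_pic(ReturnStack()))
    print("inline, call-to-next skipped:",
          run_inline_pic(ReturnStack(skip_call_to_next=True)))
```

In this model the inline sequence mispredicts both enclosing returns,
the thunk version mispredicts none, and the inline sequence also
mispredicts none once the call to the next instruction is skipped. The
model counts mispredictions only; it says nothing about the cost of the
extra call and return in the thunk version, which is what the
benchmarks above measure.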

Jonas