[LLVMdev] Is PIC code defeating the branch predictor?

Tue Jan 4 09:22:18 PST 2011

On Jan 4, 2011, at 4:57 AM, Jonas Maebe wrote:

> 
> On 04 Jan 2011, at 08:30, Jakob Stoklund Olesen wrote:
> 
>> I noticed that we generate code like this for i386 PIC:
>> 
>> 	calll	L0$pb
>> L0$pb:
>> 	popl	%eax
>> 	movl	%eax, -24(%ebp)         ## 4-byte Spill
>> 
>> I worry that this defeats the return address prediction for returns  
>> in the function because calls and returns no longer are matched.
> 
> According to benchmarks by Apple, it's nevertheless faster on modern  
> x86 processors than the trampoline-based alternative (except maybe on  
> Atom, as mentioned in another reply): http://lists.apple.com/archives/perfoptimization-dev/2007/Nov/msg00005.html
> 
> At the time of that post, Apple's version of GCC still generated  
> trampolines (hence the remark). They switched that to the above  
> pattern afterwards.

Right.  All modern x86 processors that I'm aware of, other than Atom, special-case this sequence (a call to the immediately following instruction) so that it doesn't push an entry onto the return stack predictor.
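
To make the concern and the fix concrete, here is a toy model of a return-address stack. It is only a sketch: the class and function names and the addresses are made up for illustration, and real predictors (fixed depth, recovery after a mispredict) are more involved. It shows how an unmatched call shifts every later return prediction by one entry, and how skipping the push for a call-to-next-instruction avoids that.

```python
class ReturnStackModel:
    """Toy return-address stack; hypothetical, not any real CPU's design."""

    def __init__(self, special_case_call_next=False):
        self.stack = []
        self.special_case_call_next = special_case_call_next
        self.mispredicts = 0

    def call(self, return_addr, target):
        # A call whose target equals its own return address is the
        # PIC "calll L0$pb; L0$pb: popl %eax" idiom.
        if self.special_case_call_next and target == return_addr:
            return  # assumed behaviour: do not push a prediction entry
        self.stack.append(return_addr)

    def ret(self, actual_return_addr):
        # Predict the return target from the top of the stack.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_return_addr:
            self.mispredicts += 1


def run(special_case, depth=3):
    """Nest `depth` ordinary calls, do the PIC idiom once, then return."""
    ras = ReturnStackModel(special_case)
    return_addrs = [0x1000 + 0x10 * i for i in range(depth)]
    for addr in return_addrs:
        ras.call(return_addr=addr, target=addr + 0x1000)
    # Innermost function loads the PIC base; the popl that follows
    # never touches the predictor, so it is not modelled here.
    ras.call(return_addr=0x9000, target=0x9000)
    for addr in reversed(return_addrs):
        ras.ret(addr)
    return ras.mispredicts


if __name__ == "__main__":
    print("without special case:", run(False))
    print("with special case:   ", run(True))
```

In this model the stale entry makes every return in the chain mispredict when the idiom is not special-cased, and none when it is. That says nothing about actual cycle counts, only about why the call/pop pair need not hurt on hardware that recognizes it.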

-Chris