[LLVMdev] Is PIC code defeating the branch predictor?
Chris Lattner
clattner at apple.com
Tue Jan 4 09:22:18 PST 2011
On Jan 4, 2011, at 4:57 AM, Jonas Maebe wrote:
>
> On 04 Jan 2011, at 08:30, Jakob Stoklund Olesen wrote:
>
>> I noticed that we generate code like this for i386 PIC:
>>
>> calll L0$pb
>> L0$pb:
>> popl %eax
>> movl %eax, -24(%ebp) ## 4-byte Spill
>>
>> I worry that this defeats the return address prediction for returns
>> in the function because calls and returns no longer are matched.
>
> According to benchmarks by Apple, it's nevertheless faster on modern
> x86 processors than the trampoline-based alternative (except maybe on
> Atom, as mentioned in another reply): http://lists.apple.com/archives/perfoptimization-dev/2007/Nov/msg00005.html
>
> At the time of that post, Apple's version of GCC still generated
> trampolines (hence the remark). They switched that to the above
> pattern afterwards.
Right. All modern x86 processors other than Atom that I'm aware of special-case this sequence so that it doesn't push an entry onto the return stack predictor.
-Chris
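
The concern and the answer in this thread can be illustrated with a toy model of a return-address predictor. This is only a sketch under stated assumptions: the class name, the addresses, and the 16-entry depth are all made up for illustration, and real hardware is considerably more complicated. The model treats the predictor as a small stack that pushes on every call and pops on every return, with an optional special case that skips the push when a call targets the instruction immediately following it (the `calll L0$pb` / `L0$pb:` idiom above).

```python
from collections import deque


class ReturnStackPredictor:
    """Toy model (hypothetical, not real hardware): a bounded stack of return addresses."""

    def __init__(self, depth=16, special_case_call_next=False):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which loosely mimics a fixed-size hardware structure.
        self.stack = deque(maxlen=depth)
        self.special_case_call_next = special_case_call_next
        self.mispredicts = 0

    def call(self, return_addr, target):
        # A call whose target equals its own return address is the
        # "call next instruction" idiom used to read the program counter.
        if self.special_case_call_next and target == return_addr:
            return  # do not record it: no matching ret will ever execute
        self.stack.append(return_addr)

    def ret(self, actual_return_addr):
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_return_addr:
            self.mispredicts += 1


def run(special_case):
    """Simulate one caller invoking one PIC function; return the mispredict count."""
    rsp = ReturnStackPredictor(special_case_call_next=special_case)
    # Caller invokes f; f will eventually return to 0x1005 (made-up address).
    rsp.call(return_addr=0x1005, target=0x2000)
    # Inside f: call to the next instruction, followed by a pop (not a ret).
    rsp.call(return_addr=0x2005, target=0x2005)
    # f returns to its caller.
    rsp.ret(0x1005)
    return rsp.mispredicts


if __name__ == "__main__":
    print("naive predictor mispredicts:", run(special_case=False))
    print("special-cased predictor mispredicts:", run(special_case=True))
```

In this model the naive predictor pops the stale `L0$pb` address when `f` returns and mispredicts, while the special-cased predictor never recorded it and predicts the return correctly. That is the behaviour Chris describes, expressed only as a simplified model.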