[LLVMdev] Is PIC code defeating the branch predictor?

Mon Jan 3 23:30:54 PST 2011

I noticed that we generate code like this for i386 PIC:

	calll	L0$pb
L0$pb:
	popl	%eax
	movl	%eax, -24(%ebp)         ## 4-byte Spill

I worry that this defeats return address prediction for the returns in the function, because calls and returns are no longer matched: the calll pushes a return address that is consumed by the popl rather than by a ret.

From Intel's Optimization Reference Manual:

"The return address stack mechanism augments the static and dynamic predictors to optimize specifically for calls and returns. It holds 16 entries, which is large enough to cover the call depth of most programs. If there is a chain of more than 16 nested calls and more than 16 returns in rapid succession, performance may degrade.

[...] To enable the use of the return stack mechanism, calls and returns must be matched in pairs. If this is done, the likelihood of exceeding the stack depth in a manner that will impact performance is very low.

[...] Assembly/Compiler Coding Rule 4. (MH impact, MH generality) Near calls must be matched with near returns, and far calls must be matched with far returns. Pushing the return address on the stack and jumping to the routine to be called is not recommended since it creates a mismatch in calls and returns."

Is this a known issue or a non-issue?

An alternative approach would be:

	calll	get_eip                 ## matched by the ret in get_eip
	movl	%eax, -24(%ebp)         ## 4-byte Spill
...
get_eip:
	movl	(%esp), %eax            ## return address = address of the instruction after the calll
	ret
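
To make the concern concrete, here is a toy model of a return address stack, comparing the call/pop sequence with the get_eip helper. It is only a sketch under simplifying assumptions: the ReturnStack class, the run helper and the address tuples are all made up for illustration, the predictor is modelled as a plain 16-entry stack, and real processors may treat a zero-displacement call specially, so this says nothing about measured performance.

```python
class ReturnStack:
    """Toy return address stack: a call pushes, a ret pops and compares."""

    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []
        self.mispredicts = 0

    def call(self, return_addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # the oldest entry is overwritten
        self.entries.append(return_addr)

    def ret(self, actual_target):
        predicted = self.entries.pop() if self.entries else None
        if predicted != actual_target:
            self.mispredicts += 1


def run(get_pc_sequence, nesting=3):
    """Enter `nesting` nested PIC functions, then return from all of them."""
    ras = ReturnStack()
    for level in range(nesting):
        ras.call(("caller", level))   # the call that enters the function
        get_pc_sequence(ras, level)   # the function's PIC prologue
    for level in reversed(range(nesting)):
        ras.ret(("caller", level))    # the function's own ret
    return ras.mispredicts


def call_pop(ras, level):
    # calll L0$pb ; popl %eax  -> a call that never gets a matching ret
    ras.call(("pb", level))


def thunk(ras, level):
    # calll get_eip ; movl (%esp), %eax ; ret  -> call and ret are paired
    ras.call(("after_thunk", level))
    ras.ret(("after_thunk", level))


print("call/pop mispredicts:", run(call_pop))
print("get_eip mispredicts: ", run(thunk))
```

In this model every return in the call/pop version is predicted from a stale entry, while the get_eip version leaves the stack balanced. Whether actual hardware behaves this way is exactly the open question.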

More here: http://software.intel.com/en-us/blogs/2010/10/25/zero-length-calls-can-tank-atom-processor-performance/
