[PATCH] D39079: New clang option -fno-plt to avoid PLT for external calls

Tue Oct 24 11:12:14 PDT 2017

tmsriram added a comment.

In https://reviews.llvm.org/D39079#905468, @rnk wrote:

> In https://reviews.llvm.org/D39079#905454, @joerg wrote:
>
> > It also increases the pressure on the branch predictor, so it is not really black and white.
>
>
> I don't understand this objection. I'm assuming that the PLT stub is an indirect jump through the PLTGOT, not a hotpatched stub that jumps directly to the definition chosen by the loader. This is the ELF model that I'm familiar with, especially since calls to code more than 2GB away generally need to be indirect anyway.

Yes, this is correct.  A PLT stub for x86_64 looks like this:

  jmpq   *0x2ada(%rip)        # 403000 <_GLOBAL_OFFSET_TABLE_+0x18>
  pushq  $0x0
  jmpq   400510 <_init+0x30>

It has three instructions, and the last two are only needed for lazy binding. With early binding, the last two instructions are dead code. All this patch does is take that first instruction and put it at the call site that currently calls the PLT stub (so call foo@PLT becomes call *foo@GOTPCREL(%rip)); that's it. Really, with early binding, the PLT stub is a completely redundant piece of code. I don't see how one can argue with this.
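
To make the difference concrete, here is a toy C model in which function pointers stand in for GOT slots. Every name in it is hypothetical, and it is a sketch of the dispatch paths only, not real loader or linker code. The PLT path goes through a stub whose tail (the pushq/jmpq pair) resolves the symbol once and patches the slot. The -fno-plt path is a single indirect call through a GOT slot that, as in GCC's description of -fno-plt, has to be bound at load time because lazy binding needs the PLT.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): function pointers play the role of GOT
 * slots. Nothing here is real loader or linker code. */

typedef int (*fn_t)(int);

int resolver_runs = 0;  /* how often the modeled "dynamic linker" resolver ran */

/* Stands in for the definition that lives in the shared library. */
static int library_square(int x) { return x * x; }

static int plt_lazy_path(int x);

/* .got.plt slot read by the PLT stub. With lazy binding it initially points
 * back at the stub's pushq/jmpq tail, modeled by plt_lazy_path. */
static fn_t gotplt_slot = plt_lazy_path;

/* Ordinary GOT slot used by a -fno-plt call site. There is no lazy path,
 * so the loader must fill it before the first call. */
static fn_t got_slot = NULL;

/* Models "pushq $0x0; jmpq PLT0": resolve the symbol, patch the slot so that
 * later calls skip this path, then complete the original call. */
static int plt_lazy_path(int x) {
    resolver_runs++;
    gotplt_slot = library_square;
    return gotplt_slot(x);
}

/* Models the stub's first instruction: jmpq *slot(%rip). */
static int plt_stub(int x) { return gotplt_slot(x); }

/* Default codegen: call foo@PLT, then an indirect jump through .got.plt. */
int call_via_plt(int x) { return plt_stub(x); }

/* -fno-plt codegen: call *foo@GOTPCREL(%rip), one indirect call, no stub. */
int call_no_plt(int x) {
    assert(got_slot != NULL && "slot must be bound at load time");
    return got_slot(x);
}

/* Models the loader binding the GOT slot before main runs. */
void loader_startup(void) { got_slot = library_square; }
```

With early binding (for example, linking with -z now), the loader would fill gotplt_slot up front as well, so plt_lazy_path is never reached and plt_stub reduces to one extra indirect jump on every call, which is exactly the redundancy described above.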

> 
> 
>> Qt5 tries that. It requires further hacks, since the main binary must be compiled as fully position-independent code to avoid running into fun later. Fun with copy relocations is only part of it.
> 
> I'm not sure I understand, but this patch isn't introducing copy relocations, to be clear.
> 
>> The loader doesn't see GOTPCREL anymore. It also requires a linker that disassembles instructions to be able to optimize it, because otherwise it can't distinguish between a normal pointer load and a call.
> 
> Well, yes. The user needs to know that they have an x86-encoding-aware linker, or using this flag is probably going to slow their code down. From my perspective, this is a performance tuning flag, so that's reasonable.
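
The linker-side optimization being discussed is the x86-64 psABI's GOTPCRELX relaxation: when the callee turns out to be defined locally, the 6-byte call *foo@GOTPCREL(%rip) (ff 15 disp32) can be rewritten in place as addr32 call foo (67 e8 rel32). The sketch below is hypothetical and not taken from any real linker. It shows why the linker must inspect opcode bytes: the same GOTPCREL relocation can also sit under a plain pointer load, which must not be turned into a call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: relax "call *foo@GOTPCREL(%rip)" (ff 15 disp32) into
 * "addr32 call foo" (67 e8 rel32). Both encodings are 6 bytes, so the
 * instruction is patched in place.
 *   insn      points at the first opcode byte of the instruction
 *   insn_addr is the run-time address of that byte
 *   target    is the resolved, locally defined address of foo
 * Returns 1 if the bytes were rewritten, 0 if they were left untouched. */
int relax_gotpcrel_call(uint8_t *insn, uint64_t insn_addr, uint64_t target) {
    assert(insn != NULL);

    /* Only the indirect-call form is handled. A GOTPCREL pointer load such
     * as "movq foo@GOTPCREL(%rip), %rax" (48 8b 05 disp32) is skipped here;
     * it must never become a call. */
    if (insn[0] != 0xff || insn[1] != 0x15)
        return 0;

    /* rel32 is measured from the end of the 6-byte instruction. */
    int64_t rel = (int64_t)(target - (insn_addr + 6));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;  /* out of range: keep the indirect call through the GOT */

    uint32_t u = (uint32_t)(int32_t)rel;
    insn[0] = 0x67;                        /* addr32 prefix pads to 6 bytes */
    insn[1] = 0xe8;                        /* call rel32 */
    insn[2] = (uint8_t)(u & 0xff);         /* little-endian displacement */
    insn[3] = (uint8_t)((u >> 8) & 0xff);
    insn[4] = (uint8_t)((u >> 16) & 0xff);
    insn[5] = (uint8_t)((u >> 24) & 0xff);
    return 1;
}
```

A linker without this kind of opcode-aware rewriting leaves every such call as an indirect call through the GOT, even for symbols that end up local, which is the slowdown risk mentioned above.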

https://reviews.llvm.org/D39079