[LLVMdev] llvm register reload/spilling around calls

Wed Oct 20 16:31:06 PDT 2010

(repost with right sender address)

On 20.10.2010 18:13, Jakob Stoklund Olesen wrote:
> On Oct 20, 2010, at 7:46 AM, Roland Scheidegger wrote:
> 
>> On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
>>> Look in X86InstrControl.td. The call instructions are all prefixed
>>> by:
>>>
>>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>>
>>> This is the fixed list of call-clobbered registers. It should really
>>> be controlled by the calling convention of the called function
>>> instead.
>>>
>>> The WINCALL* instructions only exist because of this.
>> Ahh I see now. I hacked this up and indeed the code looks much better.
>> I can't force it to use win64 calling conventions right?
> 
> No, only by targeting Windows.
> 
>> Would do just fine for this case (much closer to a cold calling
>> convention, I really only need 5 preserved xmm regs).
> 
> If XMM registers are the problem, -pre-alloc-split really ought to help you.
> 
> You may want to investigate why it doesn't.

OK, I'll see if I can figure something out, though I have no in-depth
knowledge of LLVM.
I think only the XMM registers are really a problem: r12-r15 are
callee-saved and hence used for holding the most frequently used values,
which seems to be enough to avoid spilling the general-purpose registers.
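
To make the situation concrete, here is a minimal sketch of the kind of code
I mean. It is not my actual code: the names scale() and sum_scaled() are made
up for illustration, and the noinline attribute is a GCC/Clang extension used
only to keep the call in place. Several float values stay live across a call
inside a loop, and since the System V x86-64 ABI has no callee-saved XMM
registers, they presumably have to be spilled and reloaded around it (Win64
would preserve XMM6-XMM15).

```c
#include <assert.h>

/* Hypothetical opaque callee. noinline (a GCC/Clang extension) keeps the
   call in place so the values below really are live across it. */
__attribute__((noinline)) float scale(float v) { return v * 0.5f; }

/* a..e and acc are floats that stay live across every call to scale().
   Under the System V x86-64 ABI all XMM registers are call-clobbered, so
   the allocator has to spill and reload them around the call; under
   Win64, XMM6-XMM15 are callee-saved. */
float sum_scaled(const float *in, int n,
                 float a, float b, float c, float d, float e)
{
    float acc = 0.0f;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        acc += scale(in[i]) * a + b * c - d + e;
    return acc;
}
```

Compiling something like this with -O2 -S and counting the movss stores and
loads to the stack around the call should show how much spill traffic there is.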

It looked to me like it could be related to something mentioned in the
lib/Target/README.txt file:

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass.  Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
	subl	$28, %esp
	call	"L1$pb"
"L1$pb":
	popl	%eax
	cmpl	$0, 32(%esp)
	je	LBB1_2	# cond_true
LBB1_1:	# return
	# ...
	addl	$28, %esp
	ret
LBB1_2:	# cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block.  It would be
better to sink the picbase computation down into the block for the
assertion, as it is the only one that uses it.  This happens for a lot of
code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86.  If not used in all paths through a
function, they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Though maybe that's not related, since the arguments are actually
(mostly) used in all paths.
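
For reference, this is roughly how I understand the sinking idea from the
README, written as a source-level analogue in C rather than as anything LLVM
actually does internally. All names here (expensive_setup, unsunk, sunk,
setup_calls) are hypothetical. The value is only needed on the path past the
early out, so computing it in the entry block is wasted work on the other path.

```c
#include <assert.h>

/* Counts how often the "expensive" computation runs (illustration only). */
int setup_calls = 0;

/* Hypothetical stand-in for something like the PIC base computation. */
int expensive_setup(int x) { setup_calls++; return x * 2 + 1; }

/* Unsunk form: the value is computed in the entry block on every path. */
int unsunk(int x, int flag)
{
    int base = expensive_setup(x);
    if (!flag)
        return 0;               /* early out never uses base */
    return base + flag;
}

/* Sunk form: the computation moves into the only path that uses it. */
int sunk(int x, int flag)
{
    if (!flag)
        return 0;               /* early out pays nothing */
    int base = expensive_setup(x);
    return base + flag;
}
```

Both functions return the same results; only the early-out path differs in
how much work it does. That is why I doubt it helps in my case, where nearly
every path needs the values anyway.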

Roland