[LLVMbugs] [Bug 776] NEW: LLVM register allocator issues

bugzilla-daemon at cs.uiuc.edu
Fri May 12 14:16:48 PDT 2006


           Summary: LLVM register allocator issues
           Product: libraries
           Version: trunk
          Platform: All
        OS/Version: MacOS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Register Allocator
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: evan.cheng at apple.com

There are a number of register allocator issues that we will need to deal with:

Example 1:

cond_next98 (0x890d850, LLVM BB @0x8908ff0):
	MOV32mi %ESP, 1, %NOREG, 0, 0
	CALLpcrel32 <ga:int_cst_value>
	%reg1027 = MOV32rr %EAX
	%reg1028 = MOV32rr %EDX
	%reg1029 = MOV32rm %NOREG, 1, %NOREG, 0
	%reg1036 = MOVZX32rm8 %reg1029, 1, %NOREG, 12
	%reg1037 = MOV32rm %NOREG, 1, %NOREG, <ga:tree_code_type>
	%reg1038 = MOV8ri 252
	%reg1039 = ADD8rm %reg1038, %reg1037, 4, %reg1036, 0
	CMP8ri %reg1039, 6
	JB mbb<cond_next170,0x890d970>
    Successors according to CFG: 0x890d910 0x890d970

LBB1_8:	#cond_next98
	movl $0, (%esp)
	call L_int_cst_value$stub
	movl 0, %ecx
	movl %ecx, 72(%esp)
	movzbl 12(%ecx), %ecx
	movl %ecx, 84(%esp)
	movl L_tree_code_type$non_lazy_ptr, %ecx
	movl %ecx, 88(%esp)
	movb $252, 83(%esp)
	movb 83(%esp), %bl
	movl 84(%esp), %edi
	addb (%ecx,%edi,4), %bl
	movb %bl, 83(%esp)
	cmpb $6, %bl
	jb LBB1_10	#cond_next170

Obviously, rematerialization will eliminate the first movb $252, 83(%esp).
The second can be fixed with live range (LR) splitting.
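Here is a minimal C sketch of the rematerialization point. It assumes a toy model rather than LLVM's real classes: VirtReg, DefKind, emit_reload and needs_spill_store are all hypothetical names. A value defined by a move-immediate, such as %reg1038 = MOV8ri 252, reads no other register. So instead of storing it to 83(%esp) and loading it back, the allocator can re-emit the immediate move at the use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified description of how a virtual register is
   defined. These are not LLVM's data structures. */
typedef enum { DEF_IMMEDIATE, DEF_OTHER } DefKind;

typedef struct {
    DefKind kind;  /* DEF_IMMEDIATE: defined by a move-immediate like MOV8ri */
    int imm;       /* the immediate, valid when kind == DEF_IMMEDIATE */
    int slot;      /* %esp offset of the spill slot used otherwise */
} VirtReg;

/* A move-immediate reads no other register, so its value can be
   recomputed anywhere for the cost of one instruction. */
static int is_rematerializable(const VirtReg *v) {
    return v->kind == DEF_IMMEDIATE;
}

/* Only values that cannot be rematerialized need a store to their slot. */
static int needs_spill_store(const VirtReg *v) {
    return !is_rematerializable(v);
}

/* Write the AT&T-syntax instruction that brings v's value into the 8-bit
   register `phys` before a use: re-execute the definition if possible,
   otherwise reload from the spill slot. */
static void emit_reload(const VirtReg *v, const char *phys, char *out, size_t n) {
    assert(v != NULL && phys != NULL && out != NULL);
    if (is_rematerializable(v))
        snprintf(out, n, "movb $%d, %s", v->imm, phys);
    else
        snprintf(out, n, "movb %d(%%esp), %s", v->slot, phys);
}
```

Under this model %reg1038 needs no spill store, and its reload becomes movb $252, %bl. That removes the store/load round trip through 83(%esp) seen in LBB1_8.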

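We should have been smarter about picking the registers for reg1036 and reg1037.
Their live ranges intersect R8 (8-bit register class) live ranges, so we should
have picked from the registers that don't conflict with them (on x86-32, the ones
without 8-bit subregisters, such as %esi and %edi).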
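The register-choice heuristic suggested above might look like the following C sketch. It is an illustration under stated assumptions, not the allocator's actual logic. The Gpr enum and pick_gr32 are hypothetical, and the register list leaves out ESP and EBP. The only hardware fact it relies on is that on x86-32 only EAX, ECX, EDX and EBX have 8-bit subregisters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x86-32 general-purpose registers offered to this toy allocator.
   ESP and EBP are left out to keep the sketch simple. */
typedef enum { EAX, ECX, EDX, EBX, ESI, EDI, NUM_GPR } Gpr;

/* Only EAX, ECX, EDX and EBX have 8-bit subregisters (AL, CL, DL, BL). */
static bool has_8bit_subreg(Gpr r) {
    return r == EAX || r == ECX || r == EDX || r == EBX;
}

/* Choose a free register for a 32-bit live range. If the range overlaps a
   live 8-bit (R8) value, first try registers with no 8-bit subregister so
   the four byte-addressable registers stay available for the R8 value.
   Returns NUM_GPR when nothing is free (the caller would spill). */
static Gpr pick_gr32(const bool free_regs[NUM_GPR], bool overlaps_r8) {
    assert(free_regs != NULL);
    if (overlaps_r8) {
        for (int r = 0; r < NUM_GPR; r++) {
            if (free_regs[r] && !has_8bit_subreg((Gpr)r))
                return (Gpr)r;
        }
    }
    for (int r = 0; r < NUM_GPR; r++) {
        if (free_regs[r])
            return (Gpr)r;
    }
    return NUM_GPR;
}
```

For %reg1036 and %reg1037, which overlap the 8-bit %reg1038/%reg1039, this would hand out %esi and %edi before %ecx. That would tend to leave a byte register free for the R8 values instead of forcing spills to 84(%esp) and 88(%esp).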

Example 2:

long %test(long %x, short %y) {
	%tmp = cast short %y to ubyte		; <ubyte> [#uses=1]
	%tmp1 = shr long %x, ubyte %tmp		; <long> [#uses=1]
	ret long %tmp1
}

	subl $8, %esp
	movl %esi, 4(%esp)
	movl %ebx, (%esp)
	movl 12(%esp), %eax
	movb 20(%esp), %bl
	movl 16(%esp), %esi
	movb %bl, %cl
	shrdl %cl, %esi, %eax
	movb %bl, %cl
	movl %esi, %edx
	sarl %cl, %edx
	sarl $31, %esi
	testb $32, %bl
	cmovne %edx, %eax
	cmovne %esi, %edx
	movl (%esp), %ebx
	movl 4(%esp), %esi
	addl $8, %esp

We are unable to coalesce the result of the load "movb 20(%esp)" into %cl. This is bug 687.
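Bug 687 tracks why this particular copy is not coalesced, and the C sketch below does not reproduce that cause. It uses a simplified model with hypothetical names (Interval, can_join_with_physreg) to show only the basic legality test for coalescing a virtual register into a physical one. The copy into %cl can be removed if CL, and every register aliasing it, holds no other value anywhere in the virtual register's live range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open live interval [start, end) in instruction-slot numbers. */
typedef struct { int start, end; } Interval;

static bool intervals_overlap(Interval a, Interval b) {
    return a.start < b.end && b.start < a.end;
}

/* The copy "%cl = %vreg" can be deleted by allocating %vreg directly to CL
   if CL is not otherwise live anywhere in %vreg's range. `phys_live` lists
   the live intervals of CL and of every register aliasing it (CX, ECX),
   excluding the copies from %vreg that are being coalesced away. */
static bool can_join_with_physreg(Interval vreg, const Interval *phys_live, size_t n) {
    assert(phys_live != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (intervals_overlap(vreg, phys_live[i]))
            return false;
    }
    return true;
}
```

In the Example 2 output, the only writes to %cl (or %ecx) between the load and the testb are the two movb %bl, %cl copies. With those excluded, this test would succeed, so the loaded shift amount could live in %cl directly.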

Example 3:

	%tmp5 = load uint* %Bits		; <uint> [#uses=1]
	%tmp6 = cast uint %tmp4 to ubyte		; <ubyte> [#uses=2]
	%tmp7 = shl uint %tmp5, ubyte %tmp6		; <uint> [#uses=1]
	%tmp9 = xor ubyte %tmp6, 16	

	movl %esi, %ecx
	movl L_Bits$non_lazy_ptr, %edi
	movl (%edi), %ebx
	movl %ebx, 12(%esp)
	movb %cl, %bl
	movb %bl, %cl
	shll %cl, 12(%esp)
	xorb $16, %bl

Truncate (as well as anyext) should be treated specially by the register
allocator, so that its live range can overlap the source live range without
being considered a conflict.
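One way to read "treated specially" is in terms of interference. The C sketch below makes simplifying assumptions: each live range is defined once and never modified in place, and the names (LiveRange, must_conflict) are hypothetical. Overlapping live ranges normally need non-aliasing registers. A truncate (or anyext) result, however, carries the same low bits as its source, so the two can share one register with the narrower value in the subregister.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEF_PLAIN, DEF_TRUNC, DEF_ANYEXT } DefKind;

/* Toy SSA live range: defined once, never modified in place. */
typedef struct {
    int id;
    DefKind kind;
    int src;         /* id of the operand for DEF_TRUNC / DEF_ANYEXT, else -1 */
    int start, end;  /* half-open live interval [start, end) */
} LiveRange;

static bool ranges_overlap(const LiveRange *a, const LiveRange *b) {
    return a->start < b->end && b->start < a->end;
}

/* True if `a` is a truncate or anyext of `b`. The two values then agree on
   the low bits, so one register can hold both, with the narrower value in
   the subregister. */
static bool is_resize_of(const LiveRange *a, const LiveRange *b) {
    return (a->kind == DEF_TRUNC || a->kind == DEF_ANYEXT) && a->src == b->id;
}

/* Overlapping ranges normally need non-aliasing registers. Treating
   truncate/anyext specially drops that requirement between the result
   and its source. */
static bool must_conflict(const LiveRange *a, const LiveRange *b) {
    assert(a != NULL && b != NULL);
    if (!ranges_overlap(a, b))
        return false;
    return !(is_resize_of(a, b) || is_resize_of(b, a));
}
```

With this rule, %tmp6 could share %ecx with its source instead of being copied into %bl and back. That could avoid the movb %cl, %bl / movb %bl, %cl pair in Example 3.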

Example 4:

float foo(int *x, float *y, unsigned c) {
  float res = 0.0;
  unsigned i;
  for (i = 0; i < c; i++) {
    float xx = (float)x[i];
    xx = xx * y[i];
    xx += res;
    res = xx;
  }
  return res;
}

LBB_foo_3:      # no_exit
        cvtsi2ss %XMM0, DWORD PTR [%EDX + 4*%ESI]
        mulss %XMM0, DWORD PTR [%EAX + 4*%ESI]
        addss %XMM0, %XMM1
        inc %ESI
        cmp %ESI, %ECX
****    movaps %XMM1, %XMM0
        jb LBB_foo_3    # no_exit

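We need to teach the coalescer to commute two-address instructions, which would
allow us to eliminate the reg-reg copy (the movaps) in this example. Because
addss is commutative, it could be rewritten as addss %XMM1, %XMM0 so that the
sum is written directly into %XMM1.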
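The commuting transformation can be sketched in C as follows. This is a toy model, not LLVM's two-address pass. TwoAddrInst and commute_to_remove_copy are hypothetical, operands are plain register numbers, and operand order follows the Intel syntax used in the listing (addss dst, src computes dst = dst + src).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-address instruction "dst = lhs OP rhs" in which dst is tied to
   lhs: x86 "addss dst, src" overwrites its first operand.
   Operands are register numbers. */
typedef struct {
    bool commutable;  /* e.g. addss and mulss, but not subss */
    int dst, lhs, rhs;
    bool rhs_killed;  /* rhs's value is not used after this instruction */
} TwoAddrInst;

/* A later copy "copy_dst = dst" disappears if dst can live in copy_dst's
   register. Because dst is tied to lhs, dst lands in lhs's register. When
   rhs already sits in copy_dst's register and dies here, swapping the
   operands of a commutable instruction ties dst to that register instead,
   turning the copy into a no-op. Returns true if the instruction was
   commuted. */
static bool commute_to_remove_copy(TwoAddrInst *mi, int copy_dst) {
    assert(mi != NULL);
    if (!mi->commutable || mi->rhs != copy_dst || !mi->rhs_killed)
        return false;
    int old_lhs = mi->lhs;
    mi->lhs = mi->rhs;
    mi->rhs = old_lhs;
    mi->dst = mi->lhs;  /* the result is written where the tied operand is */
    return true;
}
```

Model the loop body with %XMM0 as 0 and %XMM1 as 1. The addss is then {true, 0, 0, 1, true} and the copy targets register 1. Commuting yields addss %XMM1, %XMM0, so the sum lands in %XMM1 and the copy becomes an identity copy that can be deleted.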

There is also bug 770: we need live range splitting, or we need to be smarter
about when to join a live range with another one that targets a narrower
register class.
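The "be smarter about when to join" option could take the form of a profitability check before joining across register classes. The C sketch below uses a made-up heuristic purely for illustration. VRegInfo, worth_cross_class_join and the max_growth knob are hypothetical, and bug 770 does not prescribe this rule. The check refuses the join when the joined range would be much longer than the range that needed the narrow class.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy virtual register: a half-open live interval plus the size of the
   register class it must be allocated from. */
typedef struct {
    int start, end;  /* [start, end) in instruction slots */
    int class_size;  /* e.g. 6 usable 32-bit GPRs vs 4 with 8-bit subregisters */
} VRegInfo;

/* Joining `wide` with `narrow` forces the whole joined range into narrow's
   smaller class. Hypothetical heuristic: join only if the joined range
   (approximated as the span covering both) is at most `max_growth` times
   as long as narrow's, so the tighter constraint is not stretched over
   much more code than before. */
static bool worth_cross_class_join(VRegInfo wide, VRegInfo narrow, int max_growth) {
    assert(narrow.class_size <= wide.class_size && max_growth >= 1);
    int lo = wide.start < narrow.start ? wide.start : narrow.start;
    int hi = wide.end > narrow.end ? wide.end : narrow.end;
    return hi - lo <= max_growth * (narrow.end - narrow.start);
}
```

A more careful version might also weigh how many registers of the narrow class are already in demand across the joined range. The length ratio is only the simplest proxy.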

