rkotler at mips.com
Tue Sep 3 12:05:01 PDT 2013
You the man!
That makes total sense.
As you said, .global might prevent the symbol from participating in lazy
binding but I need to investigate this issue thoroughly.
On 09/02/2013 03:29 AM, Richard Sandiford wrote:
> Hi Reed,
> Still catching up on email, so hope this isn't already covered...
> reed kotler<rkotler at mips.com> writes:
>> I have a strange issue that I encountered with mips16 hard float.
>> Part of mips16 hard float is to emit calls to runtime routines with the
>> same signature as usual soft float routines, except that they are
>> implemented using mips32 code which uses real floating point
>> instructions (mips16 processor mode has no hardware floating point).
>> These routines have the same names as the corresponding softfloat
>> routines, except with the additional prefix __mips16_ . So for example,
>> For these intrinsics (and not others), gcc mips16 emits a .globl.
>> Without this .globl (which llvm does not emit), the program will
>> run very slowly if compiled with -fPIC and linked as C++. It seems to be
>> stuck in the loader (probably doing dynamic binding over and over again).
> This might or might not be related, but I notice that for the attached
> testcase, LLVM emits:
>     lui $2, %hi(_gp_disp)
>     addiu $2, $2, %lo(_gp_disp)
>     addiu $sp, $sp, -32
>     .cfi_def_cfa_offset 32
>     sw $ra, 28($sp) # 4-byte Folded Spill
>     sw $18, 24($sp) # 4-byte Folded Spill
>     sw $17, 20($sp) # 4-byte Folded Spill
>     sw $16, 16($sp) # 4-byte Folded Spill
>     .cfi_offset 31, -4
>     .cfi_offset 18, -8
>     .cfi_offset 17, -12
>     .cfi_offset 16, -16
>     addu $16, $2, $25
>     move $17, $4
>     lw $18, %call16(foo)($16)
> $BB0_1: # %loop
>     # =>This Inner Loop Header: Depth=1
>     move $25, $18
>     jalr $25
>     move $gp, $16
>     addiu $17, $17, -1
>     bnez $17, $BB0_1
> # BB#2: # %exit
>     lw $16, 16($sp) # 4-byte Folded Reload
>     lw $17, 20($sp) # 4-byte Folded Reload
>     lw $18, 24($sp) # 4-byte Folded Reload
>     lw $ra, 28($sp) # 4-byte Folded Reload
>     jr $ra
>     addiu $sp, $sp, 32
> where the %call16 is hoisted out of the loop. It really needs to be
> kept inside the loop and loaded for each iteration. The same goes for
> consecutive calls to the same function; the second call needs to load
> %call16 separately, after the first call has finished.
> As things stand, if foo() hasn't been bound by the time the function
> above is entered, $18 will contain the address of the lazy binding stub,
> and so the loop will try to resolve foo on every iteration. That's
> usually what's happened for me when a testcase gets bogged down in
> the dynamic linker.
> Maybe the lack of .globl is preventing the function from being resolved
> lazily, and so avoids this kind of problem?
> Does removing the .globls from the GCC asm output make any difference?
> Or is it just that adding them to LLVM output makes a difference?
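
To make the hoisting problem Richard describes concrete, here is a rough toy model in Python. It is only a sketch: the class ToyProcess, its got dictionary, and the helpers call_hoisted and call_reloaded are hypothetical names made up for illustration, and the real MIPS stub and dynamic linker work differently in detail. It assumes only what the quoted text states: before binding, the GOT slot holds the address of the lazy binding stub, and the stub resolves foo and patches the slot. Whether this is what actually causes the slowdown in the mips16 case is not established here.

```python
class ToyProcess:
    """Toy model of one GOT slot with lazy binding (all names hypothetical)."""

    def __init__(self):
        self.resolver_calls = 0
        # Before the first call, the slot points at the lazy binding stub.
        self.got = {"foo": self._lazy_stub}

    def _foo(self):
        """Stand-in for the real callee."""
        return None

    def _lazy_stub(self):
        """Resolve foo, patch the GOT slot, then call the real function."""
        self.resolver_calls += 1
        self.got["foo"] = self._foo
        return self._foo()


def call_hoisted(n):
    """Load the slot once, before the loop, as in the LLVM output above."""
    p = ToyProcess()
    target = p.got["foo"]      # roughly: lw $18, %call16(foo)($16)
    for _ in range(n):
        target()               # roughly: move $25, $18 ; jalr $25
    return p.resolver_calls


def call_reloaded(n):
    """Reload the slot before every call, as Richard suggests is needed."""
    p = ToyProcess()
    for _ in range(n):
        p.got["foo"]()         # the load happens inside the loop
    return p.resolver_calls


if __name__ == "__main__":
    print(call_hoisted(1000))   # resolver entered on every iteration
    print(call_reloaded(1000))  # resolver entered once
```

With the load hoisted, the stale copy of the slot keeps pointing at the stub, so the resolver count grows with the iteration count. With the load kept inside the loop, only the first call goes through the stub.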