[LLVMdev] .globl

Tue Sep 3 12:05:01 PDT 2013

You the man!

Nice catch.

That makes total sense.

As you said, .globl might prevent the symbol from participating in lazy
binding, but I need to investigate this issue thoroughly.

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00975.html

Reed

On 09/02/2013 03:29 AM, Richard Sandiford wrote:
> Hi Reed,
>
> Still catching up on email, so hope this isn't already covered...
>
> reed kotler <rkotler at mips.com> writes:
>> I have a strange issue that I encountered with mips16 hard float.
>>
>> Part of mips16 hard float is to emit calls to runtime routines with the
>> same signature as usual soft float routines, except that they are
>> implemented using mips32 code which uses real floating point
>> instructions (mips16 processor mode has no hardware floating point
>> instructions).
>>
>> These routines have the same names as the corresponding softfloat
>> routines, except with the additional prefix __mips16_ . So for example,
>> __mips16_floatsidf.
>>
>> For these intrinsics (and not others), gcc mips16 emits a .globl.
>>
>> Without this .globl (which LLVM does not emit), the program will
>> run very slowly if compiled with -fPIC and linked as C++. It seems to be
>> stuck in the loader (probably doing dynamic binding over and over again).
> This might or might not be related, but I notice that for the attached
> testcase, LLVM emits:
>
> 	lui	$2, %hi(_gp_disp)
> 	addiu	$2, $2, %lo(_gp_disp)
> 	addiu	$sp, $sp, -32
> $tmp2:
> 	.cfi_def_cfa_offset 32
> 	sw	$ra, 28($sp)            # 4-byte Folded Spill
> 	sw	$18, 24($sp)            # 4-byte Folded Spill
> 	sw	$17, 20($sp)            # 4-byte Folded Spill
> 	sw	$16, 16($sp)            # 4-byte Folded Spill
> $tmp3:
> 	.cfi_offset 31, -4
> $tmp4:
> 	.cfi_offset 18, -8
> $tmp5:
> 	.cfi_offset 17, -12
> $tmp6:
> 	.cfi_offset 16, -16
> 	addu	$16, $2, $25
> 	move	$17, $4
> 	lw	$18, %call16(foo)($16)
> $BB0_1:                                 # %loop
>                                          # =>This Inner Loop Header: Depth=1
> 	move	$25, $18
> 	jalr	$25
> 	move	$gp, $16
> 	addiu	$17, $17, -1
> 	bnez	$17, $BB0_1
> 	nop
> # BB#2:                                 # %exit
> 	lw	$16, 16($sp)            # 4-byte Folded Reload
> 	lw	$17, 20($sp)            # 4-byte Folded Reload
> 	lw	$18, 24($sp)            # 4-byte Folded Reload
> 	lw	$ra, 28($sp)            # 4-byte Folded Reload
> 	jr	$ra
> 	addiu	$sp, $sp, 32
>
> where the %call16 is hoisted out of the loop.  It really needs to be
> kept inside the loop and loaded for each iteration.  The same goes for
> consecutive calls to the same function; the second call needs to load
> %call16 separately, after the first call has finished.
>
> As things stand, if foo() hasn't been bound by the time the function
> above is entered, $18 will contain the address of the lazy binding stub,
> and so the loop will try to resolve foo on every iteration.  That's
> usually what's happened for me when a testcase gets bogged down in
> the dynamic linker.
>
> Maybe the lack of .globl is preventing the function from being resolved
> lazily, and so avoids this kind of problem?
>
> Does removing the .globls from the GCC asm output make any difference?
> Or is it just that adding them to LLVM output makes a difference?
>
> Thanks,
> Richard
>
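
To make the hoisting problem Richard describes concrete, here is a minimal C sketch that simulates lazy binding. It is only a model, not what the MIPS dynamic linker actually does. All names (`got_foo`, `lazy_stub`, `foo_real`, `hoisted`, `reloaded`) are hypothetical: `got_foo` stands in for the GOT entry that `%call16(foo)` loads, and `lazy_stub` stands in for the lazy binding stub plus resolver.

```c
typedef void (*fn_t)(void);

static int resolver_calls = 0;  /* how many times the simulated resolver ran */
static int foo_calls = 0;       /* how many times the real foo ran */

static void foo_real(void) { foo_calls++; }
static void lazy_stub(void);

/* Hypothetical GOT slot for foo: initially points at the lazy binding stub. */
static fn_t got_foo = lazy_stub;

/* Simulated stub: resolve foo, patch the GOT slot, then forward the call. */
static void lazy_stub(void) {
    resolver_calls++;
    got_foo = foo_real;
    foo_real();
}

/* Put the simulation back into the "foo not yet bound" state. */
static void reset(void) {
    got_foo = lazy_stub;
    resolver_calls = 0;
    foo_calls = 0;
}

/* Models the code in the listing: the slot is loaded once, before the loop,
   so every iteration jumps to the stub address that was cached. */
static int hoisted(int n) {
    reset();
    fn_t cached = got_foo;
    for (int i = 0; i < n; i++)
        cached();
    return resolver_calls;
}

/* Models the suggested fix: the slot is reloaded for every call, so only
   the first call goes through the stub. */
static int reloaded(int n) {
    reset();
    for (int i = 0; i < n; i++)
        got_foo();
    return resolver_calls;
}
```

In this model, `hoisted(5)` returns 5 (the resolver runs on every iteration), while `reloaded(5)` returns 1 (only the first call goes through the stub). That matches the slowdown described above when the address of the stub is kept in `$18` across the loop.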