[LLVMdev] .globl

Mon Sep 2 03:29:20 PDT 2013

Hi Reed,

Still catching up on email, so hope this isn't already covered...

reed kotler <rkotler at mips.com> writes:
> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the 
> same signature as usual soft float routines, except that they are 
> implemented using mips32 code which uses real floating point 
> instructions (mips16 processor mode has no hardware floating point 
> instructions).
>
> These routines have the same names as the corresponding softfloat 
> routines, except with the additional prefix __mips16_ . So for example, 
> __mips16_floatsidf.
>
> For these intrinsics (and not others), gcc mips16 emits a .globl.
>
> Without this .globl (which llvm does not emit), the program will
> run very slowly if compiled with -fPIC and linked as C++. It seems to be
> stuck in the loader (probably doing dynamic binding over and over again).

This might or might not be related, but I notice that for the attached
testcase, LLVM emits:

	lui	$2, %hi(_gp_disp)
	addiu	$2, $2, %lo(_gp_disp)
	addiu	$sp, $sp, -32
$tmp2:
	.cfi_def_cfa_offset 32
	sw	$ra, 28($sp)            # 4-byte Folded Spill
	sw	$18, 24($sp)            # 4-byte Folded Spill
	sw	$17, 20($sp)            # 4-byte Folded Spill
	sw	$16, 16($sp)            # 4-byte Folded Spill
$tmp3:
	.cfi_offset 31, -4
$tmp4:
	.cfi_offset 18, -8
$tmp5:
	.cfi_offset 17, -12
$tmp6:
	.cfi_offset 16, -16
	addu	$16, $2, $25
	move	$17, $4
	lw	$18, %call16(foo)($16)
$BB0_1:                                 # %loop
                                        # =>This Inner Loop Header: Depth=1
	move	$25, $18
	jalr	$25
	move	$gp, $16
	addiu	$17, $17, -1
	bnez	$17, $BB0_1
	nop
# BB#2:                                 # %exit
	lw	$16, 16($sp)            # 4-byte Folded Reload
	lw	$17, 20($sp)            # 4-byte Folded Reload
	lw	$18, 24($sp)            # 4-byte Folded Reload
	lw	$ra, 28($sp)            # 4-byte Folded Reload
	jr	$ra
	addiu	$sp, $sp, 32

where the %call16 is hoisted out of the loop.  It really needs to be
kept inside the loop and loaded for each iteration.  The same goes for
consecutive calls to the same function; the second call needs to load
%call16 separately, after the first call has finished.

As things stand, if foo() hasn't been bound by the time the function
above is entered, $18 will contain the address of the lazy binding stub.
The resolver updates foo's GOT entry on the first call, but $18 still
holds the stale stub address, so the loop goes back through the resolver
on every iteration.  That's usually what has happened for me when a
testcase gets bogged down in the dynamic linker.

Maybe the .globl is preventing the function from being resolved
lazily, and so avoids this kind of problem?

Does removing the .globls from the GCC asm output make any difference?
Or is it just that adding them to LLVM output makes a difference?

Thanks,
Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.ll
Type: application/octet-stream
Size: 305 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130902/a31f9544/attachment.obj>