[llvm-dev] [llvm-mc] FreeBSD kernel module performance impact when upgrading clang

Mon Nov 2 11:00:15 PST 2020

Hi,

I'm in the process of migrating from clang5 to clang10. Unfortunately, clang10 brings a performance regression. The cause is an increase in PLT entries resulting from this patch (first released in clang7):

https://bugs.llvm.org/show_bug.cgi?id=36370
https://reviews.llvm.org/D43383

If I revert that clang patch locally, the additional PLT entries and the performance impact disappear.

This occurs in the context of FreeBSD kernel modules. Using the example code from this page:

https://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/driverbasics-kld.html

...I can show what I'm seeing. Here are the relocations in the objects generated by clang5 and clang10 for the example:

clang5:

	$ objdump -r skeleton.o

	skeleton.o:     file format elf64-x86-64

	RELOCATION RECORDS FOR [.text]:
	OFFSET           TYPE              VALUE
	0000000000000019 R_X86_64_32S      .rodata.str1.1+0x0000000000000015
	0000000000000024 R_X86_64_32S      .rodata.str1.1+0x000000000000002b
	000000000000002b R_X86_64_PC32     uprintf-0x0000000000000004
	[...]

clang10:

	$ objdump -r skeleton.o

	skeleton.o:     file format elf64-x86-64

	RELOCATION RECORDS FOR [.text]:
	OFFSET           TYPE              VALUE
	0000000000000017 R_X86_64_32S      .rodata.str1.1+0x000000000000002b
	0000000000000020 R_X86_64_32S      .rodata.str1.1+0x0000000000000015
	0000000000000029 R_X86_64_PLT32    uprintf-0x0000000000000004
	[...]

The relocation for the external uprintf call is changed from R_X86_64_PC32 to R_X86_64_PLT32.

Normally, amd64/x86 kernel modules are relocatable object files (via ld -r). Because of that, D43383 typically has no impact as the FreeBSD loader sees the relocations directly and treats R_X86_64_PC32 and R_X86_64_PLT32 the same:

https://github.com/freebsd/freebsd/blob/master/sys/amd64/amd64/elf_machdep.c#L321
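To illustrate why the two relocation types are interchangeable for a loader that resolves symbols directly, here is a minimal sketch in C. It is not the code from elf_machdep.c: apply_pcrel32 and its parameters are names made up for this example, and only the relocation type numbers come from the x86-64 psABI. With no PLT involved, both types reduce to S + A - P:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the x86-64 psABI. */
#define R_X86_64_PC32  2
#define R_X86_64_PLT32 4

/*
 * Hypothetical helper (not the FreeBSD function): patch one 32-bit
 * PC-relative field.
 *   where  - host pointer to the 4-byte field to patch
 *   place  - run-time address of that field (P)
 *   symval - resolved run-time address of the symbol (S)
 *   addend - relocation addend (A), e.g. -4 for a call
 * Returns 0 on success, -1 for an unhandled type or a displacement
 * that does not fit in 32 bits.
 */
static int
apply_pcrel32(uint8_t *where, uint64_t place, uint64_t symval,
    int64_t addend, unsigned type)
{
	int64_t value;
	int32_t field;

	assert(where != NULL);
	switch (type) {
	case R_X86_64_PC32:
	case R_X86_64_PLT32:	/* no PLT is built; resolve directly */
		value = (int64_t)(symval + (uint64_t)addend - place);
		if (value < INT32_MIN || value > INT32_MAX)
			return (-1);
		field = (int32_t)value;
		memcpy(where, &field, sizeof(field));
		return (0);
	default:
		return (-1);
	}
}
```

In this model the PLT32 case simply falls through to the PC32 computation, so a relocatable module sees no extra indirection regardless of which type the compiler emitted.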

But in my case, the kernel objects are created as shared objects. Using shared objects is atypical for amd64, but it is done for every other architecture except mips:

https://github.com/freebsd/freebsd/blob/master/sys/conf/kmod.mk#L81

The comments in the D43383 review suggest that a modern linker should reduce the PLT32 relocations to PC32 for local calls. But I do not see that reduction, even when testing this and other examples with lld 10. My understanding is that this is because the kernel objects are shared objects: the linker processes the relocations (and leaves them as PLT calls) before the kernel loader ever sees them. Unfortunately, this means many calls that previously did not go through the PLT now do.

Note that allowing R_X86_64_PC32 within shared objects (without -fPIC) requires a linker patch. This works within a kernel environment even if it should be disallowed elsewhere. But it reveals the larger question raised by the patch and its impact: whose responsibility should this behavior be?

It seems the linker/lld should supply an equivalent of -mcmodel=kernel, i.e. a way to indicate that 64-bit pointers will fit in a sign-extended 32-bit value. (Stated another way: it seems appropriate to allow 32-bit sign-extended relocations in shared libraries if the user specifies some sort of kernel mode.)
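To make the "fits in 32 bits" condition concrete, here is a small C check of the range that a sign-extended 32-bit relocation such as R_X86_64_32S can represent. It is only a sketch: fits_sign_extended_32 is a name made up for illustration, not something taken from lld or the FreeBSD sources.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if addr survives being truncated to 32 bits
 * and sign-extended back to 64 bits, which is the range a
 * sign-extended 32-bit relocation field can hold.
 */
static bool
fits_sign_extended_32(uint64_t addr)
{
	int64_t v = (int64_t)addr;

	return (v >= INT32_MIN && v <= INT32_MAX);
}
```

Under this check, an address in the top 2 GB of the address space such as 0xffffffff80200000 passes, while 0x0000000080000000 does not, because sign extension would turn it into a different 64-bit value.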

From there though, is the linker the place to eliminate PLT relocations for this use case?

Or should the compiler be the one to specify the "right" relocations, meaning the D43383 patch should be modified to emit a different relocation for -mcmodel=kernel?

In summary:

1. Could you please clarify for me the conditions under which the PLT->PC relocation reduction should occur?
2. Given the goal of eliminating unneeded PLT entries from shared kernel objects: should the linker or the compiler be responsible for doing the right thing?

Thanks,

Justin