[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

Michael Clark via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 14 14:47:23 PDT 2017


> On 15 Jun 2017, at 6:50 AM, Louis Gerbarg <lgerbarg at apple.com> wrote:
> 
>> 
>> On Jun 6, 2017, at 4:08 PM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> Hi Folks,
>> 
>> I’m working on a port of musl libc to macOS (arch triple is “x86_64-xnu-musl”) to solve some irreconcilable issues I’m having with libSystem.dylib. I don’t want to use glibc for various reasons, mainly because I want to static link. I have static PIE + ASLR working, which is not actually supported by the Apple toolchain [1], but I managed to get it to work. I’m sure Apple might say “Don’t do that”, but from looking at the history of the xnu kernel ABI, it seems to be very stable between versions.
> 
> I am from Apple, and I will say “Don’t do that.” The kernel ABI for our platforms is not stable; we only guarantee stability at the dynamic link boundary (in this case public symbols exported from libSystem). While the kernel syscall numbers have not changed (though the kernel team reserves the right to do that), the parameter lists and argument marshaling for them certainly have changed. We also do not support static executables on our system.
> 
> We even had bincompat issues related to this during the last major release (macOS 10.12 Sierra): Go implemented its own syscall support, which broke all of their binaries that used gettimeofday when the internal interface changed <https://github.com/golang/go/issues/16606>. More broadly, you can look at a discussion of their issues here: <https://github.com/golang/go/issues/16606>. In their case they want to avoid invoking an external linker like ld64 or lld, as opposed to avoiding libSystem, but the effect is the same: they shipped a tool that caused unsuspecting developers to have surprise bincompat issues that were entirely avoidable.

I’m aware that Go has had some issues with the XNU ABI boundary.

I’m working on a CPU simulator / binary translator and I need control of the process address space layout. It seems I may ultimately need to use Hypervisor.framework; however, that is a lot more work in the short term.

The issue I am having with libSystem.dylib is the lack of weak linkage (versus weak_import), i.e. weak aliases. I don’t want to use a wrapper binary with DYLD_INSERT_LIBRARIES. I want to interpose Libc symbols with some of the symbols present in my binary (memory allocator, mmap). Interposition support is somewhat lacking in the Mach-O toolchain and runtime linker, despite the Mach-O format technically supporting what I need (N_INDR and N_WEAK_DEF).

- https://developer.apple.com/documentation/kernel/nlist_64

If I could use N_INDR and N_WEAK_DEF to get early-bound (runtime link time) interposition, with symbols in my binary replacing the C library allocator and mmap and libSystem using my implementations, then I would be happy. libSystem itself would need to use weak aliases. This is possible with C libraries on other platforms.

I’ve tried relentlessly to intercept the malloc_zone implementation. malloc_zone_register is not sufficient, as some of the internal zones are tied to the internals of Libc and I am getting heap collisions between Libc-allocated objects and my guest address space. On Linux I have enough control to do what I need and can interpose my symbols to implement versions of libc functions that I wish to override. The problem on darwin is that I am not able to interpose the malloc implementation until main starts, and at that point it is too late, as the C library has already created its internal zones. I’m also unable to interpose mmap. I have already looked at the interpose symbol tricks but they don’t meet my purposes (not wanting to re-exec with DYLD_INSERT_LIBRARIES). Weak aliases from libSystem to the allocator implementation and various public symbols, along with N_INDR and N_WEAK_DEF, would be required for me to achieve what I need to achieve (somewhat similarly to the elegant internal implementation of musl libc).

With my current solution (musl on xnu) I have successfully reserved 0x1000 – 0x7fff_0000_0000. That is essentially the low 128TiB, minus the 4GiB at the top of that range where I place my translator and translator stack. This is satisfactory for my user mode simulator to emulate Linux processes on macOS.

I think Hypervisor.framework is probably the correct interface to be using if I want to avoid the kernel ABI; however, that is a lot more work than making syscall wrappers, and I would need to implement communication from the VM process to the host process.

I’m actually implementing Linux syscall emulation in a user-mode simulator, so the kernel ABI is probably the technically correct layer. The full system emulator ultimately needs to use Hypervisor.framework if I am to use hardware paging instead of a soft MMU. I have two simulators, a user-mode sim that emulates the Linux ABI and a full system emulator (https://rv8.io/), and I really want to support RISC-V Linux on macOS in the user-mode simulator.

Proper Linux ABI emulation on macOS would ultimately require kernel support: at minimum something like binfmt_misc, but ideally a kext that implements another ABI personality (much like Linux ABI emulation on Windows) in addition to the BSD personality. In fact the FreeBSD Linux compat layer could be used if the FreeBSD portion of XNU were synced up with current, and we’d get bug fixes for long-standing issues like the macOS TCP_NOPUSH bug that has long since been fixed in FreeBSD.

> Ultimately, if you are doing this on your own for fun, that’s great, but if this is something you intend to ship to other people, please reconsider. It is more than a theoretical concern that it will break.
> 
> Louis
> 
>> In any case the musl libc source makes extensive use of weak aliases, perhaps to allow easier interposition of C library routines; however, aliases, weak or otherwise, are not currently supported by ld64.
>> 
>> It appears that the mach-o format supports aliases, but the functionality has not been exposed via the linker (ld64/LLD).
>> 
>> - http://blog.omega-prime.co.uk/?p=121
>> 
>> The musl code does the following, which errors out saying aliases are not currently supported:
>> 
>> #undef weak_alias
>> #define weak_alias(old, new) \
>>         extern __typeof(old) new __attribute__((weak, alias(#old)))
>> 
>> and the macro is used internally like this:
>> 
>> int __pthread_join(pthread_t t, void **res)
>> {
>>         // implementation here
>> }
>> 
>> weak_alias(__pthread_join, pthread_join);
>> 
>> The problem is the actual export used by clients is an alias and I want to maintain source compatibility.
>> 
>> I seem to have found a way to semi-emulate aliases (at least within one module). My goal is to at least turn them into strong aliases somehow, so I can at a minimum make the musl source compatible with clang on macos. The following compiles but foo is not exported:
>> 
>> $ cat a.c
>> #include <stdio.h>
>> 
>> void foo() __attribute__((weak_import)) __asm("_bar");
>> 
>> void bar()
>> {
>>         printf("bar\n");
>> }
>> 
>> int main()
>> {
>>         foo();
>> }
>> 
>> $ cc -c a.c -o a.o
>> $ nm a.o
>> 0000000000000000 T _bar
>> 0000000000000020 T _main
>>                  U _printf
>> 
>> Any ideas how I can get foo as an exported symbol? 
>> 
>> Is weak alias or plain alias support planned for mach-o in LLD?
>> 
>> The goal at a minimum is to make the weak_alias macro emit a strong alias with clang/ld64 or clang/LLD, so I don’t need to diverge too much from the upstream musl source (as the lack of alias support currently requires me to rename function declarations in the source). Of course pthreads, which I’m working on now, are going to be completely different… but musl has support for architecture specific overrides in its build system.
>> 
>> BTW I now have some quite non-trivial programs compiling against musl-xnu + libcxx + libcxxabi on macos.
>> 
>> There are a lot of libcxx changes like this:
>> 
>> -#ifdef __APPLE__
>> +#if defined(__APPLE__) && !defined(_LIBCPP_HAS_MUSL_LIBC)
>> 
>> Michael.
>> 
>> [1] https://gist.github.com/michaeljclark/0a805652ec4be987a782afb902f06a99