[llvm-dev] __attribute__((apple_abi)): targeting Apple/ARM64 ABI from Linux (and others)

Martin Storsjö via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 8 02:13:15 PDT 2020


For the record, I've spent a nontrivial amount of time on the ARM64 
version of Wine, and back in the day started out by implementing the 
ms_abi attribute for aarch64 just to get the handling of printf-like 
functions right - dealing with (to some extent) most of the same issues 
you're dealing with here.

(Also, as a side comment: the existing names "win64cc", CC_Win64 or 
"IsWin64", used in a number of places, are a bit misnamed in the current 
scope. In the original, x86-only context (where 32 and 64 bit code 
generation is mostly shared), the C calling convention is similar on 
x86_32 and differences only arise on x86_64, so naming it "Win64" probably 
was quite neat. Within AArch64 it's a bit redundant, though - and if a 
similar distinction were needed on ARM (e.g. if an explicit windows calling 
convention were needed), reusing the existing "win64cc" would be even more 
out of place...)

On Thu, 8 Oct 2020, Adrien Guinet via llvm-dev wrote:

> In one of other attempts to make all this mess easier to handle, we 
> adapted the https://github.com/shinh/maloader project (that will be open 
> source if all of this works) to load ARM64 MachO under Linux and run the 
> final binary using qemu-user. This can be seen as a very light version 
> of wine [1] for iOS.

> [3] What I say here isn't entirely true, as darlinghq moved away from 
> this "wine" model (which can be seen very basically as make a loader for 
> the targeted architecture, create wrappers for system libraries and run 
> all of this in userland). For those interested in more information, I 
> recommend reading the article in 
> http://blog.darlinghq.org/2017/02/the-mach-o-transition-darling-in-past-5.html

I would say this isn't entirely accurate regarding how wine works - maybe 
it was the case for other thinner win32 binary loaders that have existed.

Wine never (at least not in the last 20 years afaik) just translated calls 
between the windows and host environment. Wine consists of a mostly full 
reimplementation of all the supported Windows APIs, and these only 
occasionally call down to the host libc and host's native APIs. It's true 
that Wine used to build its modules as native ELF (or MachO) binaries - 
but they weren't just plain ELF .so's; internally they contain most of the 
PE DLL data structures as well, so that they can run and interact with other 
modules using the normal DLL import/export mechanisms.

But lately this has been taken even further, and now most modules can be 
built as real DLLs as well - linking against wine's msvcrt/ucrt instead 
of the host libc, etc. For higher level components that only interact with 
other DLLs, this is mostly straightforward, but for lower level components 
that actually do need to call the native host environment, they have been 
split into a native ELF/MachO component (which links against whatever 
system libraries it needs to use), and the bulk of the code as either a 
real DLL or as a DLL wrapped in ELF/MachO. This requires having a suitable 
cross compiler available (but with clang being multi-targeting, that 
should be trivially available).

So that sounds very much like the same approach that Darling is taking, 
except that Darling doesn't maintain support for building the emulated 
components as ELF, only as native MachO. And Darling has the benefit of 
being able to build Apple's open sourced code, instead of having to 
reimplement it all based on the public interfaces.

In any case - even if the bulk of the code is built as the emulated 
platform's native binaries (DLL or MachO), I guess there's a need for 
interaction at some layer (even if the interface might be quite thin), so 
having support for something like this sounds sensible to me.

And being able to interact with code built for a different ABI on a 
per-function level also sounds very sensible to me. So I don't think this 
is a bad idea.

BTW, for running Windows code on Linux, one constant stumbling block has 
been the use of the x18 register. On Linux, this register is normally free 
to use by any function, but on Windows, it is supposed to remain constant 
(pointing at a thread specific data structure), with various workarounds 
being used to retain it.

For the Darwin case, x18 is reserved (so compiler generated code doesn't 
use it, similar to windows), but AFAIK nothing really uses it. Earlier, 
the Darwin kernel used to overwrite the x18 register to 1 on context 
switch, just to make sure that no code kept relying on it retaining its 
value, but this doesn't seem to be the case any longer. As no code 
actually uses it, it shouldn't be any problem for your usecase.

> The current implementation & questions
> ======================================
> The current implementation introduces the CC_AArch64_Apple calling 
> convention, to enforce the usage of Apple's CC when necessary. This has 
> mainly been inspired by how CC_Win64 works.
> There are I think at least these limitations:
> * this supposes that the original targeted CC is Apple ARM64 AAPCS. In its current form,
> there is no way to support for instance vector calls (see for instance
> https://github.com/aguinet/llvm-project/commit/c4905ded3afb3182435df30e527955031cb0d098#diff-f124368bac3e5d7be20450aa83b166daR218)

I'm not familiar with the vector calling convention here - but if that's 
used, the function (on the C level) already has a suitable attribute 
specifying the non-standard calling convention? Wouldn't that end up 
lowered into the right thing here as well?

Or is it a case where there's a generic "vector" calling convention which 
turns into different things depending on whether targeting linux or darwin? 
In that case, you'd probably need to add separate attributes and calling 
conventions, like apple_vector and sysv_vector (or whatever to call the 
default), to allow specifying the intent more exactly.

For windows on i386, there are actually at least 4 different calling 
conventions being used; cdecl (the default for C code), stdcall, fastcall 
and vectorcall. As those names aren't associated with anything else on 
other platforms, you can use e.g. __attribute__((fastcall)) on any 

> My questions would be:
> * the fact that we can't target Apple's vector calls ABI shows that having one
> CC_AArch64Apple (as CC_Win64 exists) calling convention might not be the right
> implementation of this "apple_abi" attribute. Has someone better suggestions?

It doesn't sound too bad to me, but as naming things is one of the hardest 
things, one could also think of other, less generic names (as the 
attribute "apple_abi" or whatever it is, doesn't per se imply any specific 
ABI, but just is the apple default C calling convention) - but 
"apple_c_default" also is ugly.

> * For variadic functions (which are among the functions that have 
> different ABIs), GCC and Clang have __builtin_ms_va_list. My 
> understanding is that we should have the Apple equivalent, but I'm not 
> sure to completely understand what's at stake here. Said differently, is 
> this builtin used to make sure we use the va_list type of the Apple ABI, 
> should the need arise to forward it to another function that uses the 
> Apple ABI?

Exactly. In your example, you're implementing printf, so you're receiving 
variadic arguments on the stack, boiling them down to a (linux native) 
va_list and passing them to a linux native vprintf. If you'd be 
implementing and wrapping the darwin vprintf on the other hand, you'd need 
to declare it to be receiving a __builtin_apple_va_list.
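
As a rough illustration of the same idea using what already exists today: on 
x86_64, GCC and Clang let an ms_abi function build a __builtin_ms_va_list and 
forward it to another ms_abi function. The sketch below assumes an x86_64 
target for the interesting case and simply falls back to the native 
convention elsewhere. FOREIGN_ABI, foreign_va_list, foreign_sum and 
foreign_vsum are made-up names for this example; a __builtin_apple_va_list 
would presumably play the same role for the Apple ABI.

```c
#include <assert.h>

#if defined(__x86_64__)
/* x86_64 with GCC/Clang: force the Microsoft x64 convention and its va_list. */
#define FOREIGN_ABI        __attribute__((ms_abi))
#define foreign_va_list    __builtin_ms_va_list
#define foreign_va_start   __builtin_ms_va_start
#define foreign_va_end     __builtin_ms_va_end
#else
/* Elsewhere: fall back to the native convention so the sketch still builds. */
#define FOREIGN_ABI
#define foreign_va_list    __builtin_va_list
#define foreign_va_start   __builtin_va_start
#define foreign_va_end     __builtin_va_end
#endif

/* "v" variant: takes an already started va_list in the foreign layout. */
FOREIGN_ABI int foreign_vsum(int count, foreign_va_list ap) {
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(ap, int);
    return total;
}

/* Variadic entry point: builds the foreign va_list and forwards it. */
FOREIGN_ABI int foreign_sum(int count, ...) {
    foreign_va_list ap;
    foreign_va_start(ap, count);
    int total = foreign_vsum(count, ap);
    foreign_va_end(ap);
    return total;
}
```

Calling foreign_sum(3, 1, 2, 3) from ordinary native code should give 6; the 
compiler takes care of switching conventions at the call site.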

> Example with printf
> ===================
> For now, we manage to compile this simple example for iOS/arm64:
> #include <stdio.h>
> int main(int argc, char** argv)
> {
>  printf("number of args: %d, argv: %s, %s, %s\n", argc, argv[0], argv[1], argv[2]);
>  return 0;
> }
> and run it under the combo maloader/qemu-user under Linux/x64, using this wrapper for printf:
> __attribute__((apple_abi)) int darwin_aarch64_printf(const char* format, ...)
> {
>  va_list args;
>  va_start(args, format);
>  const int ret = vprintf(format, args);
>  va_end(args);
>  return ret;
> }
> The fact that va_start/va_end works by using the Linux ABI from a 
> function whose arguments use the Apple ABI seems completely magical to 
> me, so if someone knows why this work I would also be interested!

I think this might be a borderline case that I wasn't entirely sure would 
work right, but apparently does. (Or maybe the code really is flexible 
enough to systematically handle such mixed cases?)

The calling convention attribute indicates how and where the variadic 
arguments are laid out on the stack, but these are then collected into a 
linux native va_list, which is passed to the linux native vprintf function 
that interprets them accordingly.

FWIW, if you want to experiment with how variadic functions and va_list 
behave on different platforms, you can try e.g. this test snippet:

void vararg(int a, ...);
void call_vararg(void) {
         vararg(7, 8, 9, 10.0, 11, 12.0, 13);
}

void other(__builtin_va_list ap);
void receive_vararg(int a, ...) {
         __builtin_va_list ap;
         __builtin_va_start(ap, a);
         other(ap);
         __builtin_va_end(ap);
}

int use_vararg(__builtin_va_list *ap) {
         return __builtin_va_arg(*ap, int);
}

Compiling this with e.g. "clang -target 
{aarch64-windows,aarch64-linux-gnu,arm64-apple-darwin} -S -O2 -o - test.c" 
lets you have a look at what they end up like. E.g. use_vararg is 
identical between darwin and windows, while call_vararg is kind of similar 
between linux and windows (except windows passes all variadic args in 
GPRs), and receive_vararg is pretty different between all of them.
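
One quick way to see part of the reason for these differences, without 
reading assembly, is to look at the size of the native va_list. This is only 
a sketch with made-up helper names: as far as I know, va_list is a plain 
pointer on darwin and windows arm64, while the AAPCS64 va_list used on 
aarch64 linux is a larger structure. Treat the pointer-size comparison as a 
heuristic, not a definitive test.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the native va_list for whatever target this is compiled for. */
size_t native_va_list_size(void) {
    size_t size = sizeof(__builtin_va_list);
    assert(size >= sizeof(void *)); /* even the simplest layout holds one pointer */
    return size;
}

/* Nonzero when va_list is no bigger than a pointer, which suggests the
   simple "pointer into the argument area" layout (heuristic only). */
int native_va_list_is_pointer_sized(void) {
    return sizeof(__builtin_va_list) == sizeof(void *);
}
```

Building this for each of the three targets above and comparing the results 
should line up with use_vararg being identical on darwin and windows but 
different on linux.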

> Is this a terrible idea?
> ========================
> Building these "ABI wrappers" using an "apple_abi" attribute seemed a 
> good idea at the beginning, but this already raises some concerns (see 
> above), and I'd be willing to hear any arguments that show that this is 
> actually a bad idea.

It's certainly more sustainable and durable to provide full, proper 
implementations of the target, like Darling and Wine do, but even then, 
being able to build a function taking arguments with a foreign calling 
convention does sound sensible and useful to me.

Depending on exactly where you draw the line between "emulated"/foreign 
executables and native host system, you might not have any variadic 
functions in the border interface layer, and then you might get away 
without such support in the compiler, but to me, it sounds like a useful 
thing to have in any case.

// Martin
