[llvm-dev] compler-rt, __aeabi_memcpy () possibly broken (ARM)

Mon Apr 11 12:01:41 PDT 2016

Renato,

thanks for your reply. However, I'm not sure whether I agree to your 
conclusions.

Let me try to show why:

>> The situation is slightly different for __aeabi_memcpy (). The ABI spec
>> explicitly states: "In general, implementations of these
>> functions are allowed to corrupt only the integer core registers permitted
>> to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."
>
> This is standard AAPCS, I trully hope neither glibc, newlib or musl's
> implementations will corrupt anything more.

No, this is *not* standard AAPCS. "integer core registers" makes the 
difference here!
It is rather the base standard AAPCS in ARM's nomenclature.

A normal AAPCS compliant C-function (like, let's say memcpy) can also 
corrupt the
floating-point registers D0-D7, D16-D31.
(see: Procedure Call Standard for the ARM Architecture, section 5.1.2.1)
And that's exactly what some implementations of memcpy do.
These scratch floating-point registers just come in handy to transfer 32 
or 64 bytes at once!

__aeaby_memcpy () however does not have this freedom. It is bound by the 
"Run-time ABI
for the ARM Architecture". And this restricts it to corrupt core integer 
registers only.
It must preserve the contents of all floating-point registers. Therefor 
it can not simply
branch to an externally provided memcpy, without knowing how that is 
implemented.

That means it is safe to use __eabi_memcpy () for calls to memcpy (), 
but not the other
way around. (contrary to what you concluded)

That's why I think we have a problem here.

Now I'm gonna open a can of worms:

There are more instances of this restriction for other functions, like 
long long helper
functions, the one-time construction API and such.
In case of ldivmod and udivmod it is explicitly stated that these *have* 
full [AAPCS] privileges.

Functions written in assembly should be safe, but c functions are a risk.
compiler-rt seems to use __attribute__((pcs("aapcs"))) for functions 
that are implemented in C.
That still does not guarantee that no floating-point registers are 
corrupted!

Just compile this function and you will see what I mean:

   __attribute__((pcs("aapcs"))) double fn_aapcs (double a, double b) 
{return (a+b);}

There are quite some candidates that might violate the Run-Time ABI rules.

The only solution that I see is that LLVM must treat all eabi functions 
as normal functions.
It should assume floating-point scratch registers could be corrupted 
even by functions that
are defined not to do so.
And LLVM must never try to be smart and use float registers for no-float 
code, and be it
as handy scratch registers for inlined copy operations and such.

Even if compiler-rt did not have a problem, I have seen too many 
libraries implement these
functions in C. The one-time construction API is an example 
(__cxa_guard_acquire and friends).
These functions are typically written in C and call external functions. 
(mutexes, sleeping)
If any of these functions messes any float registers, then things are 
gonna be fucked up.

Hopefully LLVM simply treats eabi functions as normal C-functions 
regardless of the Run-time
ABI restrictions. And LLVM never touches floating-point registers - not 
even the scratch registers
unless float operations are performed.
Then things might work. Someone should mark the relevant code in LLVM, 
so that this is not
changed in the future - ever.

What really makes we sweat is that if there is a problem, then it will 
only bite under obscure
circumstances and will be very hard to identify.

Please, please tell me I'm wrong and overlooked something completely.

Peter

Am 11.04.2016 um 11:43 schrieb Renato Golin:
> On 10 April 2016 at 20:37, Peter Jakubek via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I recognized that compiler-rt's the implementation of __aeabi_memcpy simply
>> branches to memcpy.
>> The implementation of memcpy is not provided. So an externally provided
>> memcpy () has to be used.
>> (also applies to memmove, memset, memclr)
>
> Hi Peter,
>
> In a nutshell, Compiler-RT may assume there is a C library underneath.
> This is the same for LibGCC
> (https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html):
>
>      "GCC will also generate calls to C library routines, such as
> memcpy and memset, in some cases."
>
> This also works on free-standing environments (ex. the Linux kernel)
> because those environments assume the compiler library will do so, and
> thus implement "memcpy", "memset", etc.
>
>
>> On ARM I have seen implementations of memcpy () using floating-point
>> registers (if compiled with NEON support). The is perfectly
>> legal, as memcpy () only needs to comply with the Procedure Call Standard.
>> According to this d0-d7, d16-d31 are scratch registers.
>
> RT *could* have some optimised versions of those functions, but as you
> can imagine, this is not a trivial pursuit. One would need to take
> into account all the variations, alignment, ISA support and
> micro-architecture differences, and that's a very large project, even
> if we just consider ARM targets.
>
> Given that most C libraries have had that consideration already, and
> have done more tests (conformance and performance) than we have, it's
> safe to assume that whatever the C libraries did is probably better
> (average case) than any "smart" thing we can conjure in a weekend.
>
>
>> The situation is slightly different for __aeabi_memcpy (). The ABI spec
>> explicitly states: "In general, implementations of these
>> functions are allowed to corrupt only the integer core registers permitted
>> to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."
>
> This is standard AAPCS, I trully hope neither glibc, newlib or musl's
> implementations will corrupt anything more.
>
>
>> newlib addresses this by explicitly providing a separate implementation for
>> __aeabi_memcpy () and memcpy ().
>> But things can get messed up if compiler-rt's __aeabi_memcpy () is actually
>> used.
>
> Indeed, this is the real problem (not the FP clobbering, the two
> libc's versions). This will be a linker nightmare, and could break the
> programmers assumption by using a specific C library.
>
> Furthermore, EABI's implementation returns void, while the C standard
> version returns the pointer, so they're not completely replaceable for
> every case.
>
> So, it's safe to use memcpy for all calls to __eabi_memcpy (modulo
> AAPCS bugs, performance), but the other way around is not.
>
>
>> The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also
>> uses floating-point registers - but preserves the contents.
>> So this one is ABI compliant.
>
> Any function that is AAPCS compliant *must* preserve all but the AAPCS
> registers. If one doesn't, it's not Compiler-RT's fault to assume so.
>
>
>> My conclusion is that the current implementation of compiler-rt can
>> potentially introduce difficult to track down problems.
>> Unless of course LLVM always can handle corrupted floating-point registers
>> for all calls to __aeabi_memcpy ().
>>
>> Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed
>> or the stubs should removed from compiler-rt.
>
> I agree. But it's not that simple.
>
> Some environments (ex. Linux kernel, Android, FreeBSD?, OSX?) depend
> on this behaviour, and as much as I agree with you that this is a
> broken assumption, we can't just remove it from RT.
>
> To make matters more complicated, different environments will update
> RT at different times, and any change we do will have impact for years
> to come, and will be hard to fix in the best way for everyone, quickly
> enough.
>
> I believe one quick way to "fix" the multiple-versions is to mark RT's
> version as weak (or equivalent), so that the C library's (or
> free-standing's) version always gets picked first. But it may be an
> over simplification from my part...
>
> I may also be underestimating the register clobbering problem, so if
> you could give me a concrete example where it's ok for the library to
> corrupt non-AAPCS registers, we could further discuss it.
>
> The only case I remember in GCC, was that it was spilling to VFP
> registers (before stack) when they existed in the platform, but this
> was a bad idea and was reverted promptly, because it was breaking
> stack unwinding, context-switching, etc.
>
> cheers,
> --renato
>