[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

Tim Northover via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 31 07:05:30 PST 2019


As you may have noticed, we released a 64b S4 chip that runs an ILP32
variant of the AArch64 ABI, and now we'd like to upstream that work.
I've pushed preliminary patches to
https://github.com/TNorthover/llvm-project/pull/1/commits (arm64_32
branch in that repo) to accompany this RFC. The changes divide fairly
neatly into three categories.

First, there's AArch64 ILP32 support, which should be fairly easy to
adapt to the ELF (or COFF) world and be generally useful. This
involved changing some generic code in ways I'll discuss below.

Then there's the specific ABI we chose, which isn't quite the same as
AAPCS since it was designed in conjunction with armv7k so that IR
could be compiled to be compatible with arm64_32. Since people do use
third-party compilers based on LLVM, having it upstream is expected to
be a good thing.

Finally we have a few passes that translate the necessarily
platform-specific parts of armv7k IR to arm64_32. Things like NEON
intrinsic calls and workarounds for certain assumptions the Swift
compiler made about C++ parameter passing. These aren't quite so
obviously useful to everyone, but could serve as examples in future
and (since they're self-contained IR passes) are likely to be low
maintenance. However, we'd understand if the community doesn't want
this burden anyway.

Most of the target-specific changes are fairly straightforward, but I
think I should explain the changes made to generic CodeGen.

AArch64 ILP32 Addressing Modes
==============================

There are two basic issues with how the current SDAG lowering
interacts with AArch64 addressing-modes in an ILP32 scenario, both
stemming from the fact that all AArch64 addressing modes do 64-bit
arithmetic (unlike amd64, which can be told to do 32-bit arithmetic).
For the non-experts, AArch64 allows calculations like these to appear
in loads and stores:

    [x0, x1] == (add x0, x1)
    [x0, w1, sxtw] == (add x0, (sext w1))
    [x0, w1, uxtw] == (add x0, (zext w1))
    [x0, w1, sxtw #3] == (add x0, (shl (sext w1), 3))
    Plus some more shift modes that are even less relevant here.

The second is particularly important for arm64_32 since it mirrors GEP
semantics.
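
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of how those modes form a 64-bit effective address. The helper
names (zext32, sext32, ea_reg, ea_ext) are made up for this
illustration, and registers are modelled as plain integers masked to
64 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def zext32(w):
    """Zero-extend the low 32 bits of w to 64 bits (uxtw)."""
    return w & MASK32

def sext32(w):
    """Sign-extend the low 32 bits of w to 64 bits (sxtw)."""
    w &= MASK32
    return (w | ~MASK32) & MASK64 if w & 0x80000000 else w

def ea_reg(xn, xm):
    """[xn, xm]: plain 64-bit add."""
    return (xn + xm) & MASK64

def ea_ext(xn, wm, extend, shift=0):
    """[xn, wm, sxtw|uxtw #shift]: extend the 32-bit index, shift, add."""
    index = sext32(wm) if extend == "sxtw" else zext32(wm)
    return (xn + (index << shift)) & MASK64

base = 0x1000
minus_one = 0xFFFFFFFF                           # i32 -1
print(hex(ea_ext(base, minus_one, "sxtw")))      # 0xfff
print(hex(ea_ext(base, minus_one, "uxtw")))      # 0x100000fff
print(hex(ea_ext(base, minus_one, "sxtw", 3)))   # 0xff8
```

Only the sxtw forms treat the index as a signed offset from the base.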

The first issue is that nothing except an inbounds GEP can really make
use of the extended addressing-modes in general. Most obviously a
2s-complement add has different wrapping behaviour:

    (load (add (Wn=0xffffffff, Wm=1))) != ldr ..., [Xn, Wm, sxtw]

Adding nuw would help here, allowing us to use the "uxtw"
addressing-mode. But nsw doesn't correspondingly allow "sxtw" because
the AArch64 misbehaving overflow is at the 0xffffffff boundary, which
isn't special for nsw -- the counter-example above still applies.
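
The asymmetry between nuw and nsw can be checked with a few lines of
arithmetic. This is a sketch only: add32 and ldr_address are
hypothetical helpers, and the base is assumed to sit in its X register
with zero high bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def as_signed32(w):
    """Interpret a 32-bit pattern as a signed integer."""
    return w - (1 << 32) if w & 0x80000000 else w

def add32(a, b):
    """Return (result, nuw_ok, nsw_ok) for a 32-bit add."""
    result = (a + b) & MASK32
    nuw_ok = a + b <= MASK32
    signed_sum = as_signed32(a) + as_signed32(b)
    nsw_ok = -(1 << 31) <= signed_sum < (1 << 31)
    return result, nuw_ok, nsw_ok

def ldr_address(xn, wm, extend):
    """Address formed by [xn, wm, sxtw] or [xn, wm, uxtw]."""
    index = as_signed32(wm) if extend == "sxtw" else wm & MASK32
    return (xn + index) & MASK64

wn, wm = 0xFFFFFFFF, 1
result, nuw_ok, nsw_ok = add32(wn, wm)
# The add satisfies nsw (-1 + 1 = 0) but not nuw, and the sxtw form
# still lands outside the 32-bit address space.
print(hex(result), nuw_ok, nsw_ok)        # 0x0 False True
print(hex(ldr_address(wn, wm, "sxtw")))   # 0x100000000
```

Whenever nuw_ok is true, the uxtw address equals the 32-bit result, so
nuw is sufficient where nsw is not.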

Moreover, the vast majority of pointer offsets come from GEPs and they
don't map cleanly to either nsw or nuw semantics provided by the DAG.
Pointers are fundamentally unsigned objects, but the offsets are
signed; so you only get nuw when you can prove the offset is positive
(see visitGetElementPtr in SelectionDAGBuilder.cpp).

That leaves inbounds GEPs, which theoretically map very cleanly to the
addressing modes: we know there's no wrapping at any precision so we
don't have to extend everything, and GEP defines offsets to be signed
constants, so we can use sxtw.

The second major issue with using AArch64 addressing modes is that an
i32 in SDAG has undef rather than 0 bits [32,64) when in a 64-bit
register -- a trunc operation maps to EXTRACT_SUBREG (i.e. ignore high
bits) rather than UXTW (zero them). AArch64 addressing-modes do not
extend the base pointer, so they would frequently have to be preceded
by a manual truncation of the base pointer. In our initial
implementation this contributed to a large code size penalty for
arm64_32.
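
A small sketch of why the extra truncation is needed, with registers
modelled as Python integers. The 0xdeadbeef high half is an arbitrary
stand-in for whatever a wider value left behind, and ldr_sxtw is a
made-up helper name.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sext32(w):
    """Interpret the low 32 bits of w as a signed offset."""
    w &= MASK32
    return w - (1 << 32) if w & 0x80000000 else w

def ldr_sxtw(xn, wm):
    """[xn, wm, sxtw]: the base is used as a full 64-bit value."""
    return (xn + sext32(wm)) & MASK64

# An i32 pointer 0x2000 produced by a trunc: bits [32,64) still hold
# whatever the wider value contained.
x0 = 0xDEADBEEF00002000
w1 = 0x10

bad = ldr_sxtw(x0, w1)             # stale high bits leak into the address
good = ldr_sxtw(x0 & MASK32, w1)   # an extra instruction zeroes them first
print(hex(bad))    # 0xdeadbeef00002010
print(hex(good))   # 0x2010
```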

This motivates two of the changes proposed to generic CodeGen.


CodeGenPrepare:
---------------

We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
inbounds marker. This is the only way they can possibly be exposed to
SDAG at the basic block level.

Pointers are still 64-bits, tricked ya!
---------------------------------------

The next question was how to expose these GEPs to the SDAG.

I first considered adding an ISD::GEP or an "inbounds" flag to
ISD::ADD. These would solve the first issue above, but not the second.

So the proposed solution is to allow pointers to have different
in-memory and in-DAG types, and specifically keep an i64 pointer in
the DAG on arm64_32. This immediately guarantees that (valid) pointers
will have their high bits zeroed, and just by creating the DAG we make
explicit the sign-extensions described by GEP semantics.

Addressing-modes can then be used with no change to the actual C++
code in AArch64 that selects them.
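
As a rough model of the split (not the actual SelectionDAG code), the
sketch below keeps pointers as 4 bytes in memory and as 64-bit values
once loaded. Little-endian layout is assumed, and load_pointer,
store_pointer and gep_i32 are hypothetical names.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def load_pointer(memory, offset):
    """Read a 4-byte little-endian pointer, zero-extended to 64 bits."""
    (value,) = struct.unpack_from("<I", memory, offset)
    return value

def store_pointer(memory, offset, pointer):
    """Write only the low 32 bits of a 64-bit register value."""
    struct.pack_into("<I", memory, offset, pointer & MASK32)

def gep_i32(pointer, index):
    """base + sext(index) * 4 in 64 bits, i.e. [xN, wM, sxtw #2]."""
    index &= MASK32
    if index & 0x80000000:
        index -= 1 << 32
    return (pointer + index * 4) & MASK64

memory = bytearray(8)
store_pointer(memory, 0, 0x00004000)
p = load_pointer(memory, 0)
print(hex(gep_i32(p, 0xFFFFFFFF)))   # index -1 -> 0x3ffc
```

Because the loaded value is already a clean 64-bit quantity, the
address computation needs no separate truncation step.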

There are two possible disadvantages though. First, since pointers are
64-bits, they will consume 64-bit spill slots and potentially bloat
the stack. It's unclear how much of an issue that is in practice.

Second is the intrusiveness. On the plus side it's less intrusive than
ISD::GEP would be, but it still involves changes in some fairly
obscure bits of DAG -- often found when things broke rather than by
careful planning.

Details of the arm64_32 ABI
===========================

In outline the arm64_32 ABI is based on AAPCS, with the usual Darwin exceptions:

  * char is a signed type.
  * Anonymous varargs parameters go on the stack (occupying at least 4 bytes).
  * Small parameters are extended by the producer to 32-bits.

There are also a couple of arm64_32 specific changes.

Pointers
--------

Darwin has traditionally taken the view (at odds with AAPCS on
AArch64) that under-sized arguments should be extended by the caller
to the point at which they'll be useful (i.e. mostly i32).

We decided to apply this to pointers for arm64_32 on the grounds that
most uses of pointers will be as 64-bit quantities. I'm still
wondering if that was the best decision: it probably is slightly more
efficient, but it's also not pretty and didn't solve the issues I'd
naively hoped it would with memcpy and friends (turns out size_t still
exists!).

Thus, pointers behave differently from intptr_t, and call lowering
code needs to know when it's dealing with one.
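
The difference can be pictured with a small model of the register a
caller hands over. This is only a sketch: the function name and the
stale_high_bits parameter are invented for illustration, and
zero-extension of pointers is an assumption based on the earlier
remark that valid pointers have their high bits zeroed.

```python
MASK32 = 0xFFFFFFFF

def caller_register_value(kind, value, stale_high_bits=0):
    """Model the 64-bit register a caller passes for one argument.

    kind is "pointer" or "intptr_t"; stale_high_bits stands for
    whatever happened to be in bits [32,64) beforehand.
    """
    low = value & MASK32
    if kind == "pointer":
        # Assumed: the caller extends pointers to the full 64 bits.
        return low
    # An i32 is only extended to 32 bits; the high half is not defined.
    return ((stale_high_bits & MASK32) << 32) | low

print(hex(caller_register_value("pointer", 0x2000, 0xDEADBEEF)))
# 0x2000
print(hex(caller_register_value("intptr_t", 0x2000, 0xDEADBEEF)))
# 0xdeadbeef00002000
```

Under that model a callee can use a pointer argument directly as a
base register, but has to extend an intptr_t itself before treating it
as a 64-bit quantity.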

Arrays
------

We're translating armv7k bitcode to arm64_32, and the result has to be
compatible with code that is compiled directly to arm64_32.

The biggest barrier here was small structs. They generally get passed
in registers, possibly with alignment requirements.

    struct { int arr[2]; }    goes in [rN,rN+1] or in xN
    struct { uint64_t val; }  goes in [rN,rN+1] (starting even) or in xN

So we need a way to signal in IR that two values should be combined
into a single x-register when compiled for arm64_32. We chose LLVM
arrays for the job. So, unlike all other targets, the following two
functions will behave differently in arm64_32:

    void @foo([2 x i32] %x0)      ; Two i32s combined into 64-bit x0 register
    void @foo(i32 %w0, i32 %w1)   ; First i32 in w0, second in w1
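
To make "combined into a single x-register" concrete, here is a small
sketch of the packing. It assumes element 0 lands in the low half of
the register, as a little-endian 8-byte load of the struct would
produce; pack_i32_pair is a hypothetical helper, not part of the
patches.

```python
import struct

MASK32 = 0xFFFFFFFF

def pack_i32_pair(first, second):
    """Combine two i32 values into one 64-bit register image."""
    return (first & MASK32) | ((second & MASK32) << 32)

# [2 x i32] argument: both elements share x0.
x0 = pack_i32_pair(1, 2)
print(hex(x0))   # 0x200000001

# Same bytes as the struct would have in memory, read back as 64 bits.
(from_memory,) = struct.unpack("<Q", struct.pack("<ii", 1, 2))
print(from_memory == x0)   # True

# Two separate i32 arguments: each gets its own w-register.
w0, w1 = 1 & MASK32, 2 & MASK32
```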

Details of patch sequence
=========================

Here's a brief outline of the patches in the link:

1. CodeGenPrep: sink GEPs as GEPs and preserve inbounds note. As
discussed above, this is a necessary generic change to get good
CodeGen.
2. AArch64: support binutils-like things on arm64_32. Basic Triple,
llvm-objdump support, and other low-level tools that need to
understand the binary format.
3-5. Preparatory changes to generic SelectionDAG to support arm64_32.
The biggest of these is splitting pointer representations in-memory
from those in-register.
6. Main patch adding CodeGen support for arm64_32 to lib/Target/AArch64.
7. The armv7k compatibility passes mentioned above. One of them
replaces ARM intrinsic calls with AArch64 ones (NEON in particular),
one works around an unwarranted assumption in Swift, and one fixes up
an ObjC marker at the module level.
8. Clang support for arm64_32. The usual mix of ABI definitions.
9-15. Various components of FastISel support for arm64_32, gradually
bringing it to parity with arm64.
16. compiler-rt support for arm64_32. This includes both builtins and
sanitizers.

