[PATCH] D144083: [JITLink] Initial 32-bit ARM backend

Thu Feb 23 14:04:02 PST 2023

sgraenitz added a comment.

The revised patch makes a lot of changes (sorry for that), but I think it's for the best. Due to its multitude of such architectures, I think this target has the potential to grow very big. It would be great to start off with a design that supports it. Let me try and summarize the major changes.

Introduced `readAddend()` analog to `applyFixup()`. It decodes the initial addend from a REL relocation from it's fixup location and gets called from `addSingleRelRelocation()`. There should be no functional difference to RELA relocations.

Out-lined actual implementations from `readAddend()` and `applyFixup()` (per relocation class Data/Arm/Thumb) into `aarch32.cpp`. This allows to move a lot of the supporting machinery into the cpp as well.  I am aware of the JITLink convention to have everything in this header in order to inline it, but I don't think it scales enough in this case. Where inlining is a must-have, please consider using LTO/Thin-LTO.

Introduced symmetric encode/decode functions for sets of fixup formats. AArch32 fixups involve complex combinations of shifts and masking. Having this symmetric and side-by-side hopefully aids comprehension and bugfixing. E.g. `encodeImmBT4BlT1BlxT2_J1J2(int64_t Value)` is used for `Thumb_Call` and `Thumb_Jump24`. It returns the bits that represent the immediate value for instruction formats B-T4, BL-T1 and BLX-T2 when J1 <https://reviews.llvm.org/J1>/J2 <https://reviews.llvm.org/J2> branch range extensions are available. It encodes the plain operation and doesn't check for errors (responsibility of the caller). The corresponding `decode` function is supposed to do the exact inverse operation.

In order to use these encode/decode functions, we need to mask out preexisting bits from the immediate field(s). That's the job of the freestanding `writeImmediate()` function. It infers the respective mask from a `FixupInfo<EdgeKind>` struct that collects all constants for a particular relocation type. `writeRegister()`, `checkRegister()` and `checkOpcode()` follow the same approach.

LLD doesn't do this effort. It mostly just overwrites the entire instruction (including opcode, register field, etc.) with default bits. I think for JITLink it makes a lot of sense to have these tools, because we want to be able to implement relaxation and (who knows?) re-optimization at some point. Another benefit is testing. I added `AArch32Relocations.Thumb_Call_J1J2` as a first example in JITLinkTests.

BTW Thanks to @kristof.beyls for your convincing arguments for renaming this to AArch32!

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h:473
+    //
+    assert(((Offset & ~0x1) + Size) <= Base.getSize() &&
            "Symbol extends past end of block");
----------------
lhames wrote:
> sgraenitz wrote:
> > @lhames Ideas how to get this cleaner?
> I wonder if any other architectures use the low bits of addresses for architecture specific flags? Maybe we could come up with something more general, like a triple-specific 'getRawAddressForArithmetic` function that we could use in common code.
> 
> At a minimum we could make the test in this assert conditional on the graph having a thumb triple -- that way we wouldn't affect other platforms.
This is not addressed yet. Will do next!

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:33
+  /// Plain 32-bit pointer relocation; optional addend is stored in-place
+  Arm_Delta32 = FirstArmRelocation,
+
----------------
peter.smith wrote:
> lhames wrote:
> > peter.smith wrote:
> > > Strictly speaking that would be a Data relocation (https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#5613static-data-relocations) rather than an Arm state relocation.
> > > 
> > > Not sure if you are planning Arm state instruction support, may be worth making the distinction if you are.
> > Are you suggesting a different prefix for the name?
> > 
> > Other architectures have all stuck to `Delta<bitwidth>`, so unless there's some reason to distinguish this as "arm" I think it would make sense to rename to `Delta32`.
> Not the Delta name, the FirstArmRelocation name. For example FirstDataRelocation. I don't think it is hugely important though.
Good point. In my new patch the groups mimic relocation Classes!

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:59
+
+/// Implementation of ARM branch range extension stubs ("veneers") vary with
+/// ISA kinds (arm/thumb/interworking), CPU architectures (v4, v6, v7) and
----------------
peter.smith wrote:
> Not a strong opinion, but you may wish to use stubs throughout the implementation as I think JITLink seems to use stubs.
> 
> Veneers, Stubs and Thunks are all synonyms and although the Arm ABI calls them veneers GNU ld and LLD stick to the linker's choice.
In my new patch I turn around the naming approach: For names in actual code, I stick to the linker conventions as you propose ("stubs" in this case) and in comments I note the ARM ABI name ("veneers" in this case). That's the better fit indeed.

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:97-105
+inline int64_t getPCBias(Edge::Kind K) {
+  switch (K) {
+  case Thumb_Call:
+  case Thumb_Jump24:
+    return 4;
+  default:
+    return 8;
----------------
lhames wrote:
> I think we should spell out the relocations with bias `8` and have the default case be an `llvm_unreachable`, just to be safe.
This was copy/paste and disappeared in the new patch.

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:108
+/// Returns extracted bits Val[Hi:Lo].
+inline uint32_t extractBits(uint32_t Val, unsigned Hi, unsigned Lo) {
+  return (Val & (((1UL << (Hi + 1)) - 1))) >> Lo;
----------------
peter.smith wrote:
> IIUC this might be undefined behaviour for a 32-bit type if Hi is 31 as we'll end up with (1UL << 32) -1. Did you mean to make Val uint64_t?
Oh clearly this should have been `1ULL`! Good catch. In my new patch I moved away from the `extractBits()` helper and stick with plain shifts and masking. It really aids symmetry in encode/decode!

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:129
+      return makeTargetOutOfRangeError(G, B, E);
+    *(little32_t *)FixupPtr = Value;
+    break;
----------------
peter.smith wrote:
> I guess we are only supporting little-endian for now (I don't blame you!) while for v7/v8 all instructions will always be little-endian the data can be big-endian on a big-endian system.
Yes, this didn't consider big endian.

> for v7/v8 all instructions will always be little-endian the data can be big-endian on a big-endian system

Thanks that's good to know. My new patch accounts for it in theory, but I have no device to test it in practice.

@peter.smith How is this different for pre-v7 CPUs? Thinking about v6m in particular? (Are there even big-endian variants?)

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:140
+  case Thumb_Call: {
+    int64_t Addend = -getPCBias(E.getKind());
+    int64_t Value = TargetAddress - FixupAddress + Addend;
----------------
peter.smith wrote:
> It looks like this currently only handles Thumb destinations, i.e. it doesn't write a BLX instruction to Arm state if the TargetAddress has the bottom bit set.
> 
> Is this because all Arm/Thumb state transitions are veneered so we never get an opportunity to use BLX?
Yes, this did only implement BL-T1 to call Thumb code and assumed the LSB to be set. It didn't check whether LSB is clear and switch the instruction to BLX-T2 in order to call ARM code. Let me follow up on interworking veneers later.

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:211
+
+  static StringRef getSectionName() { return "Veneer$$Code"; }
+
----------------
peter.smith wrote:
> The convention for the other targets seems to be "S__STUBS" I know armlink Arm's proprietary linker uses Veneer$$Code, but may be worth keeping in sync with other JITLink targets.
I was a bit proud that I managed to reverse engineer that :) but you are right.

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:218
+    switch (E.getKind()) {
+    case arm::Thumb_Call:
+    case arm::Thumb_Jump24: {
----------------
peter.smith wrote:
> Is this indirecting all jumps and calls through a veneer? One thing a static linker needs to be careful about is functions that do not have type STT_FUNC (https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#553symbol-values). In this case the bottom bit is alway expected to be 0, even for Thumb Targets. The current veneer would attempt to switch to Arm state for a relocation with bit 0 clear.
> 
> Non STT_FUNC symbols are expected to represent labels inside a function so they are always the same state.
> 
> It maybe that the JITLink doesn't encounter non STT_FUNC symbols at link time as it should be possible to resolve branches to internal labels at fixup time without having to expose them to the linker.
One more thing I will have to take a closer look later on. Not sure we are even adding such symbols to the link-graph yet :)

> Non STT_FUNC symbols are expected to represent labels inside a function so they are always the same state.

Thanks for this note! It didn't make sense to me until now.

> It maybe that the JITLink doesn't encounter non STT_FUNC symbols at link time as it should be possible to resolve branches to internal labels at fixup time without having to expose them to the linker.

Ok, I will try and keep that in mind.

> Is this indirecting all jumps and calls through a veneer?

Yes, the convention in JITLink is to first build stubs for all edges and allow to relax them later on, e.g. in an optimization pass.

================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/arm.h:238
+                 uint64_t Alignment) {
+    ArrayRef<char> Template(reinterpret_cast<const char *>(Code), Size);
+    return G.createContentBlock(getStubsSection(G), Template,
----------------
lhames wrote:
> As an aside to this review -- I find myself doing this a lot too. I wonder if we should add some `ArrayRef<char> <-> ArrayRef<uint8_t>` conversion helpers in ADT?
> 
> I've also gone back and forward on whether block content should be `ArrayRef<char>` or `ArrayRef<uint8_t>`. From memory they were originally `uint8_t`-based and I moved them to `char` at some point in the hope of reducing friction, but I'm not sure it was very successful in the end. Neither type seems to be a perfect fit.
I'd be in favor of `uint8_t` for all binary memory, because it avoids this ancient weird thing that signedness of `char` in C is undefined and every platform/compiler has their own interpretation what that means. The good thing is that it's always in-line with their string literals and that's what I think `char` is best used for (exclusively).

Hex bytes and `char` always make trouble at some point for some people, like this: https://github.com/llvm/llvm-project/commit/417b1164c28ef526cfe3ccab70d22598b7c63624

I must have missed the `uint8_t` era, JITLink used `char` as long as I can remember. I'd vote for changing it, but it's probably a lot of work and may cause a long tail of tricky issues. Introducing better handling only supports the status quo. Not sure that's ideal.

For now my workaround is to initialize with `uint8_t` and `reinterpret_cast<const char *>`, yes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144083/new/

https://reviews.llvm.org/D144083