[PATCH] D106128: [lld-macho] Use immediate encodings for bind opcodes

Sat Jul 17 00:46:52 PDT 2021

int3 added inline comments.

================
Comment at: lld/MachO/SyntheticSections.cpp:366-367
+  // Pass 3: Use immediate encodings
+  size_t offsetWidth =
+      target->wordSize == 8 ? sizeof(uint64_t) : sizeof(uint32_t);
+  for (BindIR *p = &opcodes[0]; p->opcode != BIND_OPCODE_DONE; ++p) {
----------------
the ternary is unnecessary, `offsetWidth` is just `target->wordSize` :)

also, I think `offsetWidth` is kind of a misleading name... `pointerSize` is probably more apt. Or we could just use `target->wordSize` directly.

I think I understand the motivation behind this opcode design: Since every binding is the size of one pointer, the next binding must be at least `wordSize` away. Most likely it's some multiple of `wordSize` away (if there are multiple intervening pointers). Hence the scaling by pointer size. (Might be worth to write something like this as a comment)

================
Comment at: lld/MachO/SyntheticSections.cpp:370
+    if ((p->opcode == BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB) &&
+        (p->data < (BIND_IMMEDIATE_MASK * offsetWidth)) &&
+        ((p->data % offsetWidth) == 0)) {
----------------
how about `p->data / offsetWidth <= BIND_IMMEDIATE_MASK`, to mirror the assignment below?

================
Comment at: lld/MachO/SyntheticSections.cpp:373
+      p->opcode = BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED;
+      p->data = p->data / offsetWidth;
+    }
----------------

================
Comment at: lld/test/MachO/bind-opcodes.s:1
 # REQUIRES: x86
 # RUN: rm -rf %t; split-file %s %t
----------------
thevinster wrote:
> This file is pretty messy and hard to read. So I'll try to condense what I did here.
> 1/ In order to run both 64-bit tests and 32-bit tests in one file, I had to separate the suffix the input and output with the arch.
> 2/ Nothing changed with `llvm-objdump`. It was merely a forklift from the bottom to the top. There isn't a corresponding one for the arm64_32. I'm not exactly familiar with what it does, but I'm happy to add it if it provides value.
> 3/ In order to satisfy the linker, I had to switch the use of `quad` to `int` otherwise I get relocation errors. The order of the bind opcodes between 32-bit and 64-bit are the same (with slightly offsets)
> 4/ `CHECK` are in its own separate files now in order to isolate between different arch. 
I think we can make it less messy :)

1: yeah this makes sense
2: dumping the bind table is basically a sanity check: it decodes the bind opcodes into an easy-to-read form so we can verify that we encoded the right things. I think it's worth adding for 32-bit as well, but we can be clever and reuse the same code (see below)
3: I think we can reuse the same code. I haven't tested this but I think something like it should work:
```
.ifdef PTR64
.macro ptr val
  .quad val
.endm
.endif

.ifdef PTR32
.macro ptr val
  .int val
.endm
.endif

ptr _foo
ptr _bar
...
```
PTR64 and PTR32 will have to be defined as part of llvm-mc's invocation. See https://github.com/llvm/llvm-project/blob/main/llvm/test/DebugInfo/X86/dwarfdump-header-64.s for an example.
4: FileCheck takes a `--check-prefix` argument, so we can put the checks in the same file.

================
Comment at: lld/test/MachO/bind-opcodes.s:15-20
+# RUN: llvm-objdump --macho --bind %t/test-x86_64 | FileCheck %s --check-prefix=BIND
+# BIND:       Bind table:
+# BIND-NEXT:  segment  section   address      type       addend dylib     symbol
+# BIND-NEXT:  __DATA   __data    0x100001000  pointer         0 libfoo    _foo
+# BIND-NEXT:  __DATA   __data    0x100001010  pointer         0 libfoo    _foo
+# BIND-NEXT:  __DATA   __data    0x100001020  pointer         1 libfoo    _foo
----------------
something like this should allow reusing the check across both archs

(see 'Numeric Substitutions' in the FileCheck manual for details)

================
Comment at: lld/test/MachO/bind-opcodes.s:29
+# RUN: llvm-mc -filetype=obj -triple=arm64_32-apple-darwin %t/test-arm64_32.s -o %t/test-arm64_32.o
+# RUN: %lld-arm64_32 -O2 -dylib %t/foo.o -o %t/libfoo.dylib
+# RUN: %lld-arm64_32 -O2 -dylib %t/test-arm64_32.o %t/libfoo.dylib -o %t/libtest-arm-64_32.dylib
----------------
you can just do `%lld -arch arm64_32`, the later arch will override the earlier one. no need to define another substitution

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106128/new/

https://reviews.llvm.org/D106128