[PATCH] D100818: [lld-macho] Implement branch-range-extension thunks

Tue May 11 15:33:14 PDT 2021

gkm marked 26 inline comments as done.
gkm added inline comments.

================
Comment at: lld/MachO/MergedOutputSection.cpp:131
+  size_t maxPotentialThunks = 0;
+  for (ThunkMapPair &tp : thunkMap) {
+    ThunkInfo &ti = tp.second;
----------------
int3 wrote:
> hm so we are calling `estimateStubsInRangeVA` itself in a loop... is this potentially expensive? I guess `thunkMap` and `endIdx - callIdx` aren't typically large, so the nested loops are probably fine... but might be worth calling out explicitly in a comment
Although we call `estimateStubsInRangeVA()` within a loop, it is predicated by condition that is only true once. The call needs to happen within the loop in order to happen at the proper time: as soon as all input sections are finalized, i.e., when the end of __text is within forward-branch range of the current call site. I will add a comment to highlight that feature.

================
Comment at: lld/MachO/MergedOutputSection.cpp:145-151
+  log("thunks = " + std::to_string(thunkMap.size()) +
+      ", potential = " + std::to_string(maxPotentialThunks) +
+      ", stubs = " + std::to_string(in.stubs->getSize()) + ", isecVA = " +
+      to_hexString(isecVA) + ", threshold = " + to_hexString(stubsInRangeVA) +
+      ", isecEnd = " + to_hexString(isecEnd) +
+      ", tail = " + to_hexString(isecEnd - isecVA) +
+      ", slop = " + to_hexString(branchRange - (isecEnd - isecVA)));
----------------
int3 wrote:
> all these `to_hexString()` calls makes me wonder if we should have a `LOG` macro that doesn't actually evaluate its arguments till they are needed...
> 
> (I still think we should avoid adding `Config::verbose`, we can just extend ErrorHandler)
> 
> if this turns out to be necessary let's do it in a stacked diff... there's too much going on in this one already
This is called only once, so the overhead of a few `to_hexString()` calls is negligible, and not worthy of a `LOG` macro. I already removed `Config::verbose` and inject `OPT_verbose` into `errorHandler().verbose`.

================
Comment at: lld/MachO/MergedOutputSection.cpp:263
+        // unfinalized inputs[finalIdx].
+        fatal(Twine(__FUNCTION__) + ": FIXME: thunk range overrun");
+      }
----------------
int3 wrote:
> I think this should work
Sadly, it doesn't. Unlike old C compilers, `__FUNCTION__` is not expanded by the preprocessor into a string literal. It is a compiler-internal variable, and thus not amenable to concat via adjacent string literals. 

================
Comment at: lld/MachO/MergedOutputSection.h:80
+  uint8_t sequence = 0;        // how many thunks created so-far?
+  bool hasStub = false;        // does the real function have a stub?
+};
----------------
int3 wrote:
> is this different from `Symbol::isInStubs()`?
No. Removed. Thanx!

================
Comment at: lld/MachO/Target.h:93-95
+  // We contrive this value as sufficiently far from any valid address
+  // that it will always be out-of-range for any architecture.
+  static constexpr uint64_t outOfRangeVA = 0xfull << 60;
----------------
int3 wrote:
> This seems unnecessarily cute... Can we just use `numeric_limits<uint64_t>::max()`? Also it's not really target-specific so I'd prefer it not be put in Target
FYI, this wasn't intended to be cute at all. `numeric_limits<uint64_t>::max()` is not viable since it is the tombstone value for `DenseMap<>` and induces weird assertions. Another disqualifier is that `max()` is only one increment away from wrapping to 0. My chosen value `0xf000'0000'0000'0000` is VERY FAR away from 0. Perhaps MachO guarantees that `__text` addresses will never be within even 4 GiB of 0, so I am unnecessarily cautious?

Regarding target-specificity: it is OS/runtime specific (mostly Darwin & iOS), modulo CPU-arch variations. It so happens all are Apple creations, and common enough that we can choose a single constant that works for all. If not `Target.h`, where do you propose we define this? I don't see anywhere that seems a better fit ... `Config.h`? `OutputSection.h`? `Relocations.h`? What looks good to you?

================
Comment at: lld/test/MachO/arm64-thunks.s:2
+# REQUIRES: aarch64, shell
+# UNSUPPORTED: windows
+
----------------
int3 wrote:
> smeenai wrote:
> > Is the `REQUIRES: shell` sufficient by itself, or do you also need the explicit `UNSUPPORTED: windows`?
> It should be sufficient by itself. I believe they're largely the same thing as far as the buildbots are concerned, but `REQUIRES: shell` alone would be more semantically descriptive of what this test depends on in its current form
The saddest part is that the test //still// runs and fails on Windows.

================
Comment at: lld/test/MachO/arm64-thunks.s:12
+# RUN: rm -rf %t; split-file %s %t
+# RUN: for x in {a..i}; do \
+# RUN:   llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/$x.s -o %t/$x.o; \
----------------
int3 wrote:
> int3 wrote:
> > looks like this is failing on Windows. Not sure there's a cross-platform way to loop, but writing 9 `llvm-mc` invocations doesn't seem too terrible either
> > 
> > out of curiosity: what happens if we put everything into the same file, instead of multiple files -- what does `llvm-mc` generate?
> ah I see you disabled the test on Windows... I don't think this is a good enough reason to disable stuff on Windows unfortunately (I got pushback for it in https://bugs.llvm.org/show_bug.cgi?id=49512)
With `.subsections_via_symbols`, it all works as a single input file. Thanx!

================
Comment at: lld/test/MachO/arm64-thunks.s:22
+
+CHECK: {{0*}}[[#%.6x, A_PAGE:]][[#%.3x, A_OFFSET:]] <_a>:
+CHECK:  bl 0x[[#%x, A:]] <_a>
----------------
int3 wrote:
> since you're using the `%.6x` syntax, the `{{0*}}` seems unnecessary... (you might have to change `.6` to `.8` though)
`.13`, since these are 64-bit values with 16 hex digits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100818/new/

https://reviews.llvm.org/D100818