[lld] [lld-macho] Fix branch extension thunk estimation logic (PR #120529)

Tue Jan 7 15:15:23 PST 2025

================
@@ -184,15 +184,45 @@ uint64_t TextOutputSection::estimateStubsInRangeVA(size_t callIdx) const {
     InputSection *isec = inputs[i];
     isecEnd = alignToPowerOf2(isecEnd, isec->align) + isec->getSize();
   }
+
+  // Tally up any thunks that have already been placed that have VA higher than
+  // inputs[callIdx]. First, find the index of the first thunk that is beyond
+  // the current inputs[callIdx].
+  auto itPostcallIdxThunks =
+      llvm::partition_point(thunks, [isecVA](const ConcatInputSection *t) {
+        return t->getVA() <= isecVA;
+      });
+  uint64_t existingForwardThunks = thunks.end() - itPostcallIdxThunks;
+
----------------
drodriguez wrote:

I did not have an intuition for the total number of entries in `thunkMap`. If we are talking about 100k or those numbers, doing only  ~16 comparisons is probably going to always be quicker.

My comment about double counting is because of the name `maxPotentialThunks` and the way `thunkMap` seems to be copied progressively into `thunks`, but never cleaned up. I don't pretend to understand the flow of the code (it is difficult to figure out what's going on and how things will happen during runtime), but the name `maxPotentialThunks` might not be the best if it was undercounting (originally). IIUC the code calls `estimateStubsInRangeVA` only once during `finalize` (which I think only happens once). At the point `estimateStubsInRangeVA` is called, part of `thunkMap` has already been pushed into `thunks`. The more we overcount, the higher `stubsInRangeVA` will be, and it might be that we are creating thunks for the end of the text section when a simple jump will had be enough. I don't know how bad is that size increase in the final binary when we are overcounting. It seems to me that we were undercounting when `ti.callSitesUsed == ti.callSiteCount` (I imagine `callSitesUsed` cannot be higher than `callSiteCount`). It might be possible to have a better estimation during the `thunksMap` processing, for example changing the `if (ti.isec->getVA() <= isecVA)` I was proposing above to `else if (ti.isec->getVA() <= isecVA)` (which I think was when the original code was undercounting).

Again, it is complicated code, I probably don't understand it correctly, and there's a couple of things that might make overcounting not as important of not undercounting. Feel free to ignore my feedback.


https://github.com/llvm/llvm-project/pull/120529