[PATCH] D98571: [lld-macho] Optimize getRelocAttrs()

Fri Mar 12 18:05:55 PST 2021

int3 created this revision.
int3 added a reviewer: lld-macho.
Herald added a project: lld-macho.
int3 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Investigation of PR49480 showed that D95121 <https://reviews.llvm.org/D95121> caused about a 5.0% speed
regression when linking chromium_framework. That diff introduces a (very
useful) additional layer of abstraction over relocations, so the perf
overhead is not too surprising. The diff is pretty large, and `perf`
didn't give me any great hints, so I just went with optimizing the
likely candidate -- `getRelocAttrs()`. I managed to claw back about
1.4% of perf this way. Making the `relocAttrsArray` a global and
devirtualizing `getRelocAttrs()` gave most of the win; I also marked
the array range check with LLVM_UNLIKELY for good measure.
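
A minimal sketch of the kind of change described above, not the actual
patch: a per-call virtual lookup is replaced by a non-virtual accessor
over a global table, with the out-of-range branch marked as unlikely.
All names here (`VirtualTarget`, `TableTarget`, `demoRelocAttrs`,
`SKETCH_UNLIKELY`) are hypothetical, and the macro is only a stand-in
for LLVM_UNLIKELY built on the GCC/Clang `__builtin_expect` builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for LLVM_UNLIKELY; falls back to the plain
// condition on compilers without __builtin_expect.
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

struct RelocAttrs {
  const char *name;
  uint32_t bits;
};

// Global attribute table (illustrative entries, not a real target's).
static const RelocAttrs demoRelocAttrs[] = {
    {"UNSIGNED", 0x1}, {"SIGNED", 0x2}, {"BRANCH", 0x4}};

// Before: every lookup pays for a virtual dispatch.
struct VirtualTarget {
  virtual ~VirtualTarget() = default;
  virtual const RelocAttrs &getRelocAttrs(uint8_t type) const = 0;
};

// After: the target only carries a pointer to the global table, and the
// accessor is a plain inline member function the compiler can inline.
struct TableTarget {
  const RelocAttrs *relocAttrs;
  size_t relocAttrsSize;

  const RelocAttrs &getRelocAttrs(uint8_t type) const {
    static const RelocAttrs invalid{"INVALID", 0};
    assert(relocAttrs != nullptr);
    // Out-of-range types are expected to be rare, so hint the branch.
    if (SKETCH_UNLIKELY(type >= relocAttrsSize))
      return invalid;
    return relocAttrs[type];
  }
};

inline TableTarget makeDemoTarget() {
  return TableTarget{demoRelocAttrs,
                     sizeof(demoRelocAttrs) / sizeof(demoRelocAttrs[0])};
}
```

With this shape, `makeDemoTarget().getRelocAttrs(1)` returns the
"SIGNED" entry without a vtable lookup, and an out-of-range type falls
through to the "INVALID" sentinel.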

The numbers above are quoted for chromium_framework (from the tarball in
PR48657).

      N           Min           Max        Median           Avg        Stddev
  x  20           4.5          4.66          4.56        4.5715   0.044871161
  +  20          4.42          4.61           4.5        4.5075   0.053001986
  Difference at 95.0% confidence
          -0.064 +/- 0.0314295
          -1.39998% +/- 0.68751%
          (Student's t, pooled s = 0.0491052)
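
The figures in the table can be reproduced from the per-run averages
and standard deviations. The sketch below assumes the tool reports the
standard pooled two-sample Student's t interval, and takes the critical
value as a parameter; 2.024 (two-sided 95%, 38 degrees of freedom) is an
assumption chosen here, not something stated in the output. The names
`Sample`, `Diff`, and `compare` are hypothetical.

```cpp
#include <cmath>

struct Sample {
  int n;         // number of runs
  double avg;    // mean wall time
  double stddev; // sample standard deviation
};

struct Diff {
  double pooled;     // pooled standard deviation
  double delta;      // patched.avg - base.avg
  double halfWidth;  // half-width of the confidence interval
  double relPercent; // delta as a percentage of base.avg
};

// Pooled two-sample comparison; tCrit is the Student's t critical value
// for (base.n + patched.n - 2) degrees of freedom.
inline Diff compare(const Sample &base, const Sample &patched, double tCrit) {
  int df = base.n + patched.n - 2;
  double pooledVar = ((base.n - 1) * base.stddev * base.stddev +
                      (patched.n - 1) * patched.stddev * patched.stddev) /
                     df;
  double pooled = std::sqrt(pooledVar);
  double delta = patched.avg - base.avg;
  double halfWidth =
      tCrit * pooled * std::sqrt(1.0 / base.n + 1.0 / patched.n);
  return Diff{pooled, delta, halfWidth, 100.0 * delta / base.avg};
}
```

Calling `compare({20, 4.5715, 0.044871161}, {20, 4.5075, 0.053001986},
2.024)` gives a pooled s of about 0.0491 and a difference of about
-0.064 +/- 0.0314, in line with the chromium_framework table.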

I also measured v8_unittests:

      N           Min           Max        Median           Avg        Stddev
  x  20          0.62          0.65          0.64        0.6355  0.0082557795
  +  20          0.61          0.63          0.62         0.616  0.0059824304
  Difference at 95.0% confidence
          -0.0195 +/- 0.00461426
          -3.06845% +/- 0.726084%
          (Student's t, pooled s = 0.00720928)

The v8 difference is likely larger because the v8_unittests link doesn't
use an order file. Symbol ordering is actually one of the most expensive
steps when linking chromium_framework, so it dilutes the relative gain
there; it is probably a target for further optimization.

The other hotspot is the assignment of relocations to subsections. I'm
curious as to whether replacing the RB-tree in std::map with a radix
trie would be an improvement...
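
For context, a minimal sketch of what an offset-to-subsection lookup in
a std::map can look like. This is an illustration only and has not been
checked against lld's actual data structures: `SubsectionMap` and
`findSubsection` are hypothetical, and subsections are represented by
plain integer ids keyed on their start offset.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Start offset of each subsection -> hypothetical subsection id.
using SubsectionMap = std::map<uint32_t, int>;

// Returns the id of the subsection whose start offset is the greatest
// one <= offset, or -1 if offset lies before the first subsection.
// Each call walks the red-black tree in O(log n).
inline int findSubsection(const SubsectionMap &subsecs, uint32_t offset) {
  assert(!subsecs.empty());
  auto it = subsecs.upper_bound(offset); // first start offset > offset
  if (it == subsecs.begin())
    return -1;
  return std::prev(it)->second;
}
```

With subsections starting at offsets 0, 16, and 40, a relocation at
offset 20 maps to the subsection starting at 16. Every relocation pays
for one such tree walk, which is the cost a different container would
need to beat.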

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98571

Files:
  lld/MachO/Arch/ARM64.cpp
  lld/MachO/Arch/X86_64.cpp
  lld/MachO/InputFiles.cpp
  lld/MachO/InputSection.cpp
  lld/MachO/Relocations.cpp
  lld/MachO/Target.cpp
  lld/MachO/Target.h
  lld/MachO/UnwindInfoSection.cpp
  lld/MachO/Writer.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D98571.330411.patch
Type: text/x-patch
Size: 13637 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210313/c7b404cc/attachment.bin>