[all-commits] [llvm/llvm-project] d49e72: [lld-macho] Cache readFile results

Keith Smiley via All-commits all-commits at lists.llvm.org
Wed Nov 3 22:15:11 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d49e7244cc017dc5616e7a1b1e50f458c333c1ae
      https://github.com/llvm/llvm-project/commit/d49e7244cc017dc5616e7a1b1e50f458c333c1ae
  Author: Keith Smiley <keithbsmiley at gmail.com>
  Date:   2021-11-03 (Wed, 03 Nov 2021)

  Changed paths:
    M lld/MachO/InputFiles.cpp

  Log Message:
  -----------
  [lld-macho] Cache readFile results

In one of our links, lld was issuing about 760k file reads, but only
about 1500 of those files were unique. Caching the results takes that
link from 30 seconds to 8.

This seems like a heavy hammer, especially since some things don't need
to be cached, like the filelist arguments and the passed static
archives (the latter are already cached as a one-off), but it seems ld64
does something similar here to short-circuit these duplicate reads:

https://github.com/keith/ld64/blob/82e429e186488529111b0ef86af33a3b1b9438c7/src/ld/InputFiles.cpp#L644-L665
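
As a rough illustration of the idea (not the code in this commit), a
memoizing wrapper around a file read could look like the sketch below.
The names `readFileUncached`, `readFileCached`, and `diskReads` are
hypothetical, the cache is keyed by the path string exactly as passed
in, and caching failed reads is an assumption made here for simplicity:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the uncached read: loads the whole file,
// or returns std::nullopt if it cannot be opened.
static std::optional<std::string> readFileUncached(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Counts reads that actually hit the disk, so the effect of the cache
// is observable.
static std::size_t diskReads = 0;

// Memoizing wrapper. Assumption: failed reads are cached as well, so a
// missing file is probed only once. Nothing is ever evicted, which is
// where the memory concern comes from.
const std::optional<std::string> &readFileCached(const std::string &path) {
  static std::unordered_map<std::string, std::optional<std::string>> cache;
  auto it = cache.find(path);
  if (it != cache.end())
    return it->second; // duplicate read: served from memory
  ++diskReads;
  // References to unordered_map elements stay valid across rehashing.
  return cache.emplace(path, readFileUncached(path)).first->second;
}
```

With a wrapper like this, calling `readFileCached` repeatedly on the
same path touches the disk once; every later call returns a reference
to the buffer already held in the map.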

Of the types of files being read for our iOS app, the biggest problem
was constantly re-reading small tbd files:

```
% wc -l /tmp/read.txt
761414 /tmp/read.txt
% cat /tmp/read.txt | sort -u | wc -l
1503

% cat /tmp/read.txt | grep "\.a$" | wc -l
43721
% cat /tmp/read.txt | grep "\.tbd$" | wc -l
717656
```

We could likely hoist the caching up into the callers instead of doing
it at this level, but making sure every caller that needs it caches its
results would be a more invasive change.

I could see this causing OOMs, and I'm not a linker expert, so maybe
there's another way we should solve this problem? Feedback welcome!

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D113153



