[PATCH] D113153: [lld-macho] Cache readFile results

Keith Smiley via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 3 17:17:10 PDT 2021


keith created this revision.
keith added reviewers: int3, smeenai.
Herald added a reviewer: gkm.
Herald added a project: lld-macho.
Herald added a reviewer: lld-macho.
keith requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

In one of our links lld was reading 760k files, but the unique number of
files was only 1500. This takes that link from 30 seconds to 8.

This seems like a heavy hammer, especially since some things don't need
to be cached, like the filelist arguments and the passed static
archives (the latter is already cached as a one off), but it seems ld64
does something similar here to short circuit these duplicate reads:

https://github.com/keith/ld64/blob/82e429e186488529111b0ef86af33a3b1b9438c7/src/ld/InputFiles.cpp#L644-L665

Of the types of files being read for our iOS app, the biggest problem
was constantly re-reading small tbd files:

  % wc -l /tmp/read.txt
  761414 /tmp/read.txt
  % cat /tmp/read.txt | sort -u | wc -l
  1503
  
  % cat /tmp/read.txt | grep "\.a$" | wc -l
  43721
  % cat /tmp/read.txt | grep "\.tbd$" | wc -l
  717656

We could likely hoist this logic up to not cache at this level, but it
would be a more invasive change to make sure all callers that needed it
cached the results.

I could see this being an issue with OOMs, and I'm not a linker expert so
maybe there's another way we should solve this problem? Feedback welcome!


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D113153

Files:
  lld/MachO/InputFiles.cpp


Index: lld/MachO/InputFiles.cpp
===================================================================
--- lld/MachO/InputFiles.cpp
+++ lld/MachO/InputFiles.cpp
@@ -173,8 +173,14 @@
   return true;
 }
 
+static DenseMap<CachedHashStringRef, MemoryBufferRef> resolvedReads;
 // Open a given file path and return it as a memory-mapped file.
 Optional<MemoryBufferRef> macho::readFile(StringRef path) {
+  CachedHashStringRef key(path);
+  auto entry = resolvedReads.find(key);
+  if (entry != resolvedReads.end())
+    return entry->second;
+
   ErrorOr<std::unique_ptr<MemoryBuffer>> mbOrErr = MemoryBuffer::getFile(path);
   if (std::error_code ec = mbOrErr.getError()) {
     error("cannot open " + path + ": " + ec.message());
@@ -192,7 +198,7 @@
       read32be(&hdr->magic) != FAT_MAGIC) {
     if (tar)
       tar->append(relativeToRoot(path), mbref.getBuffer());
-    return mbref;
+    return resolvedReads[key] = mbref;
   }
 
   // Object files and archive files may be fat files, which contain multiple
@@ -217,7 +223,8 @@
       error(path + ": slice extends beyond end of file");
     if (tar)
       tar->append(relativeToRoot(path), mbref.getBuffer());
-    return MemoryBufferRef(StringRef(buf + offset, size), path.copy(bAlloc));
+    return resolvedReads[key] = MemoryBufferRef(StringRef(buf + offset, size),
+                                                path.copy(bAlloc));
   }
 
   error("unable to find matching architecture in " + path);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D113153.384617.patch
Type: text/x-patch
Size: 1460 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211104/39bc2dc8/attachment.bin>


More information about the llvm-commits mailing list