[PATCH] D104137: Optimize lld::elf::ScriptLexer::getLineNumber by avoiding repeated work

Fri Jun 11 11:16:20 PDT 2021

ccross created this revision.
ccross added reviewers: srhines, pirama, MaskRay.
Herald added subscribers: arichardson, emaste.
ccross requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

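getLineNumber() counted the line feeds from the start of the buffer to
the current token on every call. Each call rescans the whole prefix, so
the total work grows roughly quadratically with script size when line
numbers are requested for many tokens. For large linker scripts this
became a performance bottleneck: for one 4MB linker script, over 4
minutes was spent in StringRef::count called from getLineNumber().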
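A minimal standalone sketch of the original strategy follows, for illustration only. It assumes C++17 and uses std::string_view in place of llvm::StringRef. The names lineNumberFromStart and bytesScannedFromStart are hypothetical and do not appear in lld. The second function tallies how many bytes would be scanned if the line number were requested once per token, with tokens spaced evenly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch of the original approach (not lld's code): the
// 1-based line number of the token starting at `offset` is one plus the
// number of '\n' characters in buffer[0, offset).
std::size_t lineNumberFromStart(std::string_view buffer, std::size_t offset) {
  assert(offset <= buffer.size());
  std::string_view prefix = buffer.substr(0, offset);
  return static_cast<std::size_t>(
             std::count(prefix.begin(), prefix.end(), '\n')) +
         1;
}

// Hypothetical helper: total bytes scanned if the line number is requested
// once for each of `numTokens` tokens spaced `stride` bytes apart. Every call
// rescans the entire prefix, so the total grows quadratically in numTokens.
std::size_t bytesScannedFromStart(std::size_t numTokens, std::size_t stride) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < numTokens; ++i)
    total += i * stride; // the prefix before token i is i * stride bytes
  return total;
}
```

For example, 1000 one-byte-apart tokens already cost 499500 scanned bytes, versus about 1000 if each byte were scanned once.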

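Cache the line number and buffer offset of the last token queried. When
the current token is at or after that offset, count only the line feeds
between the two tokens. A query that moves backwards falls back to
counting from the start of the buffer.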
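The caching scheme can be sketched the same way. This is a hypothetical standalone class, not lld's code. It assumes C++17, a single buffer, and std::string_view instead of llvm::StringRef, but it follows the patch's logic, including the fallback to a full rescan when a query moves backwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical standalone version of the caching scheme in this patch.
class LineNumberCache {
public:
  explicit LineNumberCache(std::string_view buf) : buffer(buf) {}

  // Returns the 1-based line number of the byte at `tokOffset`.
  std::size_t getLineNumber(std::size_t tokOffset) {
    assert(tokOffset <= buffer.size());

    // Default: count from the beginning of the buffer (first query, or a
    // query that moves backwards past the cached token).
    std::size_t line = 1;
    std::size_t start = 0;

    // Moving forwards: resume from the previously queried token.
    if (lastLineNumberOffset > 0 && tokOffset >= lastLineNumberOffset) {
      start = lastLineNumberOffset;
      line = lastLineNumber;
    }

    // Count only the line feeds in [start, tokOffset).
    std::string_view region = buffer.substr(start, tokOffset - start);
    line += static_cast<std::size_t>(
        std::count(region.begin(), region.end(), '\n'));

    // Remember this token's position and line for the next query.
    lastLineNumberOffset = tokOffset;
    lastLineNumber = line;
    return line;
  }

private:
  std::string_view buffer;
  std::size_t lastLineNumber = 0;
  std::size_t lastLineNumberOffset = 0;
};
```

When tokens are queried in increasing order, each byte is scanned about once in total. As in the patch, a token at offset 0 never seeds the cache. This costs nothing, because a cached offset of 0 with line 1 would give the same starting point as a full rescan.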


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104137

Files:
  lld/ELF/ScriptLexer.cpp
  lld/ELF/ScriptLexer.h


Index: lld/ELF/ScriptLexer.h
===================================================================

--- lld/ELF/ScriptLexer.h
+++ lld/ELF/ScriptLexer.h
@@ -40,6 +40,9 @@
   bool inExpr = false;
   size_t pos = 0;
 
+  size_t lastLineNumber = 0;
+  size_t lastLineNumberOffset = 0;
+
 protected:
   MemoryBufferRef getCurrentMB();
 
Index: lld/ELF/ScriptLexer.cpp
===================================================================
--- lld/ELF/ScriptLexer.cpp
+++ lld/ELF/ScriptLexer.cpp
@@ -56,7 +56,28 @@
     return 1;
   StringRef s = getCurrentMB().getBuffer();
   StringRef tok = tokens[pos - 1];
-  return s.substr(0, tok.data() - s.data()).count('\n') + 1;
+
+  // For the first token, or when going backwards, start from the beginning of
+  // the buffer.
+  size_t line = 1;
+  size_t start = 0;
+
+  const size_t tokOffset = tok.data() - s.data();
+
+  // If this token is after the previous token start from the previous token.
+  if (lastLineNumberOffset > 0 && tokOffset >= lastLineNumberOffset) {
+    start = lastLineNumberOffset;
+    line = lastLineNumber;
+  }
+
+  // Add the number of linefeeds since the start of the region of interest.
+  line += s.substr(start, tokOffset - start).count('\n');
+
+  // Store the line number of this token for reuse.
+  lastLineNumberOffset = tokOffset;
+  lastLineNumber = line;
+
+  return line;
 }
 
 // Returns 0-based column number of the current token.

