[lld] e387778 - [ELF] Optimize ScriptLexer::getLineNumber by caching the previous line number and offset

Fangrui Song via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 22 15:35:29 PDT 2021


Author: Colin Cross
Date: 2021-06-22T15:35:24-07:00
New Revision: e387778722f93705db903aa755529568a05dd9db

URL: https://github.com/llvm/llvm-project/commit/e387778722f93705db903aa755529568a05dd9db
DIFF: https://github.com/llvm/llvm-project/commit/e387778722f93705db903aa755529568a05dd9db.diff

LOG: [ELF] Optimize ScriptLexer::getLineNumber by caching the previous line number and offset

getLineNumber() was counting the number of line feeds from the start of
the buffer to the current token on every call. For large linker scripts
this became a performance bottleneck. For one 4MB linker script, over
4 minutes was spent in getLineNumber's StringRef::count.

Store the line number from the last token, and only count the additional
line feeds since the last token.
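
The caching idea described above can be sketched outside of lld as
follows. This is only an illustration under stated assumptions: the
class name LineCounter and the method lineAt are hypothetical, and
std::string_view stands in for llvm::StringRef. The caller is assumed
to pass an offset no larger than the buffer size.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical stand-alone helper, not part of lld.
class LineCounter {
public:
  explicit LineCounter(std::string_view buf) : buf(buf) {}

  // Returns the 1-based line number of the byte at `offset`.
  // Assumes offset <= buf.size().
  std::size_t lineAt(std::size_t offset) {
    std::size_t line = 1;
    std::size_t start = 0;
    // Reuse the cached result only when moving forward; otherwise
    // fall back to counting from the beginning of the buffer.
    if (lastOffset > 0 && offset >= lastOffset) {
      start = lastOffset;
      line = lastLine;
    }
    // Count only the line feeds in [start, offset).
    line += static_cast<std::size_t>(
        std::count(buf.begin() + start, buf.begin() + offset, '\n'));
    // Remember this position for the next query.
    lastOffset = offset;
    lastLine = line;
    return line;
  }

private:
  std::string_view buf;
  std::size_t lastLine = 0;
  std::size_t lastOffset = 0;
};
```

When queries arrive in increasing offset order, each byte of the buffer
is scanned at most once in total, instead of once per query. A query
that moves backwards simply rescans from the start.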

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104137

Added: 
    

Modified: 
    lld/ELF/ScriptLexer.cpp
    lld/ELF/ScriptLexer.h

Removed: 
    


################################################################################
diff  --git a/lld/ELF/ScriptLexer.cpp b/lld/ELF/ScriptLexer.cpp
index 4b16974dd1346..236a188324cd6 100644
--- a/lld/ELF/ScriptLexer.cpp
+++ b/lld/ELF/ScriptLexer.cpp
@@ -56,7 +56,25 @@ size_t ScriptLexer::getLineNumber() {
     return 1;
   StringRef s = getCurrentMB().getBuffer();
   StringRef tok = tokens[pos - 1];
-  return s.substr(0, tok.data() - s.data()).count('\n') + 1;
+  const size_t tokOffset = tok.data() - s.data();
+
+  // For the first token, or when going backwards, start from the beginning of
+  // the buffer. If this token is after the previous token, start from the
+  // previous token.
+  size_t line = 1;
+  size_t start = 0;
+  if (lastLineNumberOffset > 0 && tokOffset >= lastLineNumberOffset) {
+    start = lastLineNumberOffset;
+    line = lastLineNumber;
+  }
+
+  line += s.substr(start, tokOffset - start).count('\n');
+
+  // Store the line number of this token for reuse.
+  lastLineNumberOffset = tokOffset;
+  lastLineNumber = line;
+
+  return line;
 }
 
 // Returns 0-based column number of the current token.

diff  --git a/lld/ELF/ScriptLexer.h b/lld/ELF/ScriptLexer.h
index 526268e3f65b6..405fc735cbe66 100644
--- a/lld/ELF/ScriptLexer.h
+++ b/lld/ELF/ScriptLexer.h
@@ -40,6 +40,9 @@ class ScriptLexer {
   bool inExpr = false;
   size_t pos = 0;
 
+  size_t lastLineNumber = 0;
+  size_t lastLineNumberOffset = 0;
+
 protected:
   MemoryBufferRef getCurrentMB();
 


        

