[Mlir-commits] [llvm] [mlir] [Support] Faster line and column lookup in SourceMgr (PR #195881)

llvmlistbot at llvm.org llvmlistbot at llvm.org
Tue May 5 15:58:11 PDT 2026


Albert =?utf-8?q?Havliček?= <havlialb at fit.cvut.cz>,
Albert =?utf-8?q?Havliček?= <havlialb at fit.cvut.cz>,
Albert =?utf-8?q?Havliček?= <havlialb at fit.cvut.cz>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/195881 at github.com>


llvmorg-github-actions[bot] wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-mlir-core

Author: Bertik23 (Bertik23)

<details>
<summary>Changes</summary>

Previously line and column lookup worked by finding the line number by utilizing a cache of offsets of line ends. And finding the column number by linear search of a previous newline. This is quite slow and caused some issues after merging #<!-- -->174566 and was fixed by wrapping all calls of the slow function in a `if (ParserContext)` in #<!-- -->180068. This is not ideal, since when the parser context would be supplied the parsing of files with long lines will take ages.

This PR implements a fix to the problem by utilizing the computed offsets for line ends to calculate the position on the current line. Thus improving the asymptotic complexity of the getLineAndColumn method from O(log l + c) to O(log l)  (l is number of lines, c is column position of the pointer). While not adding any overhead.

The second commit in this PR just updates two comments that I thought were misleading.

Since the getLineNumber methods are now unused I removed them in the third commit. And that caused MLIR to not compile, so I fixed it, while removing TODOs, that are implemented by this PR.

---
Full diff: https://github.com/llvm/llvm-project/pull/195881.diff


4 Files Affected:

- (modified) llvm/include/llvm/Support/SourceMgr.h (+11-6) 
- (modified) llvm/lib/Support/SourceMgr.cpp (+21-18) 
- (modified) mlir/lib/AsmParser/Lexer.cpp (+1-6) 
- (modified) mlir/lib/Tools/PDLL/CodeGen/MLIRGen.cpp (+1-6) 


``````````diff
diff --git a/llvm/include/llvm/Support/SourceMgr.h b/llvm/include/llvm/Support/SourceMgr.h
index 02e694cad8697..ec0b4edb295d4 100644
--- a/llvm/include/llvm/Support/SourceMgr.h
+++ b/llvm/include/llvm/Support/SourceMgr.h
@@ -65,11 +65,14 @@ class SourceMgr {
     /// dynamically based on the size of Buffer.
     mutable void *OffsetCache = nullptr;
 
-    /// Look up a given \p Ptr in the buffer, determining which line it came
-    /// from.
-    LLVM_ABI unsigned getLineNumber(const char *Ptr) const;
+    /// Look up a given \p Ptr in the buffer, determining which line and column
+    /// it came from. This method has O(log n) complexity, where n is the number
+    /// of lines in the buffer.
+    LLVM_ABI std::pair<unsigned, unsigned>
+    getLineAndColumn(const char *Ptr) const;
     template <typename T>
-    unsigned getLineNumberSpecialized(const char *Ptr) const;
+    std::pair<unsigned, unsigned>
+    getLineAndColumnSpecialized(const char *Ptr) const;
 
     /// Return a pointer to the first character of the specified line number or
     /// null if the line number is invalid.
@@ -208,13 +211,15 @@ class SourceMgr {
   LLVM_ABI unsigned FindBufferContainingLoc(SMLoc Loc) const;
 
   /// Find the line number for the specified location in the specified file.
-  /// This is not a fast method.
+  /// This method has O(log n) complexity, where n is the number of lines in the
+  /// buffer.
   unsigned FindLineNumber(SMLoc Loc, unsigned BufferID = 0) const {
     return getLineAndColumn(Loc, BufferID).first;
   }
 
   /// Find the line and column number for the specified location in the
-  /// specified file. This is not a fast method.
+  /// specified file. This method has O(log n) complexity, where n is the number
+  /// of lines in the
   LLVM_ABI std::pair<unsigned, unsigned>
   getLineAndColumn(SMLoc Loc, unsigned BufferID = 0) const;
 
diff --git a/llvm/lib/Support/SourceMgr.cpp b/llvm/lib/Support/SourceMgr.cpp
index 299615a6c8041..2704025060e03 100644
--- a/llvm/lib/Support/SourceMgr.cpp
+++ b/llvm/lib/Support/SourceMgr.cpp
@@ -121,7 +121,8 @@ static std::vector<T> &GetOrCreateOffsetCache(void *&OffsetCache,
 }
 
 template <typename T>
-unsigned SourceMgr::SrcBuffer::getLineNumberSpecialized(const char *Ptr) const {
+std::pair<unsigned, unsigned>
+SourceMgr::SrcBuffer::getLineAndColumnSpecialized(const char *Ptr) const {
   std::vector<T> &Offsets =
       GetOrCreateOffsetCache<T>(OffsetCache, Buffer.get());
 
@@ -134,21 +135,28 @@ unsigned SourceMgr::SrcBuffer::getLineNumberSpecialized(const char *Ptr) const {
 
   // llvm::lower_bound gives the number of EOL before PtrOffset. Add 1 to get
   // the line number.
-  return llvm::lower_bound(Offsets, PtrOffset) - Offsets.begin() + 1;
+  auto I = llvm::lower_bound(Offsets, PtrOffset);
+  unsigned LineNo = I - Offsets.begin() + 1;
+
+  // The column number is the distance from the previous EOL (or the start of
+  // the buffer) to the pointer.
+  T LineStartOffs = (I == Offsets.begin()) ? 0 : (I[-1] + 1);
+  unsigned ColNo = PtrOffset - LineStartOffs + 1;
+  return {LineNo, ColNo};
 }
 
-/// Look up a given \p Ptr in the buffer, determining which line it came
-/// from.
-unsigned SourceMgr::SrcBuffer::getLineNumber(const char *Ptr) const {
+/// Look up a given \p Ptr in the buffer, determining which line and column
+/// it came from.
+LLVM_ABI std::pair<unsigned, unsigned>
+SourceMgr::SrcBuffer::getLineAndColumn(const char *Ptr) const {
   size_t Sz = Buffer->getBufferSize();
   if (Sz <= std::numeric_limits<uint8_t>::max())
-    return getLineNumberSpecialized<uint8_t>(Ptr);
-  else if (Sz <= std::numeric_limits<uint16_t>::max())
-    return getLineNumberSpecialized<uint16_t>(Ptr);
-  else if (Sz <= std::numeric_limits<uint32_t>::max())
-    return getLineNumberSpecialized<uint32_t>(Ptr);
-  else
-    return getLineNumberSpecialized<uint64_t>(Ptr);
+    return getLineAndColumnSpecialized<uint8_t>(Ptr);
+  if (Sz <= std::numeric_limits<uint16_t>::max())
+    return getLineAndColumnSpecialized<uint16_t>(Ptr);
+  if (Sz <= std::numeric_limits<uint32_t>::max())
+    return getLineAndColumnSpecialized<uint32_t>(Ptr);
+  return getLineAndColumnSpecialized<uint64_t>(Ptr);
 }
 
 template <typename T>
@@ -217,12 +225,7 @@ SourceMgr::getLineAndColumn(SMLoc Loc, unsigned BufferID) const {
   auto &SB = getBufferInfo(BufferID);
   const char *Ptr = Loc.getPointer();
 
-  unsigned LineNo = SB.getLineNumber(Ptr);
-  const char *BufStart = SB.Buffer->getBufferStart();
-  size_t NewlineOffs = StringRef(BufStart, Ptr - BufStart).find_last_of("\n\r");
-  if (NewlineOffs == StringRef::npos)
-    NewlineOffs = ~(size_t)0;
-  return {LineNo, Ptr - BufStart - NewlineOffs};
+  return SB.getLineAndColumn(Ptr);
 }
 
 // FIXME: Note that the formatting of source locations is spread between
diff --git a/mlir/lib/AsmParser/Lexer.cpp b/mlir/lib/AsmParser/Lexer.cpp
index 8f53529823e23..161dcb4ca27c0 100644
--- a/mlir/lib/AsmParser/Lexer.cpp
+++ b/mlir/lib/AsmParser/Lexer.cpp
@@ -63,12 +63,7 @@ Location Lexer::getEncodedSourceLocation(SMLoc loc) {
   auto &sourceMgr = getSourceMgr();
   unsigned mainFileID = sourceMgr.getMainFileID();
 
-  // TODO: Fix performance issues in SourceMgr::getLineAndColumn so that we can
-  //       use it here.
-  auto &bufferInfo = sourceMgr.getBufferInfo(mainFileID);
-  unsigned lineNo = bufferInfo.getLineNumber(loc.getPointer());
-  unsigned column =
-      (loc.getPointer() - bufferInfo.getPointerForLineNumber(lineNo)) + 1;
+  auto [lineNo, column] = sourceMgr.getLineAndColumn(loc);
   auto *buffer = sourceMgr.getMemoryBuffer(mainFileID);
 
   return FileLineColLoc::get(context, buffer->getBufferIdentifier(), lineNo,
diff --git a/mlir/lib/Tools/PDLL/CodeGen/MLIRGen.cpp b/mlir/lib/Tools/PDLL/CodeGen/MLIRGen.cpp
index 0677828b635d4..08c085b5f279c 100644
--- a/mlir/lib/Tools/PDLL/CodeGen/MLIRGen.cpp
+++ b/mlir/lib/Tools/PDLL/CodeGen/MLIRGen.cpp
@@ -147,12 +147,7 @@ OwningOpRef<ModuleOp> CodeGen::generate(const ast::Module &module) {
 Location CodeGen::genLoc(llvm::SMLoc loc) {
   unsigned fileID = sourceMgr.FindBufferContainingLoc(loc);
 
-  // TODO: Fix performance issues in SourceMgr::getLineAndColumn so that we can
-  //       use it here.
-  auto &bufferInfo = sourceMgr.getBufferInfo(fileID);
-  unsigned lineNo = bufferInfo.getLineNumber(loc.getPointer());
-  unsigned column =
-      (loc.getPointer() - bufferInfo.getPointerForLineNumber(lineNo)) + 1;
+  auto [lineNo, column] = sourceMgr.getLineAndColumn(loc);
   auto *buffer = sourceMgr.getMemoryBuffer(fileID);
 
   return FileLineColLoc::get(builder.getContext(),

``````````

</details>


https://github.com/llvm/llvm-project/pull/195881


More information about the Mlir-commits mailing list