[llvm] [SourceMgr] Clean up handling of line ending characters (PR #120605)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 7 04:09:58 PST 2025


================
@@ -91,19 +91,32 @@ static std::vector<T> &GetOrCreateOffsetCache(void *&OffsetCache,
   size_t Sz = Buffer->getBufferSize();
   assert(Sz <= std::numeric_limits<T>::max());
   StringRef S = Buffer->getBuffer();
-  for (size_t N = 0; N < Sz; ++N) {
-    if (S[N] == '\n')
-      Offsets->push_back(static_cast<T>(N));
+
+  // The cache always includes 0 (for the start of the first line) and Sz (so
+  // that you can always index by N+1 to find the end of line N, even if the
+  // last line has no terminating newline).
+  Offsets->push_back(0);
+  for (size_t N = 0; N != Sz;) {
+    while (N != Sz && S[N] != '\n' && S[N] != '\r')
+      ++N;
+    if (N == Sz)
+      break;
+
+    // Skip over CR, LF, CRLF or LFCR.
+    ++N;
+    if (N != Sz && (S[N - 1] ^ S[N]) == ('\r' ^ '\n'))
+      ++N;
+    Offsets->push_back(static_cast<T>(N));
   }
+  Offsets->push_back(static_cast<T>(Sz));
----------------
jayfoad wrote:

Both LLVM and MLIR have unit tests checking that if your file ends with LF and you look up the line/column number for a pointer to the end of the buffer (just after the LF), the result is as if the file had an extra empty, unterminated line.
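
To make that concrete, here is a minimal standalone sketch of the offset cache from the hunk above, plus one possible lookup over it. The names `buildLineOffsets` and `lineNumberAt` are hypothetical and not part of SourceMgr, `std::string_view` stands in for `StringRef`, and the lookup is only an assumption about how such a cache could be queried, not the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the patched loop: records the start offset of
// every line, then appends Sz as a final end marker.
std::vector<std::size_t> buildLineOffsets(std::string_view S) {
  const std::size_t Sz = S.size();
  std::vector<std::size_t> Offsets;
  Offsets.push_back(0); // Start of the first line.
  for (std::size_t N = 0; N != Sz;) {
    // Scan to the next line ending character, if any.
    while (N != Sz && S[N] != '\n' && S[N] != '\r')
      ++N;
    if (N == Sz)
      break;

    // Skip over CR, LF, CRLF or LFCR.
    ++N;
    if (N != Sz && (S[N - 1] ^ S[N]) == ('\r' ^ '\n'))
      ++N;
    Offsets.push_back(N); // Start of the next line.
  }
  Offsets.push_back(Sz); // End of the last line, terminated or not.
  return Offsets;
}

// Hypothetical lookup (an assumption, not the PR's code): 1-based line number
// for byte offset Pos, where Pos may equal the buffer size.
unsigned lineNumberAt(const std::vector<std::size_t> &Offsets,
                      std::size_t Pos) {
  assert(Offsets.size() >= 2 && Pos <= Offsets.back());
  // Every entry except the trailing Sz marker is a line start. The line
  // containing Pos is the last one whose start is <= Pos.
  auto StartsEnd = Offsets.end() - 1;
  auto It = std::upper_bound(Offsets.begin(), StartsEnd, Pos);
  return static_cast<unsigned>(It - Offsets.begin());
}
```

Under these assumptions, the buffer "a\nb\n" yields the offsets {0, 2, 4, 4}: the duplicated 4 is the empty, unterminated third line, so a lookup at offset 4 reports line 3. Without the trailing LF ("a\nb") the offsets are {0, 2, 3}, and a lookup at the end of the buffer stays on line 2.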

https://github.com/llvm/llvm-project/pull/120605

