[llvm] [SourceMgr] Clean up handling of line ending characters (PR #120605)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 7 04:09:58 PST 2025
================
@@ -91,19 +91,32 @@ static std::vector<T> &GetOrCreateOffsetCache(void *&OffsetCache,
size_t Sz = Buffer->getBufferSize();
assert(Sz <= std::numeric_limits<T>::max());
StringRef S = Buffer->getBuffer();
- for (size_t N = 0; N < Sz; ++N) {
- if (S[N] == '\n')
- Offsets->push_back(static_cast<T>(N));
+
+ // The cache always includes 0 (for the start of the first line) and Sz (so
+ // that you can always index by N+1 to find the end of line N, even if the
+ // last line has no terminating newline).
+ Offsets->push_back(0);
+ for (size_t N = 0; N != Sz;) {
+ while (N != Sz && S[N] != '\n' && S[N] != '\r')
+ ++N;
+ if (N == Sz)
+ break;
+
+ // Skip over CR, LF, CRLF or LFCR.
+ ++N;
+ if (N != Sz && (S[N - 1] ^ S[N]) == ('\r' ^ '\n'))
+ ++N;
+ Offsets->push_back(static_cast<T>(N));
}
+ Offsets->push_back(static_cast<T>(Sz));
----------------
jayfoad wrote:
Both LLVM and MLIR have unit tests that check this case: if a file ends with LF and you look up the line/column number for a pointer to the end of the buffer (just after the LF), the pointer is treated as if the file had an extra, empty, unterminated line.
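
To illustrate the end-of-buffer case, here is a minimal standalone sketch. It is not the SourceMgr code itself: `buildLineOffsets` copies the loop from the hunk above into a free function over `std::string_view`, and `lineNumberAt` is a hypothetical lookup helper, written on the assumption that a line number is the count of line starts at or before the queried offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical standalone copy of the offset-cache loop from the hunk.
// The result always starts with 0 and ends with S.size().
std::vector<std::size_t> buildLineOffsets(std::string_view S) {
  const std::size_t Sz = S.size();
  std::vector<std::size_t> Offsets;
  Offsets.push_back(0);
  for (std::size_t N = 0; N != Sz;) {
    // Advance to the next CR or LF, or to the end of the buffer.
    while (N != Sz && S[N] != '\n' && S[N] != '\r')
      ++N;
    if (N == Sz)
      break;

    // Skip over CR, LF, CRLF or LFCR.
    ++N;
    if (N != Sz && (S[N - 1] ^ S[N]) == ('\r' ^ '\n'))
      ++N;
    Offsets.push_back(N); // Start of the next line.
  }
  Offsets.push_back(Sz); // Sentinel: end of the last line.
  return Offsets;
}

// Hypothetical helper (an assumption, not the SourceMgr API): returns the
// 1-based line number for a byte offset Pos, where Pos may equal the
// buffer size.
std::size_t lineNumberAt(const std::vector<std::size_t> &Offsets,
                         std::size_t Pos) {
  assert(Offsets.size() >= 2 && Pos <= Offsets.back());
  // Every entry except the trailing sentinel is the start of a line.
  auto StartsEnd = Offsets.end() - 1;
  return static_cast<std::size_t>(
      std::upper_bound(Offsets.begin(), StartsEnd, Pos) - Offsets.begin());
}
```

With this sketch, the buffer "a\n" yields the offsets 0, 2, 2, so offset 2 (just after the LF) falls on an empty second line. The buffer "a" yields 0, 1, so offset 1 stays on line 1.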
https://github.com/llvm/llvm-project/pull/120605