[PATCH] D143142: [clang][lex] Enable Lexer to grow its buffer
David Rector via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Feb 20 06:20:23 PST 2023
davrec added a comment.
Suggested a few adjustments in `LexTokenInternal` you might want to test for perf improvements (orthogonal to this patch, but could boost its numbers :).
And also noted that `Lexer::getBuffer()` has same issue as `getBufferLocation()` re potential for pointer invalidation IIUC. At a minimum we need comments on these functions explaining any risks; better still to remove them from the public interface. If downstream users use these functions and complain, good - they need to be aware of this change.
================
Comment at: clang/include/clang/Lex/Lexer.h:285
/// Gets source code buffer.
- StringRef getBuffer() const {
- return StringRef(BufferStart, BufferEnd - BufferStart);
- }
+ StringRef getBuffer() const { return StringRef(BufferStart, BufferSize); }
----------------
Same issue as with `getBufferLocation()`, publicly returning it permits possible pointer invalidation. Fortunately I only see it used in a single spot (prior to this patch anyway) which can be easily eliminated IIUC. Yank this function? Or make private/append "Unsafe" to name (and explain in comments)?
================
Comment at: clang/include/clang/Lex/Lexer.h:307
/// Return the current location in the buffer.
- const char *getBufferLocation() const { return BufferPtr; }
+ const char *getBufferLocation() const {
+ assert(BufferOffset <= BufferSize && "Invalid buffer state");
----------------
davrec wrote:
> I think I'd like this to return `unsigned`; i.e. I think after this patch we should not even be publicly exposing buffer locations as pointers, IIUC. A brief search for uses of `getBufferLocation()` (there aren't many) suggests this would be probably be fine and indeed would get rid of some unnecessary pointer arithmetic. And indeed if anything really needs that `const char *` that might be a red flag to investigate further.
Looking more closely I see that `getCurrentBufferOffset` returns the unsigned, and this patch already changes some `getBufferLocation` usages to `getCurrentBufferOffset`. So, I say either yank it or make it private or append "Unsafe" and explain in comments.
================
Comment at: clang/lib/Lex/Lexer.cpp:2948-2949
+ if (isHorizontalWhitespace(BufferStart[CurOffset])) {
+ SkipWhitespace(Result, CurOffset + 1, TokAtPhysicalStartOfLine);
+ return false;
}
----------------
indent
================
Comment at: clang/lib/Lex/Lexer.cpp:2973-2975
+ char Char = getAndAdvanceChar(CurOffset, Tmp);
+ switch (Char) {
+ default:
----------------
indent
================
Comment at: clang/lib/Lex/Lexer.cpp:3630-3632
do {
- ++CurPtr;
- } while (isHorizontalWhitespace(*CurPtr));
+ ++CurOffset;
+ } while (isHorizontalWhitespace(BufferStart[CurOffset]));
----------------
```
for (isHorizontalWhitespace(BufferStart[++CurOffset]);;)
;
```
might save a few instructions? Worth trying since this function is the main perf-critical one.
================
Comment at: clang/lib/Lex/Lexer.cpp:3739-3752
+ if (BufferStart[CurOffset] == '/' && BufferStart[CurOffset + 1] == '/' &&
+ !inKeepCommentMode() && LineComment &&
+ (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) {
+ if (SkipLineComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine))
return true; // There is a token to return.
goto SkipIgnoredUnits;
+ } else if (BufferStart[CurOffset] == '/' &&
----------------
Spitballing again for possible minor perf improvements:
```
if (char Char0 = BufferStart[CurOffset] == '/' && !inKeepCommentMode()) {
if (char Char1 = BufferStart[CurOffset + 1] == '/' && LineComment &&
(LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) {
if (SkipLineComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine))
return true; // There is a token to return.
goto SkipIgnoredUnits;
} else if (Char1 == '*') {
if (SkipBlockComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine))
return true; // There is a token to return.
goto SkipIgnoredUnits;
}
} else if (isHorizontalWhitespace(Char0)) {
goto SkipHorizontalWhitespace;
}
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143142/new/
https://reviews.llvm.org/D143142
More information about the cfe-commits
mailing list