[PATCH] D143142: [clang][lex] Change Lexer to use offsets instead of direct pointer

Sunho Kim via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 6 03:05:54 PST 2023


sunho created this revision.
Herald added a project: All.
sunho updated this revision to Diff 494187.
sunho added a comment.
sunho retitled this revision from "Format" to "[clang] Change Lexer to use offsets instead of direct pointer".
sunho edited the summary of this revision.
sunho retitled this revision from "[clang] Change Lexer to use offsets instead of direct pointer" to "[clang][lex] Change Lexer to use offsets instead of direct pointer".
sunho updated this revision to Diff 494968.
sunho added a comment.
sunho updated this revision to Diff 494976.
sunho added a comment.
sunho updated this revision to Diff 495035.
sunho updated this revision to Diff 495038.
sunho updated this revision to Diff 495040.
sunho edited the summary of this revision.
sunho edited the summary of this revision.
sunho edited the summary of this revision.
sunho edited the summary of this revision.
sunho published this revision for review.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Update


Change Lexer to use offsets instead of direct pointers into the buffer, so that Lexer remains functional even if the buffer address is swapped in the middle of lexing.

Since clang-repl receives source code incrementally, line by line, the source code buffer isn't really fixed but "growing" as the user feeds more input into it. When we grow the buffer, its address can change in practice. However, since Lexer uses direct pointers to positions in the buffer, once the buffer is swapped every pointer needs to be updated, including all trivial local variables, which is very challenging to do without sacrificing robustness.

This change solves the issue. Since we only ever add code at the back of the buffer, existing offsets stay valid no matter how many times we grow the buffer. We do add a number of indirections through BufferStart, but the performance impact on actual compile time turned out to be negligible. We do see an increase of around 0.5% to 0.7% in instruction count, though.
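As a rough illustration of the idea (not the code in this patch), the sketch below uses a hypothetical toy class, OffsetCursor, that keeps an offset into a growable std::string and recomputes the pointer from the buffer start on every access. The class name, its methods, and the whitespace-splitting "lexing" are assumptions made up for this example; clang's Lexer is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy cursor, NOT clang's Lexer. It stores an offset relative to
// the buffer start instead of a raw pointer into the buffer.
class OffsetCursor {
  const std::string *Buffer; // the owner may reallocate storage when it grows
  std::size_t Offset = 0;    // position relative to the buffer start

public:
  explicit OffsetCursor(const std::string &Buf) : Buffer(&Buf) {}

  // Re-derive the pointer from the current buffer start on every access.
  // This is the extra indirection the summary mentions.
  const char *current() const {
    assert(Offset <= Buffer->size() && "offset ran past the buffer");
    return Buffer->data() + Offset;
  }

  bool atEnd() const { return Offset >= Buffer->size(); }

  // Return the next space- or newline-delimited word, or an empty string
  // once the end of the buffer is reached.
  std::string nextWord() {
    while (!atEnd() && (*current() == ' ' || *current() == '\n'))
      ++Offset;
    std::size_t Start = Offset;
    while (!atEnd() && *current() != ' ' && *current() != '\n')
      ++Offset;
    return Buffer->substr(Start, Offset - Start);
  }
};
```

A caller can read a word, append more text to the same std::string (which may move its storage), and keep reading from the same cursor. A cursor holding a raw `const char *` into the old storage would be dangling at that point, whereas the stored offset still names the same position because text is only appended at the back.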

NOTE: This is part 1 of https://discourse.llvm.org/t/rfc-flexible-lexer-buffering-for-handling-incomplete-input-in-interactive-c-c/64180


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D143142

Files:
  clang/include/clang/Lex/Lexer.h
  clang/include/clang/Lex/Preprocessor.h
  clang/lib/Format/FormatTokenLexer.cpp
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/PPDirectives.cpp
  clang/lib/Lex/PPLexerChange.cpp
  clang/lib/Lex/Pragma.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D143142.495040.patch
Type: text/x-patch
Size: 148743 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20230206/b1ce3f2f/attachment-0001.bin>

