[clang] [clang][Diagnostics] Highlight code snippets (PR #66514)
Richard Smith via cfe-commits
cfe-commits at lists.llvm.org
Wed Sep 20 10:48:31 PDT 2023
Timm Bäder <tbaeder at redhat.com>
In-Reply-To: <llvm/llvm-project/pull/66514/clang at github.com>
================
@@ -0,0 +1,77 @@
+
+#include "clang/Frontend/CodeSnippetHighlighter.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace clang;
+
+static SourceManager createTempSourceManager() {
+ FileSystemOptions FileOpts;
+ FileManager FileMgr(FileOpts);
+ llvm::IntrusiveRefCntPtr<DiagnosticIDs> DiagIDs(new DiagnosticIDs());
+ llvm::IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts(new DiagnosticOptions());
+ DiagnosticsEngine diags(DiagIDs, DiagOpts);
+ return SourceManager(diags, FileMgr);
+}
+
+static Lexer createTempLexer(llvm::MemoryBufferRef B, SourceManager &FakeSM,
+ const LangOptions &LangOpts) {
+ return Lexer(FakeSM.createFileID(B), B, FakeSM, LangOpts);
+}
+
+std::vector<StyleRange> CodeSnippetHighlighter::highlightLine(
+ StringRef SourceLine, const Preprocessor *PP, const LangOptions &LangOpts) {
+ if (!PP)
+ return {};
+ constexpr raw_ostream::Colors CommentColor = raw_ostream::BLACK;
+ constexpr raw_ostream::Colors LiteralColor = raw_ostream::GREEN;
+ constexpr raw_ostream::Colors KeywordColor = raw_ostream::YELLOW;
+
+ SourceManager FakeSM = createTempSourceManager();
+ const auto MemBuf = llvm::MemoryBuffer::getMemBuffer(SourceLine);
+ Lexer L = createTempLexer(MemBuf->getMemBufferRef(), FakeSM, LangOpts);
+ L.SetKeepWhitespaceMode(true);
----------------
zygoloid wrote:
While I think re-lexing the input to find the tokens is the right approach, lexing the source line in isolation is going to do the wrong thing in a lot of cases. For example, a format string warning inside a multi-line raw string literal will be highlighted incorrectly, because the lexer's state at the start of the line is not taken into account. But equally, re-lexing the entire file for each diagnostic seems like it's going to be problematic from a performance perspective. I can think of a few alternatives here:
1) We could make the regular lexing process keep track of some of the lines where the lexer is in its "normal" state at the start of the line -- whenever we're in the normal lexing state at the start of a line, add the line number to a per-file list if it's been "long enough" (maybe >1K of program text?) since we last did so. Then when emitting diagnostics, we can find the most recent line where we were at a good state at the start of the line, and lex forward from there to drive syntax highlighting.
2) We could make the diagnostics layer keep a cache of the tokenized forms of buffers for which we emit diagnostics. We'd still re-lex an entire file if we emit diagnostics within it, but we'd only do so *once*, and we don't need to store the full list of tokens, only a list of (offset, color) pairs for transitions between token kinds.
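To make alternative 2 concrete, here is a minimal, self-contained sketch of a per-buffer cache of (offset, color) transitions. It is not Clang code: `Color`, `Transition`, `computeTransitions`, `HighlightCache` and `colorAt` are hypothetical names, and the toy scanner (which only understands `/* */` comments and `"..."` strings) stands in for a real `Lexer` run over the whole buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical color classes; the patch itself uses raw_ostream::Colors.
enum class Color { Default, Comment, Literal };

// Text from Offset up to the next entry's Offset is drawn in color C.
struct Transition {
  std::size_t Offset;
  Color C;
};

// Toy stand-in for lexing a whole buffer. A real implementation would run
// the actual lexer; this one only knows /* */ comments and "..." strings.
static std::vector<Transition> computeTransitions(std::string_view Buf) {
  std::vector<Transition> Out{{0, Color::Default}};
  auto Switch = [&Out](std::size_t Off, Color C) {
    if (Out.back().Offset == Off)
      Out.back().C = C; // Nothing was drawn in the old color; overwrite it.
    else if (Out.back().C != C)
      Out.push_back({Off, C});
  };

  std::size_t I = 0;
  while (I < Buf.size()) {
    if (Buf.substr(I, 2) == "/*") {
      Switch(I, Color::Comment);
      std::size_t End = Buf.find("*/", I + 2);
      I = (End == std::string_view::npos) ? Buf.size() : End + 2;
      Switch(I, Color::Default);
    } else if (Buf[I] == '"') {
      Switch(I, Color::Literal);
      ++I;
      while (I < Buf.size() && Buf[I] != '"') {
        if (Buf[I] == '\\')
          ++I; // Skip the escaped character.
        ++I;
      }
      if (I < Buf.size())
        ++I; // Consume the closing quote.
      I = std::min(I, Buf.size());
      Switch(I, Color::Default);
    } else {
      ++I;
    }
  }
  return Out;
}

// Caches transitions per buffer so each buffer is scanned at most once.
// Keying by a buffer name is an assumption made for this sketch.
class HighlightCache {
  std::unordered_map<std::string, std::vector<Transition>> Cache;
  unsigned LexCount = 0;

public:
  const std::vector<Transition> &get(const std::string &Name,
                                     std::string_view Buf) {
    auto It = Cache.find(Name);
    if (It == Cache.end()) {
      ++LexCount;
      It = Cache.emplace(Name, computeTransitions(Buf)).first;
    }
    return It->second;
  }
  unsigned lexCount() const { return LexCount; }
};

// Binary search for the color in effect at a given byte offset.
static Color colorAt(const std::vector<Transition> &T, std::size_t Offset) {
  assert(!T.empty() && T.front().Offset == 0);
  auto It = std::upper_bound(
      T.begin(), T.end(), Offset,
      [](std::size_t Off, const Transition &Tr) { return Off < Tr.Offset; });
  return std::prev(It)->C;
}
```

With this shape, highlighting a snippet line is a binary search for the line's start offset followed by a walk over the transitions up to the line's end. A line that begins inside a multi-line construct picks up the right color because the buffer was scanned from the top, and a second diagnostic in the same buffer reuses the cached list instead of scanning again.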
Thoughts?
https://github.com/llvm/llvm-project/pull/66514