[cfe-dev] [RFC] A C++ pseudo parser for tooling

Andrew Tomazos via cfe-dev cfe-dev at lists.llvm.org
Fri Nov 5 17:59:52 PDT 2021


Unfortunately, it's not possible to parse C++ even close to accurately
without preprocessing (and therefore build-system integration).  Predefined
macros determine which code is conditionally included, and conditionally
included code can change or redefine basically anything.  Macros can expand
to an arbitrary token sequence (or even create new tokens through
stringization or concatenation), which means that any identifier can become
any token sequence.  That's even before we mention that name lookup is
needed for disambiguation.  To parse C++ you in fact need to do full
preprocessing and a large chunk of semantic analysis.
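
To make that concrete, here is a minimal sketch (not taken from the
proposal).  The macro T_IS_A_TYPE and all function names are hypothetical;
the macro stands in for a flag that a build system would pass with -D.  The
same tokens `T * x;` form a declaration in one configuration and a
multiplication in the other, and PASTE shows concatenation producing a
keyword that appears nowhere in the source text.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical configuration macro: a real build would set it with -D on the
// command line, which a single-file parser never sees.
#ifndef T_IS_A_TYPE
#define T_IS_A_TYPE 1
#endif

#if T_IS_A_TYPE
using T = int;     // T names a type
#else
static int T = 6;  // T names a variable
#endif

// Token pasting builds a new token: `in` ## `t` becomes the keyword `int`.
#define PASTE(a, b) a##b

// Reports whether `T * x;` was parsed as a declaration in this configuration.
inline bool t_star_x_is_declaration() {
    int x = 7;
    (void)x;
    {
        T * x;    // declares a pointer `x`, or multiplies T by the outer x
        (void)x;
        // decltype sees the inner pointer only if the line above declared one.
        return std::is_pointer<decltype(x)>::value;
    }
}

// The declaration below only exists after preprocessing.
inline int pasted_keyword_demo() {
    PASTE(in, t) value = 42;
    return value;
}

// Sanity check tying the parse result to the configuration macro.
inline void self_check() {
    assert(t_star_x_is_declaration() == (T_IS_A_TYPE != 0));
    assert(pasted_keyword_demo() == 42);
}
```

Compiling with -DT_IS_A_TYPE=0 flips the answer without touching a single
token of the function body, which is exactly the information a parser
working on one file in isolation does not have.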

Given how inaccurate the parse from even the best possible "single source
file" parser is, it's not clear what the use case for it is.  clang-format
(largely) makes only whitespace changes, so there is limited opportunity
for inaccuracies in its parse to lead to errors.

To generate file outlines and do refactoring, I suspect you're better off
waiting for a proper parse than using a completely inaccurate one.  In the
dev environment I use, past versions of the indexer tried to do such an
approximate parse, and current versions do a full, correct C++ parse, so
I've experienced the difference first-hand.  It's night and day.

Just my 2c.  -Andrew

On Fri, Nov 5, 2021 at 1:37 PM Haojian Wu via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello everyone,
>
> We’d like to propose a pseudo-parser which can approximately parse C++
> (including broken code). It parses a file in isolation, without needing
> headers, compile flags etc. Ambiguities are resolved heuristically, like
> clang-format. Its output is a clang::syntax tree, which maps the token
> sequence onto the C++ grammar.
> Our motivation comes from wanting to add some low latency features (file
> outline, refactorings etc) in clangd, but we think this is a useful
> building block for other tools too.
>
> Design is discussed in detail here:
> https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit?usp=sharing
>
> The proposal is based on the experience with a working prototype.
> Initially, we will focus on building the foundation. We consider the first
> version as experimental, and plan to use and validate it with applications
> in clangd (the detailed plan is described here:
> <https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit#heading=h.mawgmexy688j>).
>
> As soon as we have consensus on the proposal, we plan to start this work
> in the clang repository (code would be under clang/Tooling/Syntax). We hope
> we can start sending out patches for review at the end of November.
>
> Eager to hear your thoughts. Comments and suggestions are much appreciated.
>
> Thanks,
> Haojian